Remember the STOIC Pod? Well, it had a very short life: it’s now called the STOIC Rack, because the team (Hugues and Jim) would like to reserve the word “pod” for something totally different, namely the Kubernetes pod, which is defined as follows:
A pod (as in a pod of whales or pea pod) is a relatively tightly coupled group of containers that are scheduled onto the same host. It models an application-specific “virtual host” in a containerized environment. Pods serve as units of scheduling, deployment, and horizontal scaling/replication, share fate, and share some resources, such as storage volumes and IP addresses.
Since the provisioning of the STOIC Platform is so complex, it is really important that we all agree on a common terminology. In such a context, I’ve been pushing hard for the establishment of a standard deployment model, which we will now refer to as a STOIC Rack. So, what is a rack?
First, a rack is something that is deployed in a single availability zone, within a single geographic location (US West Coast, US East Coast, Ireland, Singapore, etc.). As such, a rack is deployed within a single hosting provider, which is either Amazon Web Services or Digital Ocean. The reason why we’re still working with both is that Digital Ocean offers better performance at a lower price when using on-demand instances, and we’re not ready yet to use reserved instances.
When our architecture becomes a bit more mature and our customer instances are used in production, it is likely that we will move from on-demand instances to reserved instances and consolidate everything on AWS, but for the time being we will hedge our bets and work with both IaaS providers. Better safe than sorry…
Second, a rack is a unit of deployment where some resources are shared across tenants:
- Single Elasticsearch cluster
- Single copy of read-only datasources like Platform, Ontology, or Pelias
- Single inbound email server (for the email facet)
- Single Docker container for managing upgrades
- Single Docker container for monitoring the rack using cAdvisor
Alongside these shared resources, tenants will have their own Docker containers for Node.js and Redis. Initially, each and every tenant will have a single Docker container running both Node.js and Redis. But down the road we will add the ability to split Node.js and Redis across two separate containers, then deploy them across multiple containers for clustering purposes.
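To make the rack layout concrete, here is a minimal Python sketch of the shared and per-tenant containers described above, including the planned option of splitting Node.js and Redis later on (all service and tenant names are hypothetical illustrations, not actual configuration):

```python
# Illustrative model of a STOIC Rack: a few shared services
# plus one or more Docker containers per tenant.
# All names below are hypothetical examples.

SHARED_SERVICES = [
    "elasticsearch-cluster",   # single Elasticsearch cluster per rack
    "readonly-datasources",    # Platform, Ontology, Pelias
    "inbound-email-server",    # for the email facet
    "upgrade-manager",
    "cadvisor-monitoring",
]

def rack_containers(tenants, split_middleware=False):
    """Return the list of containers for a rack.

    Initially, each tenant gets a single container running both
    Node.js and Redis; down the road, the two can be split into
    separate containers.
    """
    containers = list(SHARED_SERVICES)
    for tenant in tenants:
        if split_middleware:
            containers.append(f"{tenant}-nodejs")
            containers.append(f"{tenant}-redis")
        else:
            containers.append(f"{tenant}-nodejs-redis")
    return containers

print(rack_containers(["acme", "globex"]))
```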
All of this will be deployed as Docker containers, which will be grouped into pods in order to enforce better host affinity across resources. In turn, these pods will be deployed onto multiple virtual machines provisioned on AWS and Digital Ocean using Ansible.
Initially, the provisioning of these virtual machines will be done manually. But as we deploy more and more tenants on more and more virtual machines, we will automate this part of the work as well, even though we have yet to pick a tool for doing that.
The rack deployment model is what we will use for dedicated instances as well, including:
- Elasticsearch server
- Standard datasources (Platform, Ontology, etc.)
- Node.js server
- Redis queue
- Inbound email server
- Upgrade manager
- Monitoring tool
Of course, the configuration of a multi-tenant rack might differ from that of a single-tenant rack as far as Kubernetes pods are concerned. And when you deploy a local development instance on your laptop, we might have yet another configuration. These packaging details are something that we will have to figure out over the coming weeks.
That’s all for now…
We’ve migrated to Ansible in order to automate the provisioning of our Docker containers.
One of the features of our current Elasticsearch architecture is the ability to share read-only indexes across tenants within a multi-tenant pod. What this means is that datasources like Platform (where all the platform’s meta-data is defined) can be deployed within shared read-only indexes that all tenants have access to.
The main benefit of this architecture is to reduce the amount of storage that needs to be allocated for each and every tenant. By sharing this content across multiple tenants, the marginal amount of storage required by every tenant before they start adding their own content goes down quite dramatically, which helps reduce hosting costs.
While these benefits remain limited for relatively small datasets like Platform (30K records, 50MB), they become significant for datasets like Ontology (400MB), and quite dramatic for upcoming datasets like Pelias that combine the full OpenStreetMap and Quattroshapes databases (300GB).
With this architecture, we will be able to deploy Pelias once on a given pod, then share it across all tenants of the pod. And knowing that a pod could contain hundreds of servers managed as a cluster and that a server can host about ten tenants, a single copy of the Pelias database could be used by thousands of tenants.
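Putting rough numbers on this (the dataset sizes are the ones quoted above; the tenant count is an illustrative assumption of 200 servers with about 10 tenants each):

```python
# Approximate dataset sizes quoted above, in megabytes.
PLATFORM_MB = 50
ONTOLOGY_MB = 400
PELIAS_MB = 300_000  # ~300GB

# Illustrative assumption: 200 servers x ~10 tenants per server.
TENANTS = 2_000

# One shared read-only copy per pod, versus one copy per tenant.
shared_total = PLATFORM_MB + ONTOLOGY_MB + PELIAS_MB
dedicated_total = TENANTS * (PLATFORM_MB + ONTOLOGY_MB + PELIAS_MB)

print(f"one shared copy: {shared_total / 1000:.1f} GB")
print(f"one copy per tenant: {dedicated_total / 1_000_000:.1f} TB")
```

Under these assumptions, sharing turns roughly 600TB of duplicated storage into a single copy of about 300GB, which is where the dramatic reduction in marginal hosting costs comes from.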
From there, one question that remains to be answered is how we should manage the replication of shared read-only indexes such as Platform, Ontology, or Pelias. Here, a relatively new Elasticsearch feature called Tribe nodes can come to the rescue. A tribe node can be used to consolidate read-only data coming from multiple clusters. But one could potentially use the exact same mechanism to syndicate the content of a single read-only cluster across multiple clusters (aka pods) by deploying one tribe node per cluster.
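As a sketch, the elasticsearch.yml of a tribe node that syndicates read-only data from two pod clusters could look like the following (the `tribe.*` settings are the documented Elasticsearch tribe-node configuration; the cluster names are hypothetical):

```yaml
# elasticsearch.yml for a tribe node federating two clusters.
# Cluster names below are hypothetical examples.
tribe:
  pod_us_west:
    cluster.name: stoic-pod-us-west
  pod_ireland:
    cluster.name: stoic-pod-ireland
```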
This is something that we will investigate when we need to automate the management of our various pods (three so far). And as far as Ontology and Pelias are concerned, we will work on deploying these as soon as Jim and Yves are done with more pressing issues. They will allow us to benchmark the platform with datasets that are getting closer to the terabyte range that we are currently targeting for next year.
Following weeks of experimentation, we’ve finally converged on a standard blueprint for provisioning our instances when using Docker. It took us quite a long time because of the sheer number of possible configurations, and because specific configurations require different Docker-related provisioning tools that are still relatively immature.
Nevertheless, this stream of work is finally coming to an end, and Hugues, Jim, and Pascal managed to converge on a standard configuration yesterday. This configuration defines how a STOIC Pod is deployed and configured. The main idea is that all pods will essentially look the same, yet their standard configuration will offer a handful of parameters that we can use to properly size things up according to different requirements. Here is what a pod is made of:
- One or multiple Docker VMs (one by default, currently deployed on AWS)
- One dedicated Docker container per tenant, which includes Node.js and Redis
- One shared Docker container for running a multi-tenant Elasticsearch database
- One shared Docker container for managing upgrades
- One shared Docker container for monitoring the pod using cAdvisor
This architecture provides multiple benefits:
First, by dedicating a full Docker container to every tenant for the middleware components, we provide a very high level of tenant isolation, which increases security and opens the door to running custom code on the server side (useful for custom libraries and connectors).
Second, by sharing the Elasticsearch server across multiple tenants, we can reduce our marginal hosting cost per tenant: a single Java VM and a single shared Elasticsearch index hold all the meta-data related to the Platform for a pool of 10 to 20 tenants (the average number of tenants to be deployed per pod). By doing so, we also take advantage of the fact that the JVM provides good support for multi-threading, which is something that Node.js is not really good at. In other words, multi-tenancy is possible and desirable at the database level, but not at the middleware level, hence the proposed architecture.
Third, we can scale things up for a given tenant by increasing the size of the Docker VM on which it is deployed (more CPU, more memory), and by reducing the number of tenants deployed on the same pod, all the way down to a single tenant. This opens the door to single-tenant pods without having to change anything in the underlying architecture, and it’s also the configuration that would be used for local deployments.
Fourth, things remain really simple initially (one Docker container per tenant), but nothing prevents us from scaling things up down the road in order to gain more scalability and elasticity. Here is what could be done on that front:
- Dedicating a single Elasticsearch server per tenant
- Splitting the middleware container into two containers (one for Node.js, one for Redis)
- Clustering Node.js across multiple Docker containers
- Clustering Redis across multiple Docker containers
- Clustering Elasticsearch across multiple Docker containers
- Using multiple Docker VMs across multiple physical servers
For reference purposes, with this architecture, our bare-bones hosting costs will be around $5 to $10 per tenant per month. Therefore, our gross margins will always remain above 50%, even if we have a single user per tenant ($25/month for a power user).
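As a quick sanity check on the margin claim, using the $5 to $10 hosting cost range and the $25/month power-user price quoted above:

```python
# Gross margin per tenant at the quoted hosting cost range.
revenue_per_month = 25  # single power user, $25/month
for hosting_cost in (5, 10):
    margin = 1 - hosting_cost / revenue_per_month
    print(f"cost ${hosting_cost}: gross margin {margin:.0%}")
```

Even at the high end of the cost range ($10), the gross margin is 60%, comfortably above the 50% floor.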
This architecture is now up and running in Singapore and will roll out in Ireland next week.
I just tested our new Singapore instances. They’re fast. Almost as fast as a local one… Nice!
After some more discussions, we’ve decided to use three Docker containers for every tenant, one for the Node.js server, one for the Redis queue, and one for the Elasticsearch database. This will make it easier to add support for clustering down the road.
For the next six months or so, we will focus our development efforts on the single-tenant option of our provisioning architecture, with at least one dedicated Node.js server per tenant. When deploying on top of Cloud Foundry, we will share a single Elasticsearch database server across multiple tenants. When deploying on top of Docker, we will dedicate one Elasticsearch database per tenant. Once we have that working well enough, we will reconsider the multi-tenant approach, but only if we can build a solid business case for it. All hosted customers in the US will be deployed on top of Cloud Foundry, while customers outside of the US will be deployed on Docker using AWS datacenters in Ireland and Singapore. We hope to have a first version of this architecture sometime in September.
We finally managed to upgrade all our instances. Nice…
Jim added HTTPS support to our production clusters.
Thanks to Jim’s work on Docker, we will be able to package development instances that can be easily deployed on a desktop or laptop computer, without having to manually install all the components required by the platform, like the Node.js server, Elasticsearch database, and Redis key-value store. By the end of the month, we’ll give all our partners and past Dojo attendees an instance that they can deploy on their own development machines. This will make it easier for them to give demonstrations to prospective customers for the Fast Track Package.
The overall usability of our user interface is closely related to network latency issues, on two major fronts: first, network latency between client and server; second, network latency between the Node.js server and the Elasticsearch database. Currently, all our servers are deployed within datacenters located in the US, with Node.js running on Pivotal and Elasticsearch running on AWS.
As a result, customers outside of the US get a degraded user experience because of network latency between their clients and our servers. On top of that, every customer, both inside and outside of the US, has a sub-par user experience because of network latency between our Node.js servers and our Elasticsearch databases.
In order to work around this issue, we’ve decided to change our provisioning architecture to support regional provisioning and server affinity. The former will allow us to deploy instances in places like Singapore (for Asian customers) and Ireland (for European customers). The latter will allow us to run our Node.js servers on the same physical machines or availability zones as our Elasticsearch databases. With these two improvements, network latency should go way down…
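A back-of-the-envelope latency budget shows why server affinity matters as much as regional provisioning; all numbers below are illustrative assumptions, not measurements:

```python
# Rough latency budget for one user action, in milliseconds.
# All numbers are illustrative assumptions, not measurements.

def request_latency(client_rtt_ms, db_rtt_ms, db_queries):
    """One client round trip plus N Node.js -> Elasticsearch round trips."""
    return client_rtt_ms + db_queries * db_rtt_ms

# Today: client far from the server, and Node.js (Pivotal)
# far from Elasticsearch (AWS).
today = request_latency(client_rtt_ms=250, db_rtt_ms=20, db_queries=10)

# With regional provisioning and server affinity: client close to
# the server, Node.js co-located with Elasticsearch.
after = request_latency(client_rtt_ms=30, db_rtt_ms=1, db_queries=10)

print(today, after)  # 450 vs 40
```

Because a single user action can fan out into many internal database queries, shaving the Node.js-to-Elasticsearch round trip pays off many times per request.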
To do so, Jim and Hugues are using Docker to package containers for Node.js, Elasticsearch, and Redis. This will allow us to deploy Elasticsearch and Redis close to our Node.js servers, which will continue to be deployed on Pivotal whenever possible, or on AWS whenever we need regional provisioning in an area where Pivotal is not yet available.
Using Docker will also make it simpler for customers to deploy local instances. So far, this required either the deployment of a full-blown Cloud Foundry infrastructure, or the manual installation of all the bits and pieces that make up a STOIC instance.
We hope to have a first prototype of this architecture sometime during the Summer.