After some more discussions, we’ve decided to use three Docker containers for every tenant: one for the Node.js server, one for the Redis queue, and one for the Elasticsearch database. This will make it easier to add support for clustering down the road.
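In practice, provisioning a tenant then amounts to starting three linked containers. Here is a minimal sketch of the idea — the image names, container naming scheme, and `stoic/node` image are purely illustrative, not our actual setup:

```javascript
// Build the `docker run` commands needed to provision one tenant.
// Image names and the naming convention are hypothetical.
function provisionCommands(tenant) {
  return [
    // Redis queue
    `docker run -d --name ${tenant}-redis redis`,
    // Elasticsearch database
    `docker run -d --name ${tenant}-es elasticsearch`,
    // Node.js server, linked to the two containers above
    `docker run -d --name ${tenant}-node ` +
      `--link ${tenant}-redis:redis --link ${tenant}-es:es stoic/node`,
  ];
}

console.log(provisionCommands('acme').join('\n'));
```

Because each tenant gets its own trio of containers, replacing any one container with a cluster later on only changes what these commands launch, not the shape of the provisioning logic.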
For the next six months or so, we will focus our development efforts on the single-tenant option of our provisioning architecture, with at least one dedicated Node.js server per tenant. When deploying on top of Cloud Foundry, we will share a single Elasticsearch database server across multiple tenants. When deploying on top of Docker, we will dedicate one Elasticsearch database per tenant. Once we have that working well enough, we will reconsider the multi-tenant approach, but only if we can build a solid business case for it. All hosted customers in the US will be deployed on top of Cloud Foundry, while customers outside of the US will be deployed on Docker using AWS datacenters in Ireland and Singapore. We hope to have a first version of this architecture sometime in September.
We finally managed to upgrade all our instances. Nice…
Jim added HTTPS support to our production clusters.
Thanks to Jim’s work on Docker, we will be able to package development instances that can be easily deployed on a desktop or laptop computer, without having to manually install all the components required by the platform, like the Node.js server, Elasticsearch database, and Redis key-value store. By the end of the month, we’ll give all our partners and past Dojo attendees an instance that they can deploy on their own development machines. This will make it easier for them to give demonstrations to prospective customers for the Fast Track Package.
The overall usability of our user interface is closely related to network latency issues, on two major fronts: first, network latency between client and server; second, network latency between the Node.js server and the Elasticsearch database. Currently, all our servers are deployed within datacenters located in the US, with Node.js running on Pivotal and Elasticsearch running on AWS.
As a result, customers outside of the US get a degraded user experience because of network latency between their clients and our servers, and every customer, both inside and outside of the US, has a sub-par user experience because of network latency between our Node.js servers and our Elasticsearch databases.
In order to work around this issue, we’ve decided to change our provisioning architecture to support regional provisioning and server affinity. The former will allow us to deploy instances in places like Singapore (for Asian customers) and Ireland (for European customers). The latter will allow us to run our Node.js servers on the same physical machines or availability zones as our Elasticsearch databases. With these two improvements, network latency should go way down…
To do so, Jim and Hugues are using Docker to package containers for Node.js, Elasticsearch, and Redis. This will allow us to deploy Elasticsearch and Redis close to our Node.js servers. The Node.js servers will continue to be deployed on Pivotal whenever possible, or will run on AWS whenever we need regional provisioning in an area where Pivotal is not yet available.
Using Docker will also make it simpler for customers to deploy local instances. So far, this required either the deployment of a full-blown Cloud Foundry infrastructure, or the manual installation of all the bits and pieces that make up a STOIC instance.
We hope to have a first prototype of this architecture sometime during the summer.
Jim is currently testing multiple deployment configurations in order to improve the performance of our platform. One thing we’re testing is the ability to deploy our middleware in the same AWS datacenter and availability zone as our database. The former runs on top of Cloud Foundry, which is itself hosted on AWS, while the latter is deployed directly on top of EC2 and S3. Having both in the same datacenter would reduce network latency from a few dozen milliseconds down to about 1ms, and having both within the same availability zone would bring it down to about 0.2ms. We’re also looking at migrating from Redis Cloud to ElastiCache, which would both improve performance and reduce costs. With a bit of luck, all these provisioning details should settle down sometime next week.
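To make the stakes concrete, here is the back-of-the-envelope arithmetic, using the latency figures above and an assumed number of sequential database round trips per page (the query count is illustrative, not a measurement):

```javascript
// Total network overhead for a page that triggers N sequential
// round trips between the middleware and the database.
function overheadMs(roundTrips, latencyMs) {
  return roundTrips * latencyMs;
}

const roundTrips = 50; // illustrative figure

console.log(overheadMs(roundTrips, 30));  // separate datacenters: 1500ms
console.log(overheadMs(roundTrips, 1));   // same datacenter: 50ms
console.log(overheadMs(roundTrips, 0.2)); // same availability zone: 10ms
```

Because the round trips are sequential, every millisecond shaved off the link is multiplied by the number of queries, which is why co-location matters so much more than raw server speed here.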
As many of you have experienced over the past few days, Kickstarter instances are very unstable right now. This is because we migrated to a production version of Cloud Foundry and are still making adjustments to the system’s configuration. Also, spreadsheet import is not working yet, and won’t be before next week. In the meantime, we’re doing some testing on various platforms in order to improve performance. Early results are very encouraging, but we still need a few days to get our ducks in a row.
So much for the bad news. The good news is that things are rapidly converging, and problems are being solved one after the other. While half of the team works on caching, provisioning, and upgrades, the other half is fixing bugs and plugging holes left and right. Conclusion: we’ll end up a couple of months late relative to our original schedule, but the end result should be a lot more complete and a lot more stable, with a lot less technical debt. When we had to pick two out of features, quality, and timeframe, we picked the first two. But the time has come to ship, and it will happen this month.
Yesterday, Hugues, Pascal, and I finally agreed on the architecture we should use for our database indices in order to properly implement a smooth meta-data upgrade process. This is something that we’ve been struggling with for over a year now, and a resolution is finally in sight.
To make a long story short, the data and meta-data about any application will be split across two indices, one for its original meta-data, and one for everything else. The first one will be shared across all tenants within an instance, while a copy of the second will be created for every tenant.
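As a rough sketch of that split (the index naming scheme here is purely hypothetical):

```javascript
// Resolve the two Elasticsearch indices used for a given application
// and tenant. The naming convention is an assumption for illustration.
function indicesFor(application, tenant) {
  return {
    // Original meta-data: one index shared by all tenants of an instance.
    shared: application + '-metadata',
    // Everything else (user data, custom meta-data, forked meta-data
    // records): one copy of this index per tenant.
    tenant: application + '-' + tenant,
  };
}

console.log(indicesFor('crm', 'acme'));
// → { shared: 'crm-metadata', tenant: 'crm-acme' }
```

Upgrading an application’s original meta-data then touches a single shared index, instead of one index per tenant.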
While we still do not have a totally clear definition of what meta-data is compared to data, there seems to be an agreement that any object shipped without any record (like Contacts, for example) should be considered data. We’ll rely on this simple assumption for the time being, and add more complexity down the road only if necessary.
The tenant-specific second index will contain the following:
- User data
- Custom meta-data
- Forks of original meta-data records
The third item is the tricky one, and we do not know yet how it will be implemented. One solution would be to remove forked meta-data records from the first index, but this would prevent us from sharing it across tenants. Another would be to use terms filters with Elasticsearch.
In essence, any forked meta-data record would be added to a master list on the tenant-specific index, and this list would be used as a terms filter in order to dynamically remove forked records stored on the shared index from queries. That way, forked records would be duplicated across the two indices, but only one copy (the forked one) would be returned by queries.
We’re not sure whether the solution described above will work or not. If it does, it’s awesome, because it will allow us to share indices for application meta-data across any number of tenants, thereby reducing storage requirements on Elasticsearch and dramatically simplifying the upgrade process for the meta-data of applications. If it does not, the overall architecture will still work, but we won’t get to enjoy these two benefits.
With our proposed architecture, we might even create one index per datasource. By default, every application is defined with its own datasource, but large applications can have multiple datasources, with one datasource usually being tied to a particular object, or to a collection of objects. For example, if you define an application that would have many records for a particular object, you might decide to package this object within a separate datasource, thereby getting its records stored in a dedicated index.
Another use case would be for applications that make use of connectors to large applications like SAP or Salesforce.com. In this case, you could have one datasource per connector, whereby all records of SAP would be duplicated into a dedicated index, and all records of Salesforce.com into another one. This would make the maintenance of your composite application a lot easier.
Now that we have an agreement on the architecture, it’s time to write some code…
Last night, Jim managed to migrate all our Kickstarter instances to paid Cloud Foundry instances. If your instance was at foo.cfapps.io, it should now be available at foo.stoic.io. We’re using one 1GB Cloud Foundry instance for every 100 tenants, thanks to the multi-tenant architecture developed by Hugues.
If we’re lucky, all Kickstarter instances will be moved from cfapps.io to stoic.io.
The trial instances of Cloud Foundry that we’ve been using for our Kickstarter backers up until now will soon expire. It’s time that we move to production instances, and Jim has taken the lead on this project. He will build upon the foundation laid by Hugues and implement a simple provisioning framework that will allow us to manage a handful of multi-tenant instances for our 300 beta customers. Upgrades will take place within the next two weeks.