Yesterday, Hugues, Pascal, and I finally agreed on the architecture we should use for our database indices in order to properly implement a smooth meta-data upgrade process. This is something that we’ve been struggling with for over a year now, and a resolution is finally in sight.
To make a long story short, the data and meta-data about any application will be split across two indices, one for its original meta-data, and one for everything else. The first one will be shared across all tenants within an instance, while a copy of the second will be created for every tenant.
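To make this a little more concrete, here is a minimal sketch of what such a layout could look like with the Python Elasticsearch client. The index names are purely illustrative, not our actual naming scheme:

```python
# Illustrative sketch only: index names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One shared index per instance, holding the original meta-data of every
# application, reused by all tenants.
es.indices.create(index="metadata-shared")

# One dedicated index per tenant for everything else.
for tenant in ["tenant-41", "tenant-42"]:
    es.indices.create(index=f"{tenant}-data")
```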
While we still do not have a totally clear definition of what meta-data is compared to data, there seems to be agreement that any object shipped without any records (like Contacts, for example) should be considered data. We’ll rely on this simple assumption for the time being and add more complexity down the road only if necessary.
The tenant-specific second index will contain the following:
- User data
- Custom meta-data
- Forks of original meta-data records
The third item is the tricky one, and we do not yet know how it will be implemented. One solution would be to remove forked meta-data records from the first index, but this would prevent us from sharing it across tenants. Another would be to use terms filters with Elasticsearch.
In essence, any forked meta-data record would be added to a master list on the tenant-specific index, and this list would be used as a terms filter to dynamically exclude the corresponding original records on the shared index from query results. That way, a forked record would exist in both indices, but only one copy (the forked one) would be returned by queries.
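We haven’t written any of this yet, but here is a rough sketch of what such a query could look like with the Python Elasticsearch client, using the terms lookup mechanism to read the master list of forked record ids from the tenant index. Index names, field names, and the id of the master list document are all assumptions made for the sake of illustration, and the exact syntax depends on the Elasticsearch version:

```python
# Sketch only: "record_id", "forked_ids", index names and document ids are
# hypothetical, not our actual schema.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

shared_index = "metadata-shared"   # original meta-data, shared by all tenants
tenant_index = "tenant-42-data"    # user data, custom meta-data, forked records

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"object": "contact"}}   # whatever the query is really about
            ],
            # Drop any record that (a) lives in the shared index and (b) has its
            # id listed in the tenant's master list of forked records.
            "must_not": [
                {
                    "bool": {
                        "must": [
                            {"term": {"_index": shared_index}},
                            {
                                "terms": {
                                    "record_id": {
                                        "index": tenant_index,
                                        "id": "forked-records",   # the master list document
                                        "path": "forked_ids"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

# Query both indices at once: the original copy of a forked record is filtered
# out, so only the tenant's fork is returned.
results = es.search(index=f"{shared_index},{tenant_index}", body=query)
```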
We’re not sure whether the solution described above will work. If it does, it’s awesome, because it will allow us to share application meta-data indices across any number of tenants, thereby reducing storage requirements on Elasticsearch and dramatically simplifying the meta-data upgrade process for applications. If it does not, the overall architecture will still work, but we won’t get to enjoy these two benefits.
With our proposed architecture, we might even create one index per datasource. By default, every application is defined with its own datasource, but large applications can have multiple datasources, with each datasource usually tied to a particular object or collection of objects. For example, if you define an application with many records for a particular object, you might decide to package this object within a separate datasource, thereby getting its records stored in a dedicated index.
Another use case would be applications that use connectors to large systems like SAP or Salesforce.com. In this case, you could have one datasource per connector, whereby all records from SAP would be duplicated into a dedicated index, and all records from Salesforce.com into another. This would make the maintenance of your composite application a lot easier.
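Purely as an illustration of the idea, and not an actual naming scheme, each (tenant, datasource) pair could be mapped to its own index while keeping a single query across the whole application:

```python
# Hypothetical naming convention, for illustration only.
def index_for(tenant: str, datasource: str) -> str:
    """One Elasticsearch index per (tenant, datasource) pair."""
    return f"{tenant}-{datasource}"

# A composite application could spread its records across dedicated indices:
indices = [
    index_for("tenant-42", "default"),     # the application's default datasource
    index_for("tenant-42", "sap"),         # records duplicated from SAP
    index_for("tenant-42", "salesforce"),  # records duplicated from Salesforce.com
]

# Searching across the whole application remains a single call, for example
# with a wildcard index pattern:
# es.search(index="tenant-42-*", body=query)
```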
Now that we have an agreement on the architecture, it’s time to write some code…
Our next Board meeting is scheduled for March 24th. We’ll try to get the MVP done by then.
Last night, Jim managed to migrate all our Kickstarter instances to paid Cloud Foundry instances. If your instance was at foo.cfapps.io, it should now be available at foo.stoic.io. We’re using one 1GB Cloud Foundry instance for every 100 tenants, thanks to the multi-tenant architecture developed by Hugues.
This morning, Hugues and Florian are integrating LevelDB and LevelUP into our platform in order to increase the maximum size of the meta-data that can be managed by our distributed meta-data caching layer. Once they’re done, they should be able to integrate this with Pascal’s refactored meta-data cache. We’re almost there…
Victory! Hugues and our friend Kin Wah managed to integrate Nashorn into Elasticsearch.
Great work, team Singapore!