My name is Ismael Chang Ghalimi. I build the STOIC platform. I am a stoic, and this blog is my agora.

Victory! After a bit of fighting, we now have a working API for our pivot table, available both through a function call and through a FormulaJS query. Now, we need to add bi-dimensional hierarchical rollups, and we’ll be good to go.
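To give a rough idea of what the two access paths look like, here is a hypothetical sketch; the pivot() signature and the PIVOT formula shown are illustrative assumptions, not the actual API.

    // Hypothetical sketch of the two access paths (names are assumptions).

    // 1. Direct function call against the cube
    var table = hypercube.pivot({
      rows: 'region',        // dimension spread across rows
      columns: 'year',       // dimension spread across columns
      values: 'SUM(revenue)' // aggregated measure
    });

    // 2. The same request expressed as a FormulaJS query
    var sameTable = hypercube.query('PIVOT("region", "year", SUM("revenue"))');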

Victory! Datavore and Hexastore now share the very same query interface, using a set of custom FormulaJS functions. As a result, you can query the cube using the full power of FormulaJS, including combining multiple Datavore and Hexastore queries into a single FormulaJS expression. And as far as my benchmarks can tell, ExpressionJS is not adding any noticeable overhead past the first invocation.
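To make this concrete, here is the kind of expression it enables; the function names below (HX_DESCENDANTS, DV_SUM) and the compile() call are hypothetical stand-ins for the index-specific wrappers, not the actual catalog.

    // Hypothetical sketch: one FormulaJS expression mixing both indexes.
    // HX_DESCENDANTS would hit the Hexastore graph index,
    // DV_SUM would aggregate over the Datavore columnar index.
    var expression = 'DV_SUM("revenue", HX_DESCENDANTS("regions", "APAC"))';

    // ExpressionJS compiles the formula once; subsequent invocations reuse
    // the compiled form, which is why the overhead vanishes after the first call.
    var compiled = expressionjs.compile(expression); // assumed API name
    var total = compiled.evaluate(cube);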

We need to build a query language for HypercubeJS. And the task is made a bit more complex by the fact that it needs to support multiple indexes (Crossfilter, Datavore, Hexastore, etc.). So, how do we go about that? And most importantly, which syntax do we use for it? And how do we make sure that we don’t have to write yet another parser for yet another language? Well, in pure STOIC fashion, we use our good old ExpressionJS, and we wrap all our index-specific query functions into nice and shiny FormulaJS functions. Et voilà! Meet GPARENT, the function that returns the parent of a record for an object defined with a hierarchy.
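As a rough illustration, wrapping an index-specific lookup into a FormulaJS function could look something like the sketch below; the register() call and hexastore.parent() are assumed names, not the actual ExpressionJS interface.

    // Hypothetical sketch: exposing the graph index's parent lookup as GPARENT.
    formulajs.register('GPARENT', function (object, recordId) {
      // Delegate to the underlying index for the object's hierarchy
      return hexastore.parent(object, recordId);
    });

    // From then on, any FormulaJS expression can call it:
    //   GPARENT("regions", "singapore")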

HypercubeJS integrated

Earlier today, Yves and François integrated HypercubeJS into core.stoic and connected the new Pivot perspective to it. We now need to finish our query interface, and we’ll be able to produce our nice and shiny pivot tables. That, combined with native C++ indexes, should give us quite a bit of bang for our buck.

Going native

After some basic benchmarking, Yves and I have decided to migrate the indexes offered by HypercubeJS (Crossfilter, Datavore, Hexastore, and Humanizer) from JavaScript to native C++. This will break the memory barrier of Node.js (1.4GB), and make our indexes a lot faster. From there, we might even move some of these indexes to straight C.

What’s interesting in this exercise is the communication overhead that you incur when going from JavaScript to C++. Before starting, I had no idea what that overhead might be. So, we made a very simple benchmark consisting of calling a function called hello() that returns the string “world”. In the first case, the string is a local variable of the function. In the other, it’s a value returned by an external C++ library. Results? 300ns (nanoseconds) for the former, and 400ns for the latter (on average, over 10,000,000 calls).
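The JavaScript side of that benchmark boils down to a tight loop like the one below. This is a simplified sketch: the addon path and module name are assumptions, and the real native version is a C++ addon built with node-gyp.

    // Simplified sketch of the micro-benchmark (names are assumptions).
    var native = require('./build/Release/hello'); // hypothetical C++ addon

    function helloJs() { return 'world'; }         // pure JavaScript version

    var ITERATIONS = 10000000;

    function bench(label, fn) {
      var start = process.hrtime();
      for (var i = 0; i < ITERATIONS; i++) fn();
      var diff = process.hrtime(start);
      var ns = (diff[0] * 1e9 + diff[1]) / ITERATIONS;
      console.log(label + ': ~' + ns.toFixed(0) + 'ns per call');
    }

    bench('JavaScript', helloJs);      // ~300ns per call on our machine
    bench('C++ addon', native.hello);  // ~400ns per call on our machine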

Conclusion: externalizing a function increases the latency of its invocation by about 30%, which is very reasonable. In exchange, you get virtually unlimited memory, and you get native C speed. How much faster will our indexes be as a result? I’m willing to bet anywhere between 2 and 5 times, and we should get a first answer for Hexastore sometime tomorrow…

We have externalized our graph traversal functions for easier maintenance, and we’ve added a few more to our arsenal, while improving some existing ones. First, ascendants() and descendants() can now take an optional generations parameter, which defines the number of generations to traverse when going through the hierarchy. Second, we’ve added an ascendant() function, which returns a single ascendant a given number of generations up. When the latter parameter is omitted, the root ascendant is returned. Next, we’ll work on shortest-path computation functions by implementing the usual A* and Dijkstra algorithms, and possibly a few more. Fun stuff…
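Roughly, the new helpers behave as sketched below; the object and record names are made up for illustration.

    // Hypothetical usage sketch of the traversal helpers described above.

    // All ascendants of a record, walking the full hierarchy
    hexastore.ascendants('regions', 'singapore');

    // Only the two nearest generations (parent and grandparent)
    hexastore.ascendants('regions', 'singapore', 2);

    // A single ascendant, a fixed number of generations up
    hexastore.ascendant('regions', 'singapore', 1); // the parent

    // With the generations parameter omitted, the root ascendant is returned
    hexastore.ascendant('regions', 'singapore');    // the top of the hierarchy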

Check this out: we’re starting to be able to make some really interesting requests onto our graph structures using Hexastore. Through the addition of a handful of functions, we can now process the following queries really, really fast:

  • parent() to get the parent of a record
  • children() to get the children of a record
  • ascendants() to get the ascendants of a record
  • descendants() to get the descendants of a record
  • depth() to get how deep a record is within a hierarchy
  • height() to get how high a record is within a hierarchy

Now, what’s interesting is how fast these queries are. For example, when we run the same height() query three or four times (enough for the JavaScript optimizer to kick in), it takes less than 1 millisecond, while it needs to visit the 371 children of the Singapore region. What this means is that looking up a node in the graph takes less than 3 microseconds. That’s to be compared with the 10 milliseconds that a typical ElasticSearch query would take. In other words, for graph navigation, we’re about 3,000 times faster than the core database. Not bad, not bad at all…
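For reference, the back-of-the-envelope measurement looks like this; a rough sketch with illustrative names, not the exact harness we used.

    // Rough timing sketch (illustrative names, not the exact harness).
    // Warm up first, so the JavaScript optimizer has kicked in.
    for (var i = 0; i < 3; i++) hexastore.height('regions', 'singapore');

    var start = process.hrtime();
    hexastore.height('regions', 'singapore'); // visits the 371 children
    var diff = process.hrtime(start);

    var totalNs = diff[0] * 1e9 + diff[1];    // under 1,000,000ns, i.e. under 1ms
    console.log((totalNs / 371).toFixed(0) + 'ns per node'); // under 3,000ns (3µs)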

Refactored CSV importer

Importing large datasets through Excel spreadsheets is usually not a good idea, because there is no liberally-licensed open source parser for XLSX files that supports streaming. As a result, you’re limited by the amount of memory that Node.js can handle, which is about 1.4GB. Therefore, XLSX imports should be limited to relatively small sample datasets that are used to bootstrap your application. From there, another importer should be used.

For this purpose, Jacques-Alexandre has been working on a high-performance CSV importer implemented as a connector. As a result, it will bypass all the complex business logic implemented in the existing importer, connect directly to our database, and support streaming. This should allow us to import very large datasets with an optimal level of performance. We hope to have a first version of this sometime later this week, or early next week.
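In spirit, the connector streams the file straight into the database, along the lines of the sketch below; this is a heavily simplified illustration (no quoted-field handling, and database.insert() is a hypothetical call), not Jacques-Alexandre's actual code.

    // Heavily simplified sketch of a streaming CSV import (illustrative only).
    // Reading line by line keeps memory flat, regardless of file size.
    var fs = require('fs');
    var readline = require('readline');

    var reader = readline.createInterface({
      input: fs.createReadStream('dataset.csv')
    });

    var header = null;
    reader.on('line', function (line) {
      var cells = line.split(',');               // naive split, no quoted fields
      if (!header) { header = cells; return; }   // first line holds column names
      var record = {};
      header.forEach(function (name, i) { record[name] = cells[i]; });
      database.insert('dataset', record);        // hypothetical direct insert
    });

    reader.on('close', function () {
      console.log('import complete');
    });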

This is still very, very early, but here are first screenshots of STOIC Sheets with the ability to repeat rows (or columns) according to a formula. Here, we’re showing the records of the Datatypes object on multiple rows. It’s still pretty slow because our first implementation is quite naive, but it’s a great proof of concept. Now, we’ll make it real fast.

We’re now adding the fields of a view to the Humanizer index, and we have new parent() and children() functions that return the parent and children of a record according to any hierarchical field. You can see the raw results on top, and the humanized results at the bottom. I’m starting to really like this Hexastore graph index…
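A rough sketch of what the two result forms look like; the identifiers and the humanize() call are illustrative assumptions.

    // Hypothetical sketch: the same query, raw and then humanized.
    var raw = hexastore.children('regions', 'apac', 'parentRegion');
    // => ['sg', 'my', 'th', ...]                    (raw record identifiers)

    var readable = humanizer.humanize('regions', raw);
    // => ['Singapore', 'Malaysia', 'Thailand', ...] (display labels)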