When dealing large data sets that do not fit in memory, it is crucial to limit the number of accesses and iterations over the data set. However, in a high level language it may not be immediately obvious how many iterations a particular program will require. The number of iterations becomes even less obvious in the presence of heuristic and statistics-based optimisations, as used by traditional databases: a small tweak to the query or even modifying the number of rows in a table can cause drastic changes to the query plan.

As data sets continue to grow, a high-level language with predictable runtime characteristics becomes more necessary. Programmers should not need to understand the internal workings of a database or query optimizer in order to write fast queries.

At Ambiata, we have designed and implemented Icicle, a query language specifically for single-pass queries. This means that any query in our language is assured to compile into a single iteration over the data set. We use a type system based on temporal logic to ensure that queries can be executed in a single pass without violating causality. Queries are then compiled to a stream-based intermediate language, which allows multiple queries to be merged together, removing duplicate computations. Finally, queries are compiled to high-performance C code.

In this talk, we will introduce Icicle and some example queries. We will show some example queries that would violate causality, and show how they are outlawed. We will then describe the intermediate language and the fusion system, showing how different queries over the same input data can be merged into a single pass. Finally, we show how using a carefully chosen, restricted intermediate language makes this fusion possible.


Target Audience




schedule Submitted 1 year ago