Sort by Regex

Rax string reduction, /cat(), on the IMPALA backend was not trivial to implement. The IMPALA documentation states: GROUP_CONCAT … does not support the OVER clause, … Effectively this means no ORDER BY on GROUP_CONCAT in IMPALA. This must be implemented eventually, because the whole GROUP_CONCAT has limited use without it. Meantime, regexes can be used …

Welcome Snowflake

Snowflake Computing has recently emerged from stealth with a bold claim of having reinvented the data warehouse. Ease of use is their main motto and I dare to say they live up to this promise. Since one of Snowflake’s co-founders, Marcin Żukowski, is a good friend of mine, I’ve got a chance to play with …

Rax/MySQL vs. Rax/Azure vs. Rax/Redshift

Our SQL-backend family is contantly growing. Rax could already connect to SQLite, MySQL and PostgreSQL databases. Now we have also ported Rax to two major cloud databases: Microsoft Azure and AWS Redshift. The port to Azure gave us some headache due to problems with their ODBC driver for Linux. The port to Redshift was straightforward, …

Understanding Time

With behavioral data, time plays a very important role. Yet, time-related data is especially hard to analyze. The main culprit is the fact that time concepts are confusing. While humans can handle time intuitively, passing these intuitions to a computer program is hard. For example, implementing a growth rate using a relative duration of “twelve …

Kinda Like Assert

In 1983 my math teacher wrote on the blackboard: P{Q}R. Ever since, my code is loaded with assertions. “Assertions should be used to document logically impossible situations and discover programming errors […] This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely“. (Wikipedia) [Assertion] We all have seen, …

Pointer To Pointer

Some hold that pointers in a language like C are dangerous and hard to master. Both maybe true, but so are scalpels. Since you are reading this, you know that pointer practice makes pointer perfect. The same goes for pointers to pointers. I remember vividly when I first saw some elegant code that used a …

Skip List

The skip list is a relative unknown data structure. If you know it already you can stop reading. If not, keep reading because “the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.” (William Pugh) [Skip lists: a probabilistic alternative to balanced trees]. Speaking …