Sessionization with Temporal Union

Sessionization with Temporal Union
By | 2018-01-17T16:30:01+00:00 May 13th, 2015|Categories: Coding, Rax|

A common problem in web analytics is sessionization or session reconstruction: taking a series of user events and splitting the series into a set of sessions [1]. Often, it is done based solely on the timestamps of the events: a session ends after a certain time of user inactivity. As an example, let’s take the following click stream:

userId page timestamp
0001 “index.html” 2015-05-10 20:13:13
0001 “search.html” 2015-05-10 20:13:20
0001 “search.html” 2015-05-10 20:19:01
0001 “checkout.html” 2015-05-10 20:25:20
0002 “index.html” 2015-05-10 21:03:31
0002 “search.html” 2015-05-10 21:04:20
0001 “index.html” 2015-05-10 24:25:25
0001 “search.html” 2015-05-10 24:25:30

If we assume that a session ends after at least 15 minutes of user inactivity, the sessionized version of this data would look like this:

userId session_begin session_end
0001 2015-05-10 20:13:13 2015-05-10 20:25:20
0002 2015-05-10 21:03:31 2015-05-10 21:04:20
0001 2015-05-10 24:25:25 2015-05-10 24:25:30

So how would you solve it in SQL? The solution to a similar problem, Packing date and time intervals, was posted here. Packing of intervals means that you want to merge all intervals that overlap into one contiguous interval problem. We can transform our original sessionization problem into interval-packing problem by turning every click event into a 15-minute interval. If two such intervals overlap, it means that the original click events were less than 15 minutes apart. So packing such intervals, and then substracting 15 minutes from each row in the result, will give us the sessions. Here is an SQL query that does that:

In Rax, we introduced the temporal union operator, @\/ which solves the interval-packing problem in one line of code. The sessionization problem in Rax:

Which code do you prefer? Why?