Sessionization with Temporal Union

A common problem in web analytics is sessionization or session reconstruction: taking a series of user events and splitting the series into a set of sessions [1]. Often, it is done based solely on the timestamps of the events: a session ends after a certain time of user inactivity. As an example, let’s take the following click stream:

userId page timestamp
0001 “index.html” 2015-05-10 20:13:13
0001 “search.html” 2015-05-10 20:13:20
0001 “search.html” 2015-05-10 20:19:01
0001 “checkout.html” 2015-05-10 20:25:20
0002 “index.html” 2015-05-10 21:03:31
0002 “search.html” 2015-05-10 21:04:20
0001 “index.html” 2015-05-10 24:25:25
0001 “search.html” 2015-05-10 24:25:30

If we assume that a session ends after at least 15 minutes of user inactivity, the sessionized version of this data would look like this:

userId session_begin session_end
0001 2015-05-10 20:13:13 2015-05-10 20:25:20
0002 2015-05-10 21:03:31 2015-05-10 21:04:20
0001 2015-05-10 24:25:25 2015-05-10 24:25:30

So how would you solve it in SQL? The solution to a similar problem, Packing date and time intervals, was posted here. Packing of intervals means that you want to merge all intervals that overlap into one contiguous interval problem. We can transform our original sessionization problem into interval-packing problem by turning every click event into a 15-minute interval. If two such intervals overlap, it means that the original click events were less than 15 minutes apart. So packing such intervals, and then substracting 15 minutes from each row in the result, will give us the sessions. Here is an SQL query that does that:

WITH ExtendedClicks AS
(
  SELECT userId, timestamp as starttime, DATEADD(MINUTE, 15, timestamp) as endtime
  FROM Clicks
),
C1 AS
(
  SELECT userId, starttime AS ts, +1 AS type, NULL AS e,
    ROW_NUMBER() OVER(PARTITION BY userId ORDER BY starttime) AS s
  FROM ExtendedClicks
  UNION ALL
  SELECT userId, endtime AS ts, -1 AS type,
    ROW_NUMBER() OVER(PARTITION BY userId ORDER BY endtime) AS e,
    NULL AS s
  FROM ExtendedClicks
),
C2 AS
(
  SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY userId ORDER BY ts, type DESC) AS se
  FROM C1
),
C3 AS
(
  SELECT userId, ts,
    FLOOR((ROW_NUMBER() OVER(PARTITION BY userId ORDER BY ts) - 1) / 2 + 1) AS grpnum
  FROM C2
  WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
),
ExtendedSessions AS
(
  SELECT userId, MIN(ts) AS starttime, MAX(ts) AS endtime
  FROM C3
  GROUP BY userId, grpnum;
)
SELECT userID, starttime, DATEADD(MINUTE, -15, endtime) as endtime
FROM ExtendedSessions

In Rax, we introduced the temporal union operator, @\/ which solves the interval-packing problem in one line of code. The sessionization problem in Rax:

{[#:userId, $:page, |:interval]}: ExtendedClicks :=
  project [.userId, .page, (|)[.timestamp, .timestamp + (^)"PT15M"]]
  Clicks;
{[#:userId, |:interval]}: ExtendedSessions :=
  project [.userId, .#1]
  @\/:[.userId] ExtendedClicks;
{[#:userId, |:interval]}: Sessions :=
  project [.userId, (|)[.interval.begin, .interval.end - (^)"PT15M"]]
  ExtendedSessions;

Which code do you prefer? Why?