Rax string reduction, /cat()
, on the IMPALA backend was not trivial to implement. The IMPALA documentation states:
GROUP_CONCAT ... does not support the OVER clause, ...
Effectively this means no ORDER BY
on GROUP_CONCAT
in IMPALA. This must be implemented eventually, because the whole GROUP_CONCAT
has limited use without it. Meantime, regexes can be used to put things back in order like so:
regex_replace('annadaveeve','(anna)(dave)(eve)','\3\1\2')
Which would place eve
first, anna
second, and dave
third. The rank number rnum
of each name can be expressed like this:
ROW_NUMBER() OVER(ORDER BY age DESC) AS rnum
The GROUP_CONCAT
from rnum
belonging to (anna)(dave)(eve)
would be 231
, which can be translated to \3\1\2
using regexp_replace
and translate
like so:
regexp_replace(translate('123', '231', '123'), '.', '\\\0')
Giving us the third argument of the sorting regex. The first and second argument are simple. Likewise an SQL SELECT
statement like this:
SELECT parent, GROUP_CONCAT(child, '; ') OVER(ORDER BY age DESC) FROM table GROUP BY parent
could be rewritten in IMPALA like (untested) so:
SELECT parent, REGEXP_REPLACE( GROUP_CONCAT(child, ''), GROUP_CONCAT(CONCAT('(',child,')'), ''), STRLEFT( REGEXP_REPLACE( TRANSLATE( '123456789', GROUP_CONCAT(CAST(rnum AS STRING), ''), '123456789'), '.', '\\\0; '), MAX(rnum) * (2+2) - 2)) FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY parent ORDER BY age DESC) AS rnum, FROM table) please GROUP BY parent
Your milage may vary using this code, but if you, like Rax, need a solution right now, try a variant of this code. (You might need to double up the \\
.) Also note that it might be safer to use an escaped version of child
to use in the second argument of the outer regexp_replace
like so:
regexp_replace(child, '[[:punct:]]', '\\\0') AS esc_child
Finally, note, this only works with group sizes below ten. To up that, replace 123456789
by abcdefghijklmnop
(or longer), replace CAST(rnum AS STRING)
with something like substr('abcdefghijklmnop', rnum, 1)
and add an extra regexp_replace in front of strleft
to transform \a\b\c...\o\p
into \1\2\3..\15\16
.