Hive: How to flatten an array?
Solution 1:
You can also try using concat_ws on top of collect_list / collect_set to flatten the collected data:
select split(concat_ws(',', collect_list(concat_ws(',', pe.v1))), ',')
from dum
lateral view explode(val) pe as k1, v1
group by val
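Note that the concat_ws / split round trip only works for arrays of strings, and it will split incorrectly if any element itself contains a comma. As a rough sketch of an alternative that avoids the string round trip, a nested array can be exploded twice and re-collected. This is a different technique from the query above, not a rewrite of it; the inline subquery and the names id, nested, inner_arr and v are made up for illustration:

```sql
-- Hypothetical inline data: one row holding an array<array<string>>.
select t.id, collect_list(i.v) as flat
from (
  select 1 as id, array(array('a', 'b'), array('b', 'c')) as nested
) t
lateral view explode(t.nested) o as inner_arr   -- one row per inner array
lateral view explode(o.inner_arr) i as v        -- one row per element
group by t.id;
```

Swapping collect_list for collect_set would drop duplicates. Neither function guarantees the order of the collected elements.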
Solution 2:
There's actually already a Brickhouse UDF for this, array_flatten:
https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/ArrayFlattenUDF.java
hive (default)> select array_flatten(array(array('a', 'b'), array('b', 'c')));
OK
["a","b","b","c"]
Time taken: 0.302 seconds, Fetched: 1 row(s)
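If array_flatten is not already available in your session, it has to be registered first. A minimal sketch, assuming the Brickhouse jar has been built or downloaded: the jar path below is a placeholder, and the class name is inferred from the source file path linked above.

```sql
-- Placeholder path; point this at wherever your Brickhouse jar actually lives.
ADD JAR /path/to/brickhouse.jar;

-- Class name taken from the linked source file (brickhouse/udf/collect/ArrayFlattenUDF.java).
CREATE TEMPORARY FUNCTION array_flatten AS 'brickhouse.udf.collect.ArrayFlattenUDF';

select array_flatten(array(array('a', 'b'), array('b', 'c')));
```

A temporary function only lasts for the current session, so the ADD JAR and CREATE TEMPORARY FUNCTION statements need to be repeated (or placed in an init script) for each new session.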
A bit janky, but you can also de-dupe using another Brickhouse UDF, array_union, and just pass an additional empty array:
hive (default)> select array_union(array_flatten(array(array('a', 'b'), array('b', 'c'))), array());
OK
["a","b","c"]
Time taken: 0.245 seconds, Fetched: 1 row(s)