Hive: How to flatten an array?

Solution 1:

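You can use concat_ws on top of collect_list (or collect_set) to flatten the collected data: join each inner array into a comma-separated string, join those strings again into one string, then split the result back into a single array. Note that this only works for string elements that do not themselves contain a comma.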

select split(concat_ws(',',collect_list(concat_ws(',',pe.v1))),',') from dum lateral view explode(val) pe as k1,v1 group by val
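
To try the same trick without the dum table, here is a minimal sketch that runs it against a literal nested array. The names nested, t, o and inner_arr are made up for illustration, and it assumes a Hive version that allows a SELECT without a FROM clause:

```sql
-- Hypothetical stand-alone example: flatten array(array('a','b'), array('b','c')).
-- 1. explode() emits one row per inner array.
-- 2. The inner concat_ws() joins each inner array into 'a,b' / 'b,c'.
-- 3. collect_list() gathers those strings; the outer concat_ws() joins them.
-- 4. split() turns the joined string back into one flat array<string>.
select split(
         concat_ws(',', collect_list(concat_ws(',', inner_arr))),
         ','
       ) as flat
from (select array(array('a', 'b'), array('b', 'c')) as nested) t
lateral view explode(t.nested) o as inner_arr;
```

This should return the four elements a, b, b, c in one array, although collect_list does not guarantee their order. Since everything passes through a comma-joined string, any element containing a comma would be split in two.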

Solution 2:

There's actually already a brickhouse UDF for this, array_flatten: https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/ArrayFlattenUDF.java

hive (default)> select array_flatten(array(array('a', 'b'), array('b', 'c')));
OK
["a","b","b","c"]
Time taken: 0.302 seconds, Fetched: 1 row(s)
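
The session above assumes array_flatten is already registered in the Hive session. If it is not, something along these lines should work. The jar path is a placeholder, and the class name is inferred from the source file path in the link above, so treat both as assumptions to check against your Brickhouse build:

```sql
-- Placeholder path: point this at wherever your Brickhouse jar actually lives.
add jar /path/to/brickhouse.jar;

-- Class name inferred from brickhouse/udf/collect/ArrayFlattenUDF.java.
create temporary function array_flatten as 'brickhouse.udf.collect.ArrayFlattenUDF';
```

A temporary function only lasts for the current session, so these statements need to be run again (or placed in an init script) for each new session.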

A bit janky, but you can also de-dupe the flattened result using another brickhouse UDF, array_union, by passing an empty array as the second argument:

hive (default)> select array_union(array_flatten(array(array('a', 'b'), array('b', 'c'))), array());
OK
["a","b","c"]
Time taken: 0.245 seconds, Fetched: 1 row(s)