Is there something like a zip() function in PostgreSQL that combines two arrays?
I have two array values of the same length in PostgreSQL:
{a,b,c}
and {d,e,f}
and I'd like to combine them into
{{a,d},{b,e},{c,f}}
Is there a way to do that?
Postgres 9.5 or later
has array_agg(array expression)
:
array_agg
(anyarray
) →anyarray
Concatenates all the input arrays into an array of one higher dimension. (The inputs must all have the same dimensionality, and cannot be empty or null.)
This is a drop-in replacement for my custom aggregate function array_agg_mult()
demonstrated below. It's implemented in C and considerably faster. Use it.
Postgres 9.4
Use the ROWS FROM
construct or the updated unnest()
which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):
[...] the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.
Use this cleaner and simpler variant:
SELECT ARRAY[a,b] AS ab
FROM unnest('{a,b,c}'::text[]
, '{d,e,f}'::text[]) x(a,b);
Postgres 9.3 or older
Simple zip()
Consider the following demo for Postgres 9.3 or earlier:
SELECT ARRAY[a,b] AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x;
Result:
ab
-------
{a,d}
{b,e}
{c,f}
Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.
You can wrap this into a function, if you want to:
CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
Same result.
zip() to multi-dimensional array:
Now, if you want to aggregate that new set of arrays into one 2-dimenstional array, it gets more complicated.
SELECT ARRAY (SELECT ...)
or:
SELECT array_agg(ARRAY[a,b]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
,unnest('{d,e,f}'::text[]) AS b
) x
or:
SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM ...
will all result in the same error message (tested with pg 9.1.5):
ERROR: could not find array type for data type text[]
But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
And use it like this:
SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x
Result:
{{a,d},{b,e},{c,f}}
Note the additional ARRAY[]
layer! Without it and just:
SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...
You get:
{a,d,b,e,c,f}
Which may be useful for other purposes.
Roll another function:
CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type
Result:
{{a,d},{b,e},{c,f}}
Here's another approach that's safe for arrays of differing lengths, using the array multi-aggregation mentioned by Erwin:
CREATE OR REPLACE FUNCTION zip(array1 anyarray, array2 anyarray) RETURNS text[]
AS $$
SELECT array_agg_mult(ARRAY[ARRAY[array1[i],array2[i]]])
FROM generate_subscripts(
CASE WHEN array_length(array1,1) >= array_length(array2,1) THEN array1 ELSE array2 END,
1
) AS subscripts(i)
$$ LANGUAGE sql;
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
zip
---------------------
{{a,d},{b,e},{c,f}}
(1 row)
regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f,g}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{NULL,g}}
(1 row)
regress=> SELECT zip('{a,b,c,z}'::text[],'{d,e,f}'::text[]);
zip
------------------------------
{{a,d},{b,e},{c,f},{z,NULL}}
(1 row)
If you want to chop off the excess rather than null-padding, just change the >=
length test to <=
instead.
This function does not handle the rather bizarre PostgreSQL feature that arrays may have a stating element other than 1, but in practice nobody actually uses that feature. Eg with a zero-indexed 3-element array:
regress=> SELECT zip('{a,b,c}'::text[], array_fill('z'::text, ARRAY[3], ARRAY[0]));
zip
------------------------
{{a,z},{b,z},{c,NULL}}
(1 row)
wheras Erwin's code does work with such arrays, and even with multi-dimensional arrays (by flattening them) but does not work with arrays of differing length.
Arrays are a bit special in PostgreSQL, they're a little too flexible with multi-dimensional arrays, configurable origin index, etc.
In 9.4 you'll be able to write:
SELECT array_agg_mult(ARRAY[ARRAY[a,b])
FROM unnest(array1) WITH ORDINALITY as (o,a)
NATURAL FULL OUTER JOIN
unnest(array2) WITH ORDINALITY as (o,b);
which will be a lot nicer, especially if an optimisation to scan the functions together rather than doing a sort and join goes in.