Is there something like a zip() function in PostgreSQL that combines two arrays?

I have two array values of the same length in PostgreSQL:

{a,b,c} and {d,e,f}

and I'd like to combine them into

{{a,d},{b,e},{c,f}}

Is there a way to do that?

Postgres 9.5 or later

has array_agg(array expression):

array_agg ( anyarray ) → anyarray

Concatenates all the input arrays into an array of one higher dimension. (The inputs must all have the same dimensionality, and cannot be empty or null.)

This is a drop-in replacement for my custom aggregate function array_agg_mult() demonstrated below. It's implemented in C and considerably faster. Use it.

Postgres 9.4

Use the ROWS FROM construct or the updated unnest() which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):

[...] the number of result rows in this case is that of the largest function result, with smaller results padded with null values to match.

Use this cleaner and simpler variant:

SELECT ARRAY[a,b] AS ab
FROM   unnest('{a,b,c}'::text[] 
            , '{d,e,f}'::text[]) x(a,b);

Postgres 9.3 or older

Simple zip()

Consider the following demo for Postgres 9.3 or earlier:

SELECT ARRAY[a,b] AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
        , unnest('{d,e,f}'::text[]) AS b
    ) x;

Result:

  ab
-------
 {a,d}
 {b,e}
 {c,f}

Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.

You can wrap this into a function, if you want to:

CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
  RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;

Call:

SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);

Same result.

zip() to multi-dimensional array:

Now, if you want to aggregate that new set of arrays into one 2-dimenstional array, it gets more complicated.

SELECT ARRAY (SELECT ...)

or:

SELECT array_agg(ARRAY[a,b]) AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
         ,unnest('{d,e,f}'::text[]) AS b
    ) x

or:

SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM  ...

will all result in the same error message (tested with pg 9.1.5):

ERROR: could not find array type for data type text[]

But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:

CREATE AGGREGATE array_agg_mult (anyarray) (
   SFUNC    = array_cat
 , STYPE    = anyarray
 , INITCOND = '{}'
);

And use it like this:

SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM  (
   SELECT unnest('{a,b,c}'::text[]) AS a
        , unnest('{d,e,f}'::text[]) AS b
    ) x

Result:

{{a,d},{b,e},{c,f}}

Note the additional ARRAY[] layer! Without it and just:

SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...

You get:

{a,d,b,e,c,f}

Which may be useful for other purposes.

Roll another function:

CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
  RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;

Call:

SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type

Result:

{{a,d},{b,e},{c,f}}

Here's another approach that's safe for arrays of differing lengths, using the array multi-aggregation mentioned by Erwin:

CREATE OR REPLACE FUNCTION zip(array1 anyarray, array2 anyarray) RETURNS text[]
AS $$
SELECT array_agg_mult(ARRAY[ARRAY[array1[i],array2[i]]])
FROM generate_subscripts(
  CASE WHEN array_length(array1,1) >= array_length(array2,1) THEN array1 ELSE array2 END,
  1
) AS subscripts(i)
$$ LANGUAGE sql;

regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
         zip         
---------------------
 {{a,d},{b,e},{c,f}}
(1 row)


regress=> SELECT zip('{a,b,c}'::text[],'{d,e,f,g}'::text[]);
             zip              
------------------------------
 {{a,d},{b,e},{c,f},{NULL,g}}
(1 row)

regress=> SELECT zip('{a,b,c,z}'::text[],'{d,e,f}'::text[]);
             zip              
------------------------------
 {{a,d},{b,e},{c,f},{z,NULL}}
(1 row)

If you want to chop off the excess rather than null-padding, just change the >= length test to <= instead.

This function does not handle the rather bizarre PostgreSQL feature that arrays may have a stating element other than 1, but in practice nobody actually uses that feature. Eg with a zero-indexed 3-element array:

regress=> SELECT zip('{a,b,c}'::text[], array_fill('z'::text, ARRAY[3], ARRAY[0]));
          zip           
------------------------
 {{a,z},{b,z},{c,NULL}}
(1 row)

wheras Erwin's code does work with such arrays, and even with multi-dimensional arrays (by flattening them) but does not work with arrays of differing length.

Arrays are a bit special in PostgreSQL, they're a little too flexible with multi-dimensional arrays, configurable origin index, etc.

In 9.4 you'll be able to write:

SELECT array_agg_mult(ARRAY[ARRAY[a,b])
FROM unnest(array1) WITH ORDINALITY as (o,a)
NATURAL FULL OUTER JOIN
unnest(array2) WITH ORDINALITY as (o,b);

which will be a lot nicer, especially if an optimisation to scan the functions together rather than doing a sort and join goes in.

Is there something like a zip() function in PostgreSQL that combines two arrays?

Postgres 9.5 or later

Postgres 9.4

Postgres 9.3 or older

Simple zip()

zip() to multi-dimensional array:

Related

Recent Posts