Parallel unnest() and sort order in PostgreSQL
I understand that using
SELECT unnest(ARRAY[5,3,9]) as id
without an ORDER BY
clause, the order of the result set is not guaranteed. I could for example get:
id
--
3
5
9
But what about the following request:
SELECT
unnest(ARRAY[5,3,9]) as id,
unnest(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Is it guaranteed that the 2 unnest()
calls (which have the same length) will unroll in parallel and that the index idx
will indeed match the position of the item in the array?
I am using PostgreSQL 9.3.3.
Yes, that is a feature of Postgres and parallel unnesting is guaranteed to be in sync (as long as all arrays have the same number of elements).
Postgres 9.4 adds a clean solution for parallel unnest:
- Unnest multiple arrays in parallel
The order of resulting rows is not guaranteed, though. Actually, with a statement as simple as:
SELECT unnest(ARRAY[5,3,9]) AS id
the resulting order of rows is "guaranteed", but Postgres does not assert anything. The query optimizer is free to order rows as it sees fit as long as the order is not explicitly defined. This may have side effects in more complex queries.
If the second query in your question is what you actually want (add an index number to unnested array elements), there is a better way with generate_subscripts():
SELECT unnest(ARRAY[5,3,9]) AS id
, generate_subscripts(ARRAY[5,3,9], 1) AS idx
ORDER BY idx;
Details in this related answer:
- How to access array internal index with postgreSQL?
You will be interested in WITH ORDINALITY
in Postgres 9.4:
- PostgreSQL unnest() with element number
Then you can use:
SELECT * FROM unnest(ARRAY[5,3,9]) WITH ORDINALITY tbl(id, idx);
Short answer: No, idx
will not match the array positions, when accepting the premise that unnest()
output may be randomly ordered.
Demo:
since the current implementation of unnest
actually output the rows in the order of elements, I suggest to add a layer on top of it to simulate a random order:
CREATE FUNCTION unnest_random(anyarray) RETURNS setof anyelement
language sql as
$$ select unnest($1) order by random() $$;
Then check out a few executions of your query with unnest
replaced by unnest_random
:
SELECT
unnest_random(ARRAY[5,3,9]) as id,
unnest_random(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Example of output:
id | idx ----+----- 3 | 1 9 | 2 5 | 3
id=3
is associated with idx=1
but 3
was in 2nd position in the array. It's all wrong.
What's wrong in the query: it assumes that the first unnest
will shuffle the elements using the same permutation as the second unnest
(permutation in the mathematic sense: the relationship between order in the array and order of the rows). But this assumption contradicts the premise that the order output of unnest
is unpredictable to start with.
About this question:
Is it guaranteed that the 2 unnest() calls (which have the same length) will unroll in parallel
In select unnest(...) X1, unnest(...) X2
, with X1
and X2
being of type SETOF something
and having the same number of rows, X1
and X2
will be paired in the final output so that the X1
value at row N
will face the X2
value at the same row N
.
(it's a kind of UNION for columns, as opposed to a cartesian product).
But I wouldn't describe this pairing as unroll in parallel, so I'm not sure this is what you meant.
Anyway this pairing doesn't help with the problem since it happens after the unnest calls have lost the array positions.
An alternative: In this thread from the pgsql-sql mailing list, this function is suggested:
CREATE OR REPLACE FUNCTION unnest_with_ordinality(anyarray, OUT value
anyelement, OUT ordinality integer)
RETURNS SETOF record AS
$$
SELECT $1[i], i FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$$
LANGUAGE sql IMMUTABLE;
Based on this, we can order by the second output column:
select * from unnest_with_ordinality(array[5,3,9]) order by 2;
value | ordinality -------+------------ 5 | 1 3 | 2 9 | 3
With postgres 9.4 and above: The WITH ORDINALITY
clause that can follow SET RETURNING function calls will provide this functionality in a generic way.