Using json_array_elements() to break out rows with elements of JSON array
I have a table of the following format with just one column. There are around 700 entries in total, here are 5 samples:
{"year":"2021","category":"chemistry","laureates":[{"id": "1002", "firstname": "Benjamin", "surname": "List", "motivation": "\"for the development of asymmetric organocatalysis\"", "share": "2"}, {"id": "1003", "firstname": "David", "surname": "MacMillan", "motivation": "\"for the development of asymmetric organocatalysis\"", "share": "2"}],"reason":null}
{"year":"2021","category":"economics","laureates":[{"id": "1007", "firstname": "David", "surname": "Card", "motivation": "\"for his empirical contributions to labour economics\"", "share": "2"}, {"id": "1008", "firstname": "Joshua", "surname": "Angrist", "motivation": "\"for their methodological contributions to the analysis of causal relationships\"", "share": "4"}, {"id": "1009", "firstname": "Guido", "surname": "Imbens", "motivation": "\"for their methodological contributions to the analysis of causal relationships\"", "share": "4"}],"reason":null}
{"year":"2021","category":"literature","laureates":[{"id": "1004", "firstname": "Abdulrazak", "surname": "Gurnah", "motivation": "\"for his uncompromising and compassionate penetration of the effects of colonialism and the fate of the refugee in the gulf between cultures and continents\"", "share": "1"}],"reason":null}
{"year":"2021","category":"peace","laureates":[{"id": "1005", "firstname": "Maria", "surname": "Ressa", "motivation": "\"for their efforts to safeguard freedom of expression, which is a precondition for democracy and lasting peace\"", "share": "2"}, {"id": "1006", "firstname": "Dmitry", "surname": "Muratov", "motivation": "\"for their efforts to safeguard freedom of expression, which is a precondition for democracy and lasting peace\"", "share": "2"}],"reason":null}
{"year":"2021","category":"physics","laureates":[{"id": "999", "firstname": "Syukuro", "surname": "Manabe", "motivation": "\"for the physical modelling of Earth’s climate, quantifying variability and reliably predicting global warming\"", "share": "4"}, {"id": "1000", "firstname": "Klaus", "surname": "Hasselmann", "motivation": "\"for the physical modelling of Earth’s climate, quantifying variability and reliably predicting global warming\"", "share": "4"}, {"id": "1001", "firstname": "Giorgio", "surname": "Parisi", "motivation": "\"for the discovery of the interplay of disorder and fluctuations in physical systems from atomic to planetary scales\"", "share": "2"}],"reason":"for groundbreaking contributions to our understanding of complex physical systems"}
I want to print the output in this format:
year | category | r |
---|---|---|
"2021" | "chemistry" | {"id": "1002", "firstname": "Benjamin", "surname": "List", "motivation": ""for the development of asymmetric organocatalysis"", "share": "2"} |
"2021" | "chemistry" | {"id": "1003", "firstname": "David", "surname": "MacMillan", "motivation": ""for the development of asymmetric organocatalysis"", "share": "2"} |
"2021" | "economics" | {"id": "1007", "firstname": "David", "surname": "Card", "motivation": ""for his empirical contributions to labour economics"", "share": "2"} |
"2021" | "economics" | {"id": "1008", "firstname": "Joshua", "surname": "Angrist", "motivation": ""for their methodological contributions to the analysis of causal relationships"", "share": "4"} |
"2021" | "economics" | {"id": "1009", "firstname": "Guido", "surname": "Imbens", "motivation": ""for their methodological contributions to the analysis of causal relationships"", "share": "4"} |
I was able to get the output above with the following query:
SELECT tuple->'year' AS year, tuple->'category' AS category,json_array_elements(tuple->'laureates') FROM prizes LIMIT 5;
What I want is to output all values in the table with a query like:
SELECT tuple->'year' AS year, tuple->'category' AS category,json_array_elements(tuple->'laureates') FROM prizes;
When I run this query I get this error:
cannot call json_array_elements on a scalar
I don't understand why this is happening and how I would go about outputting all possible elements.
Since json_array_elements()
can only be used with JSON arrays, split cases and UNION ALL
:
SELECT tuple->'year' AS year, tuple->'category' AS category, json_array_elements(tuple->'laureates') AS r
FROM prizes
WHERE json_typeof(tuple->'laureates') = 'array'
UNION ALL
SELECT tuple->'year' AS year, tuple->'category' AS category, tuple->'laureates'
FROM prizes
WHERE json_typeof(tuple->'laureates') IS DISTINCT FROM 'array';
The manual:
json_typeof
(json
) →text
[...]Returns the type of the top-level JSON value as a text string. Possible types are
object
,array
,string
,number
,boolean
, andnull
. (Thenull
result should not be confused with an SQL NULL; [...])
Note the use of IS DISTINCT FROM
in the second leg. The operator <>
would eliminate rows with (tuple->'laureates') IS NULL
.
jsonb_path_query()
in Postgres 12 or later
With jsonb_path_query()
we don't need to split cases
(like Edouard suggested). But to include rows without laureate
, we need to LEFT JOIN
to it:
SELECT p.id, p.tuple->>'year' AS year, p.tuple->>'category' AS category, l.r
FROM prizes p
LEFT JOIN LATERAL jsonb_path_query((tuple->'laureates')::jsonb, '$[*]', '{"foo":"bar"}') AS l(r) ON true;
db<>fiddle here
See:
- What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
If you are using PostGreSQL version 12 or above, then you can use jsonb_path_query
instead of json_array_elements
like this :
SELECT tuple->>'year'
, tuple->>'category'
, jsonb_path_query((tuple->'laureates') :: jsonb, '$[*]')
FROM prizes
jsonb_path_query
is going to break down tuple->'laureates'
when it is a json array, but is won't fail when tuple->'laureates'
is not a json array.
see the test result in dbfiddle.
see the SQL/JSON Path Language in the manual.