Return elements of Redshift JSON array on separate rows

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{"pet":"dog"},{"country":"uk"}]
 2  | [{"pet":"cat"}]
 3  | []
 4  | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]

All array elements have just one field.
There is no guarantee that a particular field will feature in any of an array's elements.
A field name can be repeated in an array
The array elements can be in any order

I am wanting to get back a table that looks like this:

 id |   field   |  value
------------------------
 1  | pet       | dog
 1  | country   | uk
 2  | pet       | cat
 4  | country   | germany
 4  | education | masters
 4  | country   | belgium

I can then combine this with my queries on the rest of the input table.

I have tried playing around with the Redshift JSON functions, but without being able to write functions/use loops/have variables in Redshift, I really can't see a way to do this!

Please let me know if I can clarify anything else.

Thanks to this inspired blog post, I've been able to craft a solution. This is:

Create a look-up table to effectively 'iterate' over the elements of each array. The number of rows in this table has be equal to or greater than the maximum number of elements of arrays. Let's say this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):
```
CREATE VIEW seq_0_to_3 AS
    SELECT 0 AS i UNION ALL                                      
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL    
    SELECT 3          
);
```

From this, we can create one row per JSON element:

WITH exploded_array AS (                                                                          
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
  )
SELECT *
FROM exploded_array;

Producing:

 id | json
------------------------------
 1  | {"pet":"dog"}
 1  | {"country":"uk"}
 2  | {"pet":"cat"}
 4  | {"country":"germany"}
 4  | {"education":"masters"}
 4  | {"country":"belgium"}

However, I was needing to extract the field names/values. As I can't see any way to extract JSON field names using Redshift's limited functions, I'll do this using a regular expression:

WITH exploded_array AS (                                                                                       
    SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
)
SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field)
FROM (
    SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
    FROM exploded_array
);

There is generic version for CREATE VIEW seq_0_to_3. Let's call it CREATE VIEW seq_0_to_n. This can be generated by

CREATE VIEW seq_0_to_n AS (  
    SELECT row_number() over (
                          ORDER BY TRUE)::integer - 1 AS i
    FROM <insert_large_enough_table> LIMIT <number_less_than_table_entries>);

This helps in generating large sequences as a view.

Return elements of Redshift JSON array on separate rows

Related

Recent Posts