Find difference between elements of arrays with hardcoded array in Bigquery
One of my tables has a column for json, which is an array of key-value objects (about 40). The order of the keys in the array is the same for the records in the table. That is, the i-th elements of different records will have the same key. The value of such objects can be of different types: number, string, null, array of strings, etc.
I have a hardcoded json with the same structure as this array, like [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
I want to compare this array with array-column of all records in the table element by element. And display only the elements of the array that have no matching values.
That is, if some record has an array equal to [{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
, then for this case the difference will be [{"key": "key 1", "value": "not equal value 1"}]
, key2 and key3 are skipped, because their value is equal.
So for such a data sample
id | json | ...
----------
1 |[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
----------
2 |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
----------
3 |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
----------
4 |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
----------
5 |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
I expect the result
id | json
----------
1 |[{"key": "key 1", "value": "not equal value 1"}]
----------
2 |[{"key": "key 2", "value" : "not equal value 2"}]
----------
3 |[{"key": "key 2", "value" : "not equal value 2"}]
----------
4 |[]
----------
5 |[]
I also want to make a query that will group the result above by key and count their number. That is, it will make it clear which key values differ the most and least of all from a hardcoded array.
key | count
--------------
key 2 | 2
key 1 | 1
It is not clear what you actually have array col or json/string col - so I am using whatever data sample you provided - which is a json.
... display only the elements of the array that have no matching values.
with your_table as (
select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]'
), search as (
select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select id,
array(
select t_element
from unnest(json_extract_array(t.json)) t_element, search s
left join unnest(json_extract_array(s.json)) s_element
on t_element = s_element
where s_element is null
) arr
from your_table t
with output
... I also want to make a query that will group the result above by key and count their number.
with your_table as (
select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]'
), search as (
select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select key, count(*) counts
from (
select id,
array(
select json_extract_scalar(t_element, '$.key')
from unnest(json_extract_array(t.json)) t_element, search s
left join unnest(json_extract_array(s.json)) s_element
on t_element = s_element
where s_element is null
) keys
from your_table t
), unnest(keys) key
group by key
with output