Count of distinct strings in comma separated list (string)?
For example let's say I've got a table as such:
PrimaryKey | List |
---|---|
1 | thing, stuff |
2 | thing |
3 | stuff, doodad, thing |
4 | stuff, thing |
Where each value in the 'List' column is a string with words separated by a comma. I'd like to get the count of each word that appears in that column so that I end up with this:
Word | Count |
---|---|
thing | 4 |
stuff | 3 |
doodad | 1 |
I've seen a lot of really similar questions, but can't seem to figure it out. Any help is much appreciated!
Solution 1:
You can use the SPLIT_TO_TABLE function:
Sample data:
CREATE OR REPLACE TABLE ST (PrimaryKey INT, List STRING);
INSERT INTO ST
SELECT * FROM VALUES
(1, 'thing, stuff'),
(2, 'thing'),
(3, 'stuff, doodad, thing'),
(4, 'stuff, thing') AS t(PrimaryKey, List);
Solution:
SELECT TRIM(L.VALUE) AS Word, COUNT(TRIM(L.VALUE)) AS Count
FROM ST,
LATERAL SPLIT_TO_TABLE(ST.List, ',') AS L
GROUP BY TRIM(L.VALUE);
Reference: SPLIT_TO_TABLE
Solution 2:
try this,
create table test_1 as (
SELECT fld1 FROM (
SELECT 'thing, stuff' fld1 UNION ALL
SELECT 'thing' fld1 UNION ALL
SELECT 'stuff, doodad, thing' fld1 UNION ALL
SELECT 'stuff, thing' fld1
)
);
select count(*), fld1 from (
SELECT trim(value) as fld1 FROM test_1, lateral split_to_Table(test_1.fld1,',')
order by value
) group by fld1
Solution 3:
Both the above answers show how to use SPLIT_TO_TABLE() which is great -> you can also use strtok_split_to_table() This function allows multiple delimiters (SPLIT_TO_TABLE allows just one) which is pretty cool (making it easier in the real world).
I changed some of the commas to a hyphens so you can see it still work.
Copy|Paste|Run:
with a as (SELECT * FROM VALUES
(1, 'thing, stuff'),
(2, 'thing'),
(3, 'stuff, doodad - thing'),
(4, 'stuff- thing') AS t(pk, l))
select
value::string answer,count(1)
from
a
,lateral strtok_split_to_table(a.l,(', |-' ))
group by
answer