Count of distinct strings in comma separated list (string)?

For example let's say I've got a table as such:

PrimaryKey List
1 thing, stuff
2 thing
3 stuff, doodad, thing
4 stuff, thing

Where each value in the 'List' column is a string with words separated by a comma. I'd like to get the count of each word that appears in that column so that I end up with this:

Word Count
thing 4
stuff 3
doodad 1

I've seen a lot of really similar questions, but can't seem to figure it out. Any help is much appreciated!


Solution 1:

You can use the SPLIT_TO_TABLE function:

Sample data:

CREATE OR REPLACE TABLE ST (PrimaryKey INT, List STRING);
  
INSERT INTO ST  
SELECT * FROM VALUES
(1, 'thing, stuff'),
(2, 'thing'),
(3, 'stuff, doodad, thing'),
(4, 'stuff, thing') AS t(PrimaryKey, List);

Solution:

SELECT TRIM(L.VALUE) AS Word, COUNT(TRIM(L.VALUE)) AS Count
  FROM ST,
  LATERAL SPLIT_TO_TABLE(ST.List, ',') AS L
 GROUP BY TRIM(L.VALUE);

Reference: SPLIT_TO_TABLE

Solution 2:

try this,

create table test_1 as (       
 SELECT fld1 FROM (
 SELECT 'thing, stuff' fld1 UNION ALL
 SELECT 'thing' fld1 UNION ALL
 SELECT 'stuff, doodad, thing' fld1 UNION ALL
 SELECT 'stuff, thing' fld1 
   )
   );
   
select count(*), fld1 from (    
 SELECT  trim(value) as fld1 FROM test_1,   lateral split_to_Table(test_1.fld1,',') 
  order by value
) group by fld1

Solution 3:

Both the above answers show how to use SPLIT_TO_TABLE() which is great -> you can also use strtok_split_to_table() This function allows multiple delimiters (SPLIT_TO_TABLE allows just one) which is pretty cool (making it easier in the real world).

I changed some of the commas to a hyphens so you can see it still work.

enter image description here

Copy|Paste|Run:

with a as (SELECT * FROM VALUES
(1, 'thing, stuff'),
(2, 'thing'),
(3, 'stuff, doodad - thing'), 
(4, 'stuff- thing') AS t(pk, l))

select 
    value::string answer,count(1) 
from 
     a 
    ,lateral strtok_split_to_table(a.l,(', |-' ))  
group by 
     answer