How can I aggregate over the best rated words?

Using data.table, we can split the 'words' column on commas, unlist it while replicating 'rating' by the number of words in each row (lengths), get the mean of 'rating' by 'words', paste together the 'words' that share the same dense rank, and then order by 'rank'

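library(data.table)
# split 'words' into a list column (`:=` modifies 'dt' by reference)
dt[, words := strsplit(words, ",\\s+")]
# expand to one row per word, average 'rating' per word, then
# collapse the words that share a dense rank and sort by rank
dt[, .(rating = rep(rating, lengths(words)),
   words = unlist(words))][, mean(rating), words][,
    .(words = toString(words)), .(rank = frank(-V1,
        ties.method = "dense"))][order(rank)]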

-output

   rank        words
1:    1 sushi, fries
2:    2         wine
3:    3        steak
4:    4        salad
5:    5        bread
6:    6         rice
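To make the chained call easier to follow, here is a minimal sketch that runs the same steps one at a time. The original 'dt' is not shown in the post, so 'dt_demo' and its values are made up for illustration, and the intermediate names ('word_list', 'long', 'avg', 'result') are hypothetical as well.

```r
library(data.table)

# Hypothetical input: the values are invented, not the original 'dt'
dt_demo <- data.table(
  words  = c("sushi, wine", "fries", "steak, wine"),
  rating = c(5, 5, 2)
)

# Step 1: one row per word, repeating each rating once per word in its row
word_list <- strsplit(dt_demo$words, ",\\s+")
long <- data.table(
  rating = rep(dt_demo$rating, lengths(word_list)),
  words  = unlist(word_list)
)

# Step 2: mean rating per word
avg <- long[, .(avg_rating = mean(rating)), by = words]

# Step 3: dense rank on the negated mean, so the highest mean gets rank 1
avg[, rank := frank(-avg_rating, ties.method = "dense")]

# Step 4: collapse the words that share a rank, then sort by rank
result <- avg[, .(words = toString(words)), by = rank][order(rank)]
print(result)
```

With this made-up input, 'sushi' and 'fries' both average 5 and so share rank 1, which mirrors the tie in the output above. Unlike the chained version, this sketch leaves the input table unchanged because it does not assign with `:=` on 'dt_demo'.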

The tidyverse equivalent of the above code would be

library(dplyr)
library(tidyr)
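# assumes 'words' is still the original comma-separated character column
dt %>% 
 # one row per word
 separate_rows(words) %>% 
 group_by(words) %>% 
 summarise(rating = mean(rating, na.rm = TRUE)) %>% 
 # 'rating' now holds the dense rank of the mean (1 = highest)
 group_by(rating = dense_rank(-rating)) %>% 
 summarise(words = toString(words))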

-output

# A tibble: 6 × 2
  rating words       
   <int> <chr>       
1      1 fries, sushi
2      2 wine        
3      3 steak       
4      4 salad       
5      5 bread       
6      6 rice
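The two outputs differ only in the order of the tied words ("sushi, fries" vs "fries, sushi"). data.table keeps groups in order of first appearance, while summarise() returns groups sorted by the grouping key, so the words arrive alphabetically. If the alphabetical order is wanted in data.table as well, one option is to sort the words before collapsing them. A minimal sketch, using made-up per-word means in a hypothetical table 'avg':

```r
library(data.table)

# Hypothetical per-word means (invented values, not the original data)
avg <- data.table(
  words      = c("sushi", "wine", "fries", "steak"),
  avg_rating = c(5, 3.5, 5, 2)
)

# sort() orders the tied words alphabetically before toString() joins them
ranked <- avg[, .(words = toString(sort(words))),
              by = .(rank = frank(-avg_rating, ties.method = "dense"))][order(rank)]
print(ranked)
```

Here the rank 1 entry comes out as "fries, sushi" instead of "sushi, fries".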
