How can I aggregate over the best-rated words?
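Using data.table, we can split the 'words' strings with strsplit, unlist the resulting list column while replicating (rep) each 'rating' according to the lengths of the split vectors, get the mean of 'rating' by 'words', compute a dense rank on the descending mean with frank, paste the 'words' that share a rank into one string with toString, and then order by 'rank'.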
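library(data.table)
# dt must be a data.table (convert with setDT(dt) if it is a data.frame)
# split each comma-separated string into a list column (modifies dt by reference)
dt[, words := strsplit(words, ",\\s+")]
dt[, .(rating = rep(rating, lengths(words)), words = unlist(words))][  # one row per word
  , mean(rating), by = words][                                         # mean rating per word -> V1
  , .(words = toString(words)),
  by = .(rank = frank(-V1, ties.method = "dense"))][                   # highest mean gets rank 1
  order(rank)]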
-output
   rank        words
1:    1 sushi, fries
2:    2         wine
3:    3        steak
4:    4        salad
5:    5        bread
6:    6         rice
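The question's input data is not shown, so here is a minimal base-R sketch of the same steps (split, unlist with rep by lengths, mean per word, dense rank, collapse) on a small hypothetical data frame, reviews, whose words and ratings are made up for illustration. It is not the asker's data and does not reproduce the output above; it only uses base functions (strsplit, rep, lengths, aggregate, match, toString) so that each intermediate object can be inspected.

```r
# Hypothetical toy input (NOT the question's data): one row per review,
# with a comma-separated 'words' string and a numeric 'rating'.
reviews <- data.frame(
  words  = c("sushi, fries", "wine, steak", "sushi, salad", "fries"),
  rating = c(5, 4, 3, 5),
  stringsAsFactors = FALSE
)

# 1. split each 'words' string into a character vector (gives a list)
word_list <- strsplit(reviews$words, ",\\s+")

# 2. unlist the words and repeat each rating once per word in its row
long <- data.frame(
  words  = unlist(word_list),
  rating = rep(reviews$rating, lengths(word_list)),
  stringsAsFactors = FALSE
)

# 3. mean rating per word (aggregate sorts the words alphabetically)
avg <- aggregate(rating ~ words, data = long, FUN = mean)

# 4. dense rank on the negated mean, so the highest mean gets rank 1
avg$rank <- match(-avg$rating, sort(unique(-avg$rating)))

# 5. collapse words sharing a rank into one string, then order by rank
res <- aggregate(words ~ rank, data = avg, FUN = toString)
res <- res[order(res$rank), ]
print(res)
```

On this toy input, fries (mean 5) gets rank 1, steak, sushi and wine (mean 4) share rank 2, and salad (mean 3) gets rank 3.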
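The tidyverse equivalent of the above code (run on the original, unmodified dt, because the data.table version turns 'words' into a list column by reference) would be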
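library(dplyr)
library(tidyr)
dt %>%
  separate_rows(words, sep = ",\\s+") %>%              # one row per word, same split as above
  group_by(words) %>%
  summarise(rating = mean(rating, na.rm = TRUE)) %>%   # mean rating per word
  group_by(rating = dense_rank(-rating)) %>%           # 'rating' now holds the dense rank
  summarise(words = toString(words))                   # collapse words sharing a rank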
-output
# A tibble: 6 × 2
  rating words
   <int> <chr>
1      1 fries, sushi
2      2 wine
3      3 steak
4      4 salad
5      5 bread
6      6 rice
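Both versions use dense ranking (ties.method = "dense" in frank, dense_rank in dplyr): tied means share a rank and the next distinct mean gets the next integer, which is why sushi and fries share rank 1 and wine is 2 rather than 3. A small base-R comparison on hypothetical per-word means (avg_rating is made up for illustration) shows the difference from "min" ranking, using match against the sorted unique values as a base-R stand-in for a dense rank.

```r
# Hypothetical per-word mean ratings with a tie at the top
avg_rating <- c(sushi = 4.5, fries = 4.5, wine = 4, steak = 3.5)

# "min" ranking: tied values share the lowest rank, leaving a gap after the tie
min_rank <- rank(-avg_rating, ties.method = "min")

# dense ranking: tied values share a rank and the next rank follows directly
dense_rank_base <- match(-avg_rating, sort(unique(-avg_rating)))

print(rbind(min = min_rank, dense = dense_rank_base))
```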