tidyverse pivot_longer several sets of columns, but avoid intermediate mutate_wider steps [duplicate]
I have the following data
dat <- data.frame(id = c("A", "B", "C"),
Q1r1_pepsi = c(1,0,1),
Q1r1_cola = c(0,0,1),
Q1r2_pepsi = c(1,1,1),
Q1r2_cola = c(0,1,1),
stringsAsFactors = FALSE)
where Q1r1 and Q1r2 are rating questions in a survey and pepsi and cola are the brands being rated. So I have two ratings (r1 and r2) for two brands (pepsi, cola):
id Q1r1_c1 Q1r1_c2 Q1r2_c1 Q1r2_c2
"A" 1 0 1 0
"B" 0 0 1 1
"C" 1 1 1 1
(Side question: how do I format a SO post so that it correctly contains the nicely formatted output that I would get when calling dat
in the R Console?)
To analyze the data I need to reshape (pivot) the data so that rows indicate unique rating-brand pairs. Thus, the expected outcome would be:
id brand Q1r1 Q1r2
"A" "pepsi" 1 1
"A" "cola" 0 0
"B" "pepsi" 0 1
"B" "cola" 0 1
"C" "pepsi" 1 1
"C" "cola" 1 1
Currently, I always do a combination of pivot_longer
and pivot_wider
, but I was hoping that I can directly get this result by pivoting_longer without doing the intermediate step:
library(tidyverse)
dat_long <- dat %>%
pivot_longer(cols = starts_with("Q1")) %>%
separate(name, into = c("item", "brand"), remove = FALSE)
dat_wide <- dat_long %>%
pivot_wider(id_cols = c(id, brand),
names_from = item,
values_from = value)
With this current example it's still ok to do this intermediate step, but it gets tiresome in other less clean examples, e.g. suppose my columns weren't named in a nice structure with Q1r1_c1, Q1r1_c2, Q1r2_c1, Q1r2_c2
, but instead would be Q4, Q5, Q8r1, Q8r2
where the map would be between Q4 and Q8r1, and Q5/Q8r2, respectively.
You can use :
tidyr::pivot_longer(dat, cols = -id,
names_to = c('.value', 'brand'),
names_sep = "_")
# id brand Q1r1 Q1r2
# <chr> <chr> <dbl> <dbl>
#1 A pepsi 1 1
#2 A cola 0 0
#3 B pepsi 0 1
#4 B cola 0 1
#5 C pepsi 1 1
#6 C cola 1 1
Following @Ronak Shah's suggestion, I'm pasting my code here in case my column names are less structured. I'm building up this code on the dat
data I specified in my initial post.
names(dat) <- c("id", "Q4", "Q5", "Q8r1", "Q8r2")
spec <- data.frame(.name = names(dat)[-1],
.value = c("Q4Q5", "Q4Q5", "Q8", "Q8"),
brand = rep(c("pepsi", "cola"), 2),
stringsAsFactors = FALSE)
dat_long <- pivot_longer_spec(dat, spec)
This gives essentially the same result as for the data with my structured names, just that the names are different now.
Again, I'm not entirely sure how generalizable this approach is, but it worked in my case.