Converting rows into columns based on the values in the rows, in R
This is the table I am working with. I want to convert reservation
into separate columns.
I want it to be transformed into something like this. I have been trying to do this using reshape2
and dplyr
's separate
but I didn't find a solution.
You may try
library(tidyverse)
df %>%
rowwise %>%
mutate(reservation_main = str_split(reservation,'_' ,simplify = T)[1],
reservation_no = paste0('_',str_split(reservation,'_' ,simplify = T)[2])) %>%
select(id, response_id, reservation_main, reservation_no) %>%
pivot_wider(names_from = reservation_no, values_from = response_id)
id reservation_main `_1` `_2` `_3`
<dbl> <chr> <dbl> <dbl> <dbl>
1 31100 A 1 1 0
2 31100 B 1 1 0
3 31100 C 1 0 0
Since reservation
is in a consistent format, we can use the _
to separate into two columns. Then, we can convert the response
to 0 and 1. Then, I drop response_id
. Finally, I pivot to the wide format. I'm assuming that you don't want the _
before the numbers in the columns.
library(tidyverse)
df %>%
separate(reservation, c("reservation", "number"), sep = "_") %>%
mutate(response = ifelse(response == "yes", 1, 0)) %>%
select(-response_id) %>%
pivot_wider(names_from = "number", values_from = "response")
Output
# A tibble: 2 × 5
id reservation `1` `2` `3`
<dbl> <chr> <dbl> <dbl> <dbl>
1 31100 A 1 1 0
2 31100 B 1 1 0
If you do want to keep the _
in front of the numbers for the columns, then we could adjust the regex in separate
.
df %>%
separate(reservation, c("reservation", "number"), sep = "(?=\\_)") %>%
mutate(response = ifelse(response == "yes", 1, 0)) %>%
select(-response_id) %>%
pivot_wider(names_from = "number", values_from = "response")
# A tibble: 2 × 5
id reservation `_1` `_2` `_3`
<dbl> <chr> <dbl> <dbl> <dbl>
1 31100 A 1 1 0
2 31100 B 1 1 0
Data
df <- structure(
list(
id = c(31100, 31100, 31100, 31100, 31100, 31100),
reservation = c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3"),
response = c("yes", "yes", "no", "yes", "yes", "no"),
response_id = c(1,
1, 0, 1, 1, 0)
),
class = "data.frame",
row.names = c(NA,-6L)
)