Unnesting character string list and converting into wide data
I often have nested lists containing characters string in a tibble. I would like to convert this to long data attempts have been to transform it into wide data using the tidyr
unnest_wider
function, but I am unable to produce my desired results.
Suppose you have a dataframe with a nested character list like the following:
#Producing a list with random character strings and embedding into dataframe.
set.seed(112)
df <- data.frame(id = 1:10)
df$alist <- replicate(10, list(sample(letters[1:5],round(runif(1, min = 1, max = 5)))))
Using tidyr::unnest_wider
almost gets me there, but I would like for the strings to go into the column names.
df %>% unnest_wider(alist, names_sep = "_")
By using two for
loops I´m able to achieve what I want:
df %>% select(alist) %>% unlist(alist) %>% unique() %>% sort() %>% as.list() -> let
df[unlist(let)] <- NA
for(index in let) {
for (i in df$id){
ifelse(index %in% df$alist[[i]],
T,
F) -> df[i,index]
}
}
id | alist | a | b | c | d | e |
---|---|---|---|---|---|---|
1 | b, d, a | TRUE | TRUE | FALSE | TRUE | FALSE |
2 | a, b | TRUE | TRUE | FALSE | FALSE | FALSE |
3 | d, a, c, e | TRUE | FALSE | TRUE | TRUE | TRUE |
4 | d, a, b, c | TRUE | TRUE | TRUE | TRUE | FALSE |
5 | a, c | TRUE | FALSE | TRUE | FALSE | FALSE |
6 | e | FALSE | FALSE | FALSE | FALSE | TRUE |
7 | c | FALSE | FALSE | TRUE | FALSE | FALSE |
8 | c, e, d | FALSE | FALSE | TRUE | TRUE | TRUE |
9 | a, d, e, c, b | TRUE | TRUE | TRUE | TRUE | TRUE |
10 | a | TRUE | FALSE | FALSE | FALSE | FALSE |
But frankly I am looking for a more elegant solution to the problem. I have the sense I am missing something obvious here...
Solution 1:
set.seed(112)
df <- data.frame(id = 1:10)
df$alist <-
replicate(10, list(sample(letters[1:5], round(
runif(1, min = 1, max = 5)
))))
library(tidyverse)
nm <- sort(unique(unlist(df$alist)))
bind_cols(df, map_df(
.x = df$alist,
.f = ~ purrr::set_names(x = nm %in% .x, nm = nm)
))
#> id alist a b c d e
#> 1 1 b, d, a TRUE TRUE FALSE TRUE FALSE
#> 2 2 a, b TRUE TRUE FALSE FALSE FALSE
#> 3 3 d, a, c, e TRUE FALSE TRUE TRUE TRUE
#> 4 4 d, a, b, c TRUE TRUE TRUE TRUE FALSE
#> 5 5 a, c TRUE FALSE TRUE FALSE FALSE
#> 6 6 e FALSE FALSE FALSE FALSE TRUE
#> 7 7 c FALSE FALSE TRUE FALSE FALSE
#> 8 8 c, e, d FALSE FALSE TRUE TRUE TRUE
#> 9 9 a, d, e, c, b TRUE TRUE TRUE TRUE TRUE
#> 10 10 a TRUE FALSE FALSE FALSE FALSE
Created on 2022-01-23 by the reprex package (v2.0.1)