Unlist data frame column preserving information from other column

I have a data frame that contains two columns of interest: a character column, col1, and a list column, col2.

myVector <- c("A","B","C","D")

myList <- list()
myList[[1]] <- c(1, 4, 6, 7)
myList[[2]] <- c(2, 7, 3)
myList[[3]] <- c(5, 5, 3, 9, 6)
myList[[4]] <- c(7, 9)

myDataFrame <- data.frame(row = c(1,2,3,4))

myDataFrame$col1 <- myVector
myDataFrame$col2 <- myList

myDataFrame
#   row col1          col2
# 1   1    A    1, 4, 6, 7
# 2   2    B       2, 7, 3
# 3   3    C 5, 5, 3, 9, 6
# 4   4    D          7, 9

I want to unlist col2 while keeping, for each element of the vectors in the list, the corresponding value from col1. To phrase it in common data frame reshaping terminology: the "wide" list column should be converted to "long" format.

In the end, I want two vectors, each of length length(unlist(myDataFrame$col2)). In code:

# unlist myList
unlist.col2 <- unlist(myDataFrame$col2)
unlist.col2
# [1] 1 4 6 7 2 7 3 5 5 3 9 6 7 9

# unlist myVector to obtain
# unlist.col1 <- ???
# unlist.col1
# [1] A A A A B B B C C C C C D D

I can't think of any straightforward way to get it.

You may also use unnest from package tidyr:

library(tidyr)
unnest(myDataFrame, col2)

#      row  col1  col2
#    (dbl) (chr) (dbl)
# 1      1     A     1
# 2      1     A     4
# 3      1     A     6
# 4      1     A     7
# 5      2     B     2
# 6      2     B     7
# 7      2     B     3
# 8      3     C     5
# 9      3     C     5
# 10     3     C     3
# 11     3     C     9
# 12     3     C     6
# 13     4     D     7
# 14     4     D     9

You can use the "data.table" package to expand the whole data frame and then extract the column of interest.

library(data.table)
## expand the entire data.frame (uncomment to see)
# as.data.table(myDataFrame)[, unlist(col2), by = list(row, col1)]

## expand and select the column of interest:
as.data.table(myDataFrame)[, unlist(col2), by = list(row, col1)]$col1
#  [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
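
If you want to keep the whole expanded table rather than a single column, it may help to name the unlisted column explicitly, since data.table otherwise calls it V1. The following is only a sketch of that idea; the name longDT is mine, not part of the original answer, and the data is rebuilt from the question so the block runs on its own.

```r
library(data.table)

# Rebuild the example data from the question
myDataFrame <- data.frame(row = c(1, 2, 3, 4))
myDataFrame$col1 <- c("A", "B", "C", "D")
myDataFrame$col2 <- list(c(1, 4, 6, 7), c(2, 7, 3), c(5, 5, 3, 9, 6), c(7, 9))

# Expand the list column group by group and give the result a name
# (longDT is a hypothetical name used for illustration)
longDT <- as.data.table(myDataFrame)[, .(col2 = unlist(col2)), by = .(row, col1)]

# Both vectors asked for in the question are now ordinary columns
longDT$col1
longDT$col2
```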

In R 3.2.0 and later, you can use the lengths function instead of the sapply(list, length) approach. The lengths function is considerably faster.

with(myDataFrame, rep(col1, lengths(col2)))
#  [1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
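
For completeness, here is a minimal base R sketch that puts the pieces together: it computes the per-row lengths once, repeats the other columns accordingly, and binds everything into a "long" data frame. The names n and longDF are mine, chosen for illustration; the data is rebuilt from the question so the block is self-contained.

```r
# Rebuild the example data from the question
myDataFrame <- data.frame(row = c(1, 2, 3, 4))
myDataFrame$col1 <- c("A", "B", "C", "D")
myDataFrame$col2 <- list(c(1, 4, 6, 7), c(2, 7, 3), c(5, 5, 3, 9, 6), c(7, 9))

# Number of elements in each list entry; same result as
# sapply(myDataFrame$col2, length)
n <- lengths(myDataFrame$col2)

# Repeat each row's values once per list element, then unlist col2
# (longDF is a hypothetical name used for illustration)
longDF <- data.frame(
  row  = rep(myDataFrame$row,  n),
  col1 = rep(myDataFrame$col1, n),
  col2 = unlist(myDataFrame$col2),
  stringsAsFactors = FALSE
)

longDF$col1  # the vector asked for as unlist.col1
longDF$col2  # the vector asked for as unlist.col2
```

Because rep() receives a vector of counts as its second argument, each value of col1 is repeated exactly as many times as its list entry has elements, so the two resulting vectors stay aligned.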
