Converting nested list to dataframe

The goal is to convert a nested list which sometimes contain missing records into a data frame. An example of the structure when there are missing records is:

str(mylist)

List of 3
 $ :List of 7
  ..$ Hit    : chr "True"
  ..$ Project: chr "Blue"
  ..$ Year   : chr "2011"
  ..$ Rating : chr "4"
  ..$ Launch : chr "26 Jan 2012"
  ..$ ID     : chr "19"
  ..$ Dept   : chr "1, 2, 4"
 $ :List of 2
  ..$ Hit  : chr "False"
  ..$ Error: chr "Record not found"
 $ :List of 7
  ..$ Hit    : chr "True"
  ..$ Project: chr "Green"
  ..$ Year   : chr "2004"
  ..$ Rating : chr "8"
  ..$ Launch : chr "29 Feb 2004"
  ..$ ID     : chr "183"
  ..$ Dept   : chr "6, 8"

When there are no missing records the list can be converted into a data frame using data.frame(do.call(rbind.data.frame, mylist)). However, when records are missing this results in a column mismatch. I know there are functions to merge data frames of non-matching columns but I'm yet to find one that can be applied to lists. The ideal outcome would keep record 2 with NA for all variables. Hoping for some help.

Edit to add dput(mylist):

list(structure(list(Hit = "True", Project = "Blue", Year = "2011", 
Rating = "4", Launch = "26 Jan 2012", ID = "19", Dept = "1, 2, 4"), .Names = c("Hit", 
"Project", "Year", "Rating", "Launch", "ID", "Dept")), structure(list(
Hit = "False", Error = "Record not found"), .Names = c("Hit", 
"Error")), structure(list(Hit = "True", Project = "Green", Year = "2004", 
Rating = "8", Launch = "29 Feb 2004", ID = "183", Dept = "6, 8"), .Names = c("Hit", 
"Project", "Year", "Rating", "Launch", "ID", "Dept")))

You can also use (at least v1.9.3) of rbindlist in the data.table package:

library(data.table)

rbindlist(mylist, fill=TRUE)

##      Hit Project Year Rating      Launch  ID    Dept            Error
## 1:  True    Blue 2011      4 26 Jan 2012  19 1, 2, 4               NA
## 2: False      NA   NA     NA          NA  NA      NA Record not found
## 3:  True   Green 2004      8 29 Feb 2004 183    6, 8               NA

You could create a list of data.frames:

dfs <- lapply(mylist, data.frame, stringsAsFactors = FALSE)

Then use one of these:

library(plyr)
rbind.fill(dfs)

or the faster

library(dplyr)
bind_rows(dfs) # in earlier versions: rbind_all(dfs)

In the case of dplyr::bind_rows, I am surprised that it chooses to use "" instead of NA for missing data. If you remove stringsAsFactors = FALSE, you will get NA but at the cost of a warning... So suppressWarnings(rbind_all(lapply(mylist, data.frame))) would be an ugly but fast solution.

Converting nested list to dataframe

Related

Recent Posts