How to convert a huge list-of-vector to a matrix more efficiently?
I have a list of length 130,000 where each element is a character vector of length 110. I would like to convert this list to a matrix with dimension 1,430,000*10. How can I do it more efficiently?\ My code is :
output=NULL
for(i in 1:length(z)) {
output=rbind(output,
matrix(z[[i]],ncol=10,byrow=TRUE))
}
This should be equivalent to your current code, only a lot faster:
output <- matrix(unlist(z), ncol = 10, byrow = TRUE)
I think you want
output <- do.call(rbind,lapply(z,matrix,ncol=10,byrow=TRUE))
i.e. combining @BlueMagister's use of do.call(rbind,...)
with an lapply
statement to convert the individual list elements into 11*10 matrices ...
Benchmarks (showing @flodel's unlist
solution is 5x faster than mine, and 230x faster than the original approach ...)
n <- 1000
z <- replicate(n,matrix(1:110,ncol=10,byrow=TRUE),simplify=FALSE)
library(rbenchmark)
origfn <- function(z) {
output <- NULL
for(i in 1:length(z))
output<- rbind(output,matrix(z[[i]],ncol=10,byrow=TRUE))
}
rbindfn <- function(z) do.call(rbind,lapply(z,matrix,ncol=10,byrow=TRUE))
unlistfn <- function(z) matrix(unlist(z), ncol = 10, byrow = TRUE)
## test replications elapsed relative user.self sys.self
## 1 origfn(z) 100 36.467 230.804 34.834 1.540
## 2 rbindfn(z) 100 0.713 4.513 0.708 0.012
## 3 unlistfn(z) 100 0.158 1.000 0.144 0.008
If this scales appropriately (i.e. you don't run into memory problems), the full problem would take about 130*0.2 seconds = 26 seconds on a comparable machine (I did this on a 2-year-old MacBook Pro).
It would help to have sample information about your output. Recursively using rbind
on bigger and bigger things is not recommended. My first guess at something that would help you:
z <- list(1:3,4:6,7:9)
do.call(rbind,z)
See a related question for more efficiency, if needed.
You can also use,
output <- as.matrix(as.data.frame(z))
The memory usage is very similar to
output <- matrix(unlist(z), ncol = 10, byrow = TRUE)
Which can be verified, with mem_changed()
from library(pryr)
.