Create a Data Frame of Unequal Lengths
While data frame columns must have the same number rows, is there any way to create a data frame of unequal lengths. I'm not interested in saving them as separate elements of a list because I often have to to email people this info as a csv file, and this is easiest as a data frame.
x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
cbind(x,y,z)
In the above code, the cbind()
function just recycles the shorter columns so that they all have 10 elements in each column. How can I alter it just so that lengths are 2, 10, and 5.
I've done this in the past by doing the following, but it's inefficient.
df = data.frame(one=c(rep("one",2),rep("",8)),
two=c(rep("two",10)), three=c(rep("three",5), rep("",5)))
Sorry this isn't exactly what you asked, but I think there may be another way to get what you want.
First, if the vectors are different lengths, the data isn't really tabular, is it? How about just save it to different CSV files? You might also try ascii formats that allow storing multiple objects (json, XML).
If you feel the data really is tabular, you could pad on NAs:
> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
[1] 1 2 3 4 5 NA NA NA NA NA NA NA
> y
[1] 1 2 3 4 5 6 7 8 9 10 11 12
If you absolutely must make a data.frame
with unequal columns you could subvert the check, at your own peril:
> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
row.names=1:max(length(x), length(y)), class='data.frame')
> df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 <NA> 6
7 <NA> 7
[ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
Another approach to the padding:
na.pad <- function(x,len){
x[1:len]
}
makePaddedDataFrame <- function(l,...){
maxlen <- max(sapply(l,length))
data.frame(lapply(l,na.pad,len=maxlen),...)
}
x = c(rep("one",2))
y = c(rep("two",10))
z = c(rep("three",5))
makePaddedDataFrame(list(x=x,y=y,z=z))
The na.pad()
function exploits the fact that R will automatically pad a vector with NAs if you try to index non-existent elements.
makePaddedDataFrame()
just finds the longest one and pads the rest up to a matching length.
To amplify @goodside's answer, you can do something like
L <- list(x,y,z)
cfun <- function(L) {
pad.na <- function(x,len) {
c(x,rep(NA,len-length(x)))
}
maxlen <- max(sapply(L,length))
do.call(data.frame,lapply(L,pad.na,len=maxlen))
}
cfun(L)