Combine (rbind) data frames and create column with name of original data frames
I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.
# original data frames
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))
# desired, combined data frame
df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                  source = c("df1", "df1", "df2", "df2"))
# x y source
# 1 2 df1
# 3 4 df1
# 5 6 df2
# 7 8 df2
How can I achieve this? Thanks in advance!
Solution 1:
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):
> do.call(rbind, list(df1 = df1, df2 = df2))
      x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8
Notice that the row names now reflect the source data.frames.
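If an actual column is wanted rather than row names, one way to get there is to strip the numeric suffix from those row names. This is only a sketch, and it assumes the list names themselves do not end in a dot followed by digits; the names combined and source are chosen here for illustration.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# rbind on a named list yields row names such as "df1.1", "df1.2", ...
combined <- do.call(rbind, list(df1 = df1, df2 = df2))

# Drop the trailing ".<row number>" to recover the list name
combined$source <- sub("\\.[0-9]+$", "", rownames(combined))

# Reset the row names to the default 1..n sequence
rownames(combined) <- NULL
```

After this, combined has the columns x, y and source, with source holding the name each row's data.frame had in the list.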
Update: Use cbind and rbind
Another option is to make a basic function like the following:
AppendMe <- function(dfNames) {
  do.call(rbind, lapply(dfNames, function(x) {
    cbind(get(x), source = x)
  }))
}
This function then takes a character vector of the data.frame names that you want to "stack", as follows:
> AppendMe(c("df1", "df2"))
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
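A variation on the same idea, in case you would rather not look objects up by name with get(): keep the data.frames in a named list and pair each one with its name using Map. This is a sketch rather than a drop-in replacement; dfs and stacked are names made up for the example.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# The list names become the values of the new column
dfs <- list(df1 = df1, df2 = df2)

# Add the name as a constant column to each data.frame, then stack them
stacked <- do.call(rbind, Map(function(d, nm) cbind(d, source = nm),
                              dfs, names(dfs)))

# rbind on a named list produces row names like "df1.1"; reset them
rownames(stacked) <- NULL
```

Because the names come from the list rather than from the calling environment, this also works when the data.frames were created in a loop or read from files.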
Update 2: Use combine from the "gdata" package
> library(gdata)
> combine(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
Update 3: Use rbindlist from "data.table"
Another approach that can be used now is rbindlist from "data.table" and its idcol argument. With that, the approach could be:
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
   .id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
Update 4: Use map_df from "purrr"
Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]
    src     x     y
  (chr) (int) (int)
1   df1     1     2
2   df1     3     4
3   df2     5     6
4   df2     7     8
Solution 2:
Another approach using dplyr:
df1 <- data.frame(x = c(1,3), y = c(2,4))
df2 <- data.frame(x = c(5,7), y = c(6,8))
df3 <- dplyr::bind_rows(list(df1=df1, df2=df2), .id = 'source')
df3
Source: local data frame [4 x 3]
  source     x     y
   (chr) (dbl) (dbl)
1    df1     1     2
2    df1     3     4
3    df2     5     6
4    df2     7     8
Solution 3:
I'm not sure if such a function already exists, but this seems to do the trick:
bindAndSource <- function(df1, df2) {
  df1$source <- as.character(match.call())[[2]]
  df2$source <- as.character(match.call())[[3]]
  rbind(df1, df2)
}
Results:
bindAndSource(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
Caveat: This will not work in *apply-like calls.
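The same trick can probably be stretched to any number of data.frames by capturing the argument expressions with substitute(). The following is only a sketch, with bind_and_source as a made-up name. It shares the caveat above: the labels are whatever expressions were typed in the call, so inside lapply and friends they will not be the original object names.

```r
bind_and_source <- function(...) {
  # Unevaluated argument expressions, e.g. the symbols df1 and df2
  exprs <- as.list(substitute(list(...)))[-1]
  # Turn each expression into a text label (first line only if it is long)
  labels <- vapply(exprs, function(e) deparse(e)[1], character(1))
  dfs <- list(...)
  # Tag each data.frame with its label, then stack them
  do.call(rbind, Map(function(d, nm) { d$source <- nm; d }, dfs, labels))
}

df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))
result <- bind_and_source(df1, df2)
```

Here result should have one row per input row, with source set to "df1" or "df2" depending on which argument the row came from.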