Merge Panel data to get balanced panel data
There's a function for that. Combine the data frames with rbind, then use complete from tidyr. It looks at every combination of Month and variable and adds a row filled with NA for each combination that is missing:
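library(tidyr)
library(zoo)  # provides as.yearmon()
df3 <- do.call(rbind.data.frame, list(df1, df2))
df3$Month <- as.character(df3$Month)         # work on plain "Jan 2005" style labels
df4 <- complete(df3, Month, variable)        # add the missing Month/variable rows
df4$Month <- as.yearmon(df4$Month, "%b %Y")  # convert the labels back to yearmon
df5 <- df4[order(df4$variable, df4$Month), ]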
df5
# Source: local data frame [72 x 8]
#
# Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
# (yrmn) (fctr) (int) (int) (int) (int) (int) (int)
# 1 Jan 2005 A 1 2 3 4 5 6
# 2 Feb 2005 A 2 3 4 5 6 7
# 3 Mar 2005 A 3 4 5 6 7 8
# 4 Apr 2005 A 4 5 6 7 8 9
# 5 May 2005 A 5 6 7 8 9 10
# 6 Jun 2005 A 6 7 8 9 10 11
# 7 Jul 2005 A 7 8 9 10 11 12
# 8 Aug 2005 A 8 9 10 11 12 13
# 9 Sep 2005 A 9 10 11 12 13 14
# 10 Oct 2005 A 10 11 12 13 14 15
# .. ... ... ... ... ... ... ... ...
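Since df1 and df2 are not shown here, a minimal sketch on made-up data may help to see what complete does. The two toy data frames and the single Beta1 column below are assumptions for illustration, not the data from the question:

```r
library(tidyr)

# Hypothetical toy panels: unit A has all three months,
# unit B is missing February.
df1 <- data.frame(Month = c("Jan 2005", "Feb 2005", "Mar 2005"),
                  variable = "A",
                  Beta1 = 1:3,
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan 2005", "Mar 2005"),
                  variable = "B",
                  Beta1 = c(10L, 30L),
                  stringsAsFactors = FALSE)

stacked  <- rbind(df1, df2)                     # 5 rows, unbalanced
balanced <- complete(stacked, Month, variable)  # 3 months x 2 units = 6 rows

# The combination that was absent from the input now exists with NA
balanced[is.na(balanced$Beta1), ]
```

Every unit ends up with one row per month, which is what makes the panel balanced.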
An alternative implementation with dplyr & tidyr:
library(dplyr)
library(tidyr)
df3 <- bind_rows(df1, df2) %>%
complete(Month, variable)
Two alternative possibilities, of which the data.table alternatives are especially of interest when speed and memory are an issue:
base R:
Bind the dataframes together into one:
df3 <- rbind(df1,df2)
Create a reference dataframe with all possible combinations of Month and variable with expand.grid:
ref <- expand.grid(Month = unique(df3$Month), variable = unique(df3$variable))
Merge them together with all.x = TRUE so that the missing combinations are kept and filled with NA values:
merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)
Or, matching on the first two columns by position (thanks to @PierreLafortune):
merge(ref, df3, by=1:2, all.x = TRUE)
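The base R steps above can be sketched end to end on made-up data. The toy data frames and the Beta1 column are hypothetical; stringsAsFactors = FALSE is an addition here so that the key columns stay plain character vectors:

```r
# Hypothetical toy panels: unit B is missing February.
df1 <- data.frame(Month = c("Jan 2005", "Feb 2005", "Mar 2005"),
                  variable = "A",
                  Beta1 = 1:3,
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan 2005", "Mar 2005"),
                  variable = "B",
                  Beta1 = c(10L, 30L),
                  stringsAsFactors = FALSE)

df3 <- rbind(df1, df2)

# Reference grid: every Month paired with every variable
ref <- expand.grid(Month = unique(df3$Month),
                   variable = unique(df3$variable),
                   stringsAsFactors = FALSE)

# Left join onto the grid: grid rows without a match get NA in Beta1
balanced <- merge(ref, df3, by = c("Month", "variable"), all.x = TRUE)

# Each unit should now have the same number of rows
table(balanced$variable)
```

Keeping all rows of ref (all.x = TRUE) rather than of df3 is what guarantees one row per combination.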
data.table:
Bind the dataframes into one with rbindlist, which returns a data.table:
library(data.table)
DT <- rbindlist(list(df1,df2))
Join with a reference to ensure all combinations are present and missing ones are filled with NA:
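# name the CJ() columns explicitly so the join does not depend on CJ()'s default column names
DT[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]
Everything together in one call:
DT <- rbindlist(list(df1,df2))[CJ(Month = Month, variable = variable, unique = TRUE), on = c("Month", "variable")]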
An alternative is wrapping rbindlist in setkey and then expanding with CJ (cross join):
DT <- setkey(rbindlist(list(df1,df2)), Month, variable)[CJ(Month, variable, unique = TRUE)]
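As a rough illustration of the data.table route, here is a small sketch on made-up data. The toy data frames and the Beta1 column are hypothetical, and the CJ() columns are named explicitly here as a precaution so the on = join does not rely on CJ()'s default column names:

```r
library(data.table)

# Hypothetical toy panels: unit B is missing February.
df1 <- data.frame(Month = c("Jan 2005", "Feb 2005", "Mar 2005"),
                  variable = "A",
                  Beta1 = 1:3,
                  stringsAsFactors = FALSE)
df2 <- data.frame(Month = c("Jan 2005", "Mar 2005"),
                  variable = "B",
                  Beta1 = c(10L, 30L),
                  stringsAsFactors = FALSE)

DT <- rbindlist(list(df1, df2))

# Join DT onto the cross join of all Months and all variables;
# combinations without a match in DT come back with NA in Beta1.
balanced <- DT[CJ(Month = Month, variable = variable, unique = TRUE),
               on = c("Month", "variable")]

# Keyed variant: the join uses the key columns, so no 'on' is needed
keyed <- setkey(rbindlist(list(df1, df2)), Month, variable)[
  CJ(Month, variable, unique = TRUE)]

balanced[is.na(Beta1)]
```

Both variants should give one row per Month and variable combination; the keyed form sorts the bound table by Month and variable first.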