How to build a real-time dataframe in R?

Let's say I have two dataframes like the ones below:

df1 = structure(list(Date = c("2000-01-05", "2000-02-03", "2000-03-02", 
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", 
"2000-09-14", "2000-10-19", "2000-11-02", "2000-12-14", "2001-02-01", 
"2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-06-21", 
"2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06"
)), row.names = c(NA, 23L), class = "data.frame")

         Date
1  2000-01-05
2  2000-02-03
3  2000-03-02
4  2000-03-30
5  2000-04-13
6  2000-05-11
7  2000-06-08
8  2000-07-06
9  2000-09-14
10 2000-10-19
11 2000-11-02
12 2000-12-14
13 2001-02-01
14 2001-03-01
15 2001-04-11
16 2001-05-10
17 2001-06-07
18 2001-06-21
19 2001-07-05
20 2001-08-30
21 2001-10-11
22 2001-11-08
23 2001-12-06


df2 = structure(list(Date = structure(c(10987, 11016, 11047, 11077, 
11108, 11138, 11169, 11200, 11230, 11261, 11291, 11322, 11353, 
11381, 11412, 11442, 11473, 11503, 11534, 11565, 11595, 11626, 
11656, 11687), class = "Date"), x = c(3.04285714285714, 3.27571428571429, 
3.5104347826087, 3.685, 3.92, 4.29454545454545, 4.30857142857143, 
4.41913043478261, 4.59047619047619, 4.76272727272727, 4.82909090909091, 
4.82684210526316, 4.75590909090909, 4.9925, 4.78136363636364, 
5.06421052631579, 4.65363636363636, 4.53952380952381, 4.50545454545454, 
4.49130434782609, 3.9865, 3.97130434782609, 3.50727272727273, 
3.33888888888889)), row.names = c(NA, 24L), class = "data.frame")

         Date        x
1  2000-01-31 3.042857
2  2000-02-29 3.275714
3  2000-03-31 3.510435
4  2000-04-30 3.685000
5  2000-05-31 3.920000
6  2000-06-30 4.294545
7  2000-07-31 4.308571
8  2000-08-31 4.419130
9  2000-09-30 4.590476
10 2000-10-31 4.762727
11 2000-11-30 4.829091
12 2000-12-31 4.826842
13 2001-01-31 4.755909
14 2001-02-28 4.992500
15 2001-03-31 4.781364
16 2001-04-30 5.064211
17 2001-05-31 4.653636
18 2001-06-30 4.539524
19 2001-07-31 4.505455
20 2001-08-31 4.491304
21 2001-09-30 3.986500
22 2001-10-31 3.971304
23 2001-11-30 3.507273
24 2001-12-31 3.338889

Now, what I would like to do is create a real-time dataframe, that is, one that contains, for each date in df1, only the df2 data that were already available on that date. For instance, on 2000-01-05 (first row of df1) no data from df2 was available, since 2000-01-31 (first row of df2) comes after 2000-01-05. However, on 2000-02-03 (second row of df1) the 2000-01-31 observation (first row of df2) is available. The same reasoning applies to every row. The outcome should look like this:

         Date        y
1  2000-01-05       NA
2  2000-02-03 3.042857
3  2000-03-02 3.275714
4  2000-03-30 3.275714
5  2000-04-13 3.510435
6  2000-05-11 3.685000
....

The rule is: for each date in df1, take from df2 only the most recent observation that was already available on that date.
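That rule is a "latest observation on or before each date" lookup. Here is a minimal base R sketch of it, under two assumptions: df2 is sorted by Date, and an observation dated the same day as a df1 row counts as available (the question doesn't say either way). `findInterval()` returns, for each df1 date, how many df2 dates fall on or before it. That count is the row index of the latest available observation, and 0 means nothing was available yet. The helper name `real_time_lookup` is hypothetical, and the data are just the first rows of df1 and df2 with x rounded as printed.

```r
# First rows of df1 and df2 from the question (x rounded as printed)
df1 <- data.frame(Date = c("2000-01-05", "2000-02-03", "2000-03-02",
                           "2000-03-30", "2000-04-13", "2000-05-11"))
df2 <- data.frame(
  Date = as.Date(c("2000-01-31", "2000-02-29", "2000-03-31",
                   "2000-04-30", "2000-05-31")),
  x    = c(3.042857, 3.275714, 3.510435, 3.685000, 3.920000)
)

# Hypothetical helper: for each date, return the latest value whose
# observation date is on or before it (NA if nothing was observed yet)
real_time_lookup <- function(dates, obs_dates, obs_values) {
  stopifnot(!is.unsorted(obs_dates))  # findInterval() needs sorted dates
  # number of obs_dates <= each date = index of the latest available obs
  idx <- findInterval(as.numeric(as.Date(dates)), as.numeric(obs_dates))
  idx[idx == 0] <- NA  # no observation available yet
  obs_values[idx]
}

real_time <- data.frame(Date = as.Date(df1$Date),
                        y = real_time_lookup(df1$Date, df2$Date, df2$x))
print(real_time)
```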

Can anyone help me?

Thanks!


What you can do is expand df2 to a daily series, carry each value forward until the next observation, and then join on Date.

library(dplyr)
library(tidyr)

# expand df2 to one row per day (not just the month-end snapshots)
# and carry the last observed x forward
df2_complete <- df2 %>%
  complete(Date = seq.Date(min(Date), max(Date), by = "day")) %>%
  fill(x, .direction = "down")

# df1$Date is character, so convert it to Date before joining
df1 %>%
  mutate(Date = as.Date(Date)) %>%
  left_join(df2_complete, by = "Date")

Which gives:

         Date        x
1  2000-01-05       NA
2  2000-02-03 3.042857
3  2000-03-02 3.275714
4  2000-03-30 3.275714
5  2000-04-13 3.510435
6  2000-05-11 3.685000
....
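If you have dplyr 1.1.0 or later, a rolling join can do the lookup directly without expanding df2 to daily rows. `join_by(closest(Date >= obs_date))` matches each df1 date to the nearest df2 date on or before it. This is a minimal sketch on the same first rows. Renaming df2's Date to `obs_date` (a hypothetical name) only avoids duplicate key columns in the output.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

# First rows of df1 and df2 from the question (x rounded as printed)
df1 <- data.frame(Date = c("2000-01-05", "2000-02-03", "2000-03-02",
                           "2000-03-30", "2000-04-13", "2000-05-11"))
df2 <- data.frame(
  Date = as.Date(c("2000-01-31", "2000-02-29", "2000-03-31",
                   "2000-04-30", "2000-05-31")),
  x    = c(3.042857, 3.275714, 3.510435, 3.685000, 3.920000)
)

real_time <- df1 %>%
  mutate(Date = as.Date(Date)) %>%
  # rolling join: for each df1 date, the latest df2 date on or before it;
  # rows with no earlier df2 date are kept with NA
  left_join(rename(df2, obs_date = Date),
            by = join_by(closest(Date >= obs_date))) %>%
  select(Date, y = x)

print(real_time)
```

Unlike the daily expansion, this also returns the last df2 value for any df1 date later than the final df2 date. The `complete()` version would give NA there because its sequence stops at `max(Date)`.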