Mark dates within several date ranges
This can be done as a range-based or non-equi join. Unfortunately, dplyr
alone cannot do it, but one of the following should work fine.
The code below assigns the particular events$name
to each row, not just an "in range" indicator. It's not hard to simplify that with in_range = !is.na(name)
or similar.
fuzzyjoin
# library(fuzzyjoin)
date_info %>%
fuzzyjoin::fuzzy_left_join(events,
by = c(date = "start_date", date = "end_date"),
match_fun = list(`>=`, `<=`))
# # A tibble: 4 x 6
# date statistic value name start_date end_date
# <date> <chr> <dbl> <chr> <date> <date>
# 1 2021-10-16 var1 10 NA NA NA
# 2 2021-10-16 var2 54 NA NA NA
# 3 2021-10-17 var1 23 Event A 2021-10-17 2021-10-19
# 4 2021-10-17 var2 34 Event A 2021-10-17 2021-10-19
sqldf
# library(sqldf)
sqldf::sqldf("
select t1.*, t2.name
from date_info t1
left join events t2 on t1.date between t2.start_date and t2.end_date")
# date statistic value name
# 1 2021-10-16 var1 10 <NA>
# 2 2021-10-16 var2 54 <NA>
# 3 2021-10-17 var1 23 Event A
# 4 2021-10-17 var2 34 Event A
data.table
library(data.table)
date_info_DT <- as.data.table(date_info)
events_DT <- as.data.table(events)
date_info_DT[events_DT, name := i.name,
on = .(date >= start_date, date <= end_date)][]
# date statistic value name
# <Date> <char> <num> <char>
# 1: 2021-10-16 var1 10 <NA>
# 2: 2021-10-16 var2 54 <NA>
# 3: 2021-10-17 var1 23 Event A
# 4: 2021-10-17 var2 34 Event A
(There's also data.table::foverlaps
, which requires the second data.table
to be key
ed.)
Another option, a bit simpler (not requiring class-changes):
date_info %>%
mutate(in_range = data.table::inrange(date, events$start_date, events$end_date))
# # A tibble: 4 x 4
# date statistic value in_range
# <date> <chr> <dbl> <lgl>
# 1 2021-10-16 var1 10 FALSE
# 2 2021-10-16 var2 54 FALSE
# 3 2021-10-17 var1 23 TRUE
# 4 2021-10-17 var2 34 TRUE
Here's a solution using map
from the purrr
package that should work. It could be more concise but I made it very explicit so it's not overwhelming if you're not familiar with the syntax.
date_info |>
mutate(
in_range_n = map_dbl(date, .f = function(date){
filter(events, start_date <= date, end_date >= date) |>
nrow()
}),
in_range = in_range_n > 0
) |>
select(-in_range_n)
Output:
# A tibble: 4 x 4
date statistic value in_range
<date> <chr> <dbl> <lgl>
1 2021-10-16 var1 10 FALSE
2 2021-10-16 var2 54 FALSE
3 2021-10-17 var1 23 TRUE
4 2021-10-17 var2 34 TRUE
Let me know if I misunderstood the problem!