Write a conditional statement to identify dates within 1 year and more than 30 days apart
I'm trying to build a conditional statement to identify cases in a very large dataset. I need to select patients who have had more than one code separated by more than 30 days (i.e., one date, THEN another date more than 30 days later) AND who have had more than one code within 1 year.
I haven't really figured out how to write a conditional for this, though. Here's an example dataset:
dates = c("2021-01-05", "2021-01-23", "2021-04-05", "2022-01-05", "2019-01-08", "2019-01-14")
patient = c("A", "A", "A", "A", "B", "B")
df <- data.frame(dates, patient)
Patient A would qualify because they have dates separated by more than 30 days and have 2 dates within 1 year.
Patient B would not, because their only 2 dates are within 30 days of each other.
Is there some easy way to calculate this in R? Seems like a basic question, but I'm not sure how to compare each instance within a vector to each other instance in an efficient way (actual dataset has >100,000 patients and >60 million dates)
df <- data.frame(dates = as.Date(dates), patient)
This looks at each patient, calculates the time since the previous date, keeps only the dates that followed the prior one by more than 30 but fewer than 365 days, and then shows each patient id once. Note that it compares only consecutive dates within each patient.
library(dplyr)
df %>%
  arrange(dates) %>%
  group_by(patient) %>%
  mutate(gap = dates - lag(dates)) %>%
  filter(gap > 30, gap < 365) %>%
  ungroup() %>%
  distinct(patient)
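If you also need to catch pairs of dates that are not adjacent (for example, three visits 20 days apart each, where the first and third are 40 days apart), a consecutive-gap check will miss them. Here is a minimal sketch of one way to handle that, assuming the rule is "some pair of dates more than 30 and fewer than 365 days apart". The helper `has_qualifying_pair` and the extra patient "C" are hypothetical, added only for illustration. It sorts each patient's dates once and uses base R's `findInterval` to look up, for each date, the earliest date more than 30 days later, so it avoids comparing every date to every other date.

```r
library(dplyr)

# Example data from the question, plus a hypothetical patient "C" whose
# consecutive gaps are 20 days each but whose first and last dates are 40 days apart.
dates   <- c("2021-01-05", "2021-01-23", "2021-04-05", "2022-01-05",
             "2019-01-08", "2019-01-14",
             "2020-03-01", "2020-03-21", "2020-04-10")
patient <- c("A", "A", "A", "A", "B", "B", "C", "C", "C")
df <- data.frame(dates = as.Date(dates), patient)

# Hypothetical helper: TRUE if any two dates (not necessarily adjacent) are
# more than `min_gap` and fewer than `max_gap` days apart.
has_qualifying_pair <- function(d, min_gap = 30, max_gap = 365) {
  x <- sort(as.numeric(d))
  # findInterval() counts the elements <= x + min_gap, so adding 1 gives the
  # index of the earliest date strictly more than min_gap days after each date.
  j <- findInterval(x + min_gap, x) + 1
  ok <- j <= length(x)            # drop dates with no later candidate
  any(x[j[ok]] - x[ok] < max_gap) # is the nearest candidate within max_gap?
}

result <- df %>%
  group_by(patient) %>%
  summarise(qualifies = has_qualifying_pair(dates), .groups = "drop") %>%
  filter(qualifies) %>%
  select(patient)

result
```

On this data, `result` contains patients A and C, while the consecutive-gap version would return only A. If your two conditions are meant to be checked independently rather than on the same pair of dates, the helper would need to be adjusted accordingly.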