Calculating the difference between two dates by group with caveat

Data looks like this:

df <- data.frame(
  id = c(283,994,294,294,1001,1001), 
  stint = c(1,1,1,2,1,2), 
  admit = c("2010-2-3","2011-2-4","2011-3-4","2012-4-1","2016-1-2","2017-2-3"),
  release = c("2011-2-3","2011-2-28","2011-4-1","2014-6-6","2017-2-1","2018-3-1")
)

okay so bear with me because I'm finding this kind of hard to articulate. I need to calculate the difference between the release date of the first stint and the admit date of the second stint by id. so that the difference, which I'm calling the "exposure" should look like this for the sample above

exposure=c(NA,NA,365,NA,2,NA)

So an NA will be returned if there is only 1 stint and if there are more than one stint the exposure period will be calculated using the previous release date and the current admit date. So exposure for stint three will be admit of stint 3 - the release of stint 2.


Solution 1:

You want to calculate the exposure if stint == 2, otherwise return NA. That can be accomplished with ifelse. However, you want the release to be from the previous release date. That can be done with lag. But that will tie exposure values to the admit where exposure ==2, whereas you want exposure to be associated to the previous release used in the calculation. So, remove the first exposure value and add an NA at the end.

  df %>% 
    mutate(across(c(admit, release), as.Date), 
           exposure = c(ifelse(stint == 2, admit - lag(release), NA)[-1], NA))

Which yields

    id stint      admit    release exposure
1  283     1 2010-02-03 2011-02-03       NA
2  994     1 2011-02-04 2011-02-28       NA
3  294     1 2011-03-04 2011-04-01      366
4  294     2 2012-04-01 2014-06-06       NA
5 1001     1 2016-01-02 2017-02-01        2
6 1001     2 2017-02-03 2018-03-01       NA