Expand rows by date range using start and end date
Using data.table
:
require(data.table) ## 1.9.2+
setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]
# you may use dot notation as a shorthand alias of list in j:
setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]
setDT
converts df
to a data.table
. Then for each row, by = 1:nrow(df)
, we create idnum
and month
as required.
Using dplyr
:
test %>%
group_by(idnum) %>%
summarize(start=min(start),end=max(end)) %>%
do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))
Note that here I don't generate a sequence between start
and end
for each row, instead it is a sequence between min(start)
and max(end)
for each idnum
. If you want the former :
test %>%
rowwise() %>%
do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))
Updated2
With new versions of purrr
(0.3.0
) and dplyr
(0.8.0
), this can be done with map2
library(dplyr)
library(purrr)
test %>%
# sequence of monthly dates for each corresponding start, end elements
transmute(idnum, month = map2(start, end, seq, by = "1 month")) %>%
# unnest the list column
unnest %>%
# remove any duplicate rows
distinct
Updated
Based on @Ananda Mahto's comments
res1 <- melt(setNames(lapply(1:nrow(test), function(x) seq(test[x, "start"],
test[x, "end"], by = "1 month")), test$idnum))
Also,
res2 <- setNames(do.call(`rbind`,
with(test,
Map(`expand.grid`,idnum,
Map(`seq`, start, end, by='1 month')))), c("idnum", "month"))
head(res1)
# idnum month
#1 17 1993-01-01
#2 17 1993-02-01
#3 17 1993-03-01
#4 17 1993-04-01
#5 17 1993-05-01
#6 17 1993-06-01
tidyverse
answer
Data
df <- structure(list(idnum = c(17L, 17L, 17L), start = structure(c(8401,
8401, 8401), class = "Date"), end = structure(c(8765, 8765, 8765
), class = "Date")), class = "data.frame", .Names = c("idnum",
"start", "end"), row.names = c(NA, -3L))
Answer and output
library(tidyverse)
df %>%
nest(start, end) %>%
mutate(data = map(data, ~seq(unique(.x$start), unique(.x$end), 1))) %>%
unnest(data)
# # A tibble: 365 x 2
# idnum data
# <int> <date>
# 1 17 1993-01-01
# 2 17 1993-01-02
# 3 17 1993-01-03
# 4 17 1993-01-04
# 5 17 1993-01-05
# 6 17 1993-01-06
# 7 17 1993-01-07
# 8 17 1993-01-08
# 9 17 1993-01-09
# 10 17 1993-01-10
# # ... with 355 more rows