Grouping data by time intervals and adding missing rows
I have a dataframe with all the bills sponsored by MPs during a legislative term. In order to create a count variable over the term, I grouped the observations by ID and attached this variable to another dataframe containing all the MPs with a left_join. Then I mutated all the NAs of the count variable to 0.
Now suppose that I want to do the same, but for multiple shorter periods of time (months or quarters). In my final dataframe, I would like to have a row for every MP-month, with the count variable in that month. Is there an easy way to implement that? Thank you.
This is a minimal example of what I did for the whole term
bills <- structure(list(number = c(1, 2, 3, 4, 5, 6, 7, 8, 9), date = structure(c(18632,
18753, 18785, 18816, 18879, 18880, 18911, 18943, 18974), class = "Date"),
id = c(1, 1, 3, 3, 3, 3, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-9L))
mp <- structure(list(id = c(1, 2, 3, 4, 5), name = c("Diane", "Nigel",
"Peter", "Lucy", "Fleur"), surname = c("Abbott", "Adams", "Aldous",
"Allan", "Anderson")), class = "data.frame", row.names = c(NA,
-5L))
# Count number of tot questions
bills_grouped <- bills %>%
group_by(id) %>%
mutate(count_btot = n())
bills_grouped <- bills_grouped %>%
select(c(id, count_btot))
bills_grouped <- bills_grouped %>% distinct()
# Merge with all MPs
mp_count <- left_join(mp, bills_grouped)
# Replace NAs with 0
mp_count$count_btot[is.na(mp_count$count_btot)] <- 0
My desired output would be something like this
id name surname month count_month
1 1 Diane Abbott 2021-01-01 1
2 2 Nigel Adams 2021-01-01 0
3 3 Peter Aldous 2021-01-01 0
4 4 Lucy Allan 2021-01-01 0
5 5 Fleur Anderson 2021-01-01 0
6 1 Diane Abbott 2021-02-01 0
7 2 Nigel Adams 2021-02-01 0
8 3 Peter Aldous 2021-02-01 0
9 4 Lucy Allan 2021-02-01 0
10 5 Fleur Anderson 2021-02-01 0
Solution 1:
With the tidyverse
:
left_join(mp, bills) %>%
group_by(name, surname, month = lubridate::floor_date(date, "month")) %>%
summarise(n = sum(!is.na(month))) %>%
replace_na(list(month = as.Date("2021-01-01"))) %>%
ungroup(month) %>%
complete(month = seq.Date(as.Date("2021-01-01"), as.Date("2021-12-01"), '1 month'), fill = list(n = 0))
# A tibble: 60 x 4
# Groups: name, surname [5]
name surname month n
<chr> <chr> <date> <dbl>
1 Diane Abbott 2021-01-01 1
2 Diane Abbott 2021-02-01 0
3 Diane Abbott 2021-03-01 0
4 Diane Abbott 2021-04-01 0
5 Diane Abbott 2021-05-01 1
6 Diane Abbott 2021-06-01 0
7 Diane Abbott 2021-07-01 0
8 Diane Abbott 2021-08-01 0
9 Diane Abbott 2021-09-01 0
10 Diane Abbott 2021-10-01 0