Grouping data by time intervals and adding missing rows

I have a dataframe with all the bills sponsored by MPs during a legislative term. In order to create a count variable over the term, I grouped the observations by ID and attached this variable to another dataframe containing all the MPs with a left_join. Then I mutated all the NAs of the count variable to 0.

Now suppose that I want to do the same, but for multiple shorter periods of time (months or quarters). In my final dataframe, I would like to have a row for every MP-month, with the count variable in that month. Is there an easy way to implement that? Thank you.

This is a minimal example of what I did for the whole term

bills <- structure(list(number = c(1, 2, 3, 4, 5, 6, 7, 8, 9), date = structure(c(18632, 
                                                                                  18753, 18785, 18816, 18879, 18880, 18911, 18943, 18974), class = "Date"), 
                        id = c(1, 1, 3, 3, 3, 3, 5, 5, 5)), class = "data.frame", row.names = c(NA, 
                                                                                                -9L))

mp <- structure(list(id = c(1, 2, 3, 4, 5), name = c("Diane", "Nigel", 
                                                     "Peter", "Lucy", "Fleur"), surname = c("Abbott", "Adams", "Aldous", 
                                                                                            "Allan", "Anderson")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                       -5L))


# Count number of tot questions
bills_grouped <- bills %>% 
  group_by(id) %>% 
  mutate(count_btot = n())

bills_grouped <- bills_grouped %>% 
  select(c(id, count_btot))
bills_grouped <- bills_grouped %>% distinct() 

# Merge with all MPs
mp_count <- left_join(mp, bills_grouped)

# Replace NAs with 0
mp_count$count_btot[is.na(mp_count$count_btot)] <- 0

My desired output would be something like this

   id  name  surname      month count_month
1   1 Diane   Abbott 2021-01-01           1
2   2 Nigel    Adams 2021-01-01           0
3   3 Peter   Aldous 2021-01-01           0
4   4  Lucy    Allan 2021-01-01           0
5   5 Fleur Anderson 2021-01-01           0
6   1 Diane   Abbott 2021-02-01           0
7   2 Nigel    Adams 2021-02-01           0
8   3 Peter   Aldous 2021-02-01           0
9   4  Lucy    Allan 2021-02-01           0
10  5 Fleur Anderson 2021-02-01           0

Solution 1:

With the tidyverse:

left_join(mp, bills) %>% 
  group_by(name, surname, month = lubridate::floor_date(date, "month")) %>% 
  summarise(n = sum(!is.na(month))) %>% 
  replace_na(list(month = as.Date("2021-01-01"))) %>% 
  ungroup(month) %>% 
  complete(month = seq.Date(as.Date("2021-01-01"), as.Date("2021-12-01"), '1 month'), fill = list(n = 0))

# A tibble: 60 x 4
# Groups:   name, surname [5]
   name  surname month          n
   <chr> <chr>   <date>     <dbl>
 1 Diane Abbott  2021-01-01     1
 2 Diane Abbott  2021-02-01     0
 3 Diane Abbott  2021-03-01     0
 4 Diane Abbott  2021-04-01     0
 5 Diane Abbott  2021-05-01     1
 6 Diane Abbott  2021-06-01     0
 7 Diane Abbott  2021-07-01     0
 8 Diane Abbott  2021-08-01     0
 9 Diane Abbott  2021-09-01     0
10 Diane Abbott  2021-10-01     0

Grouping data by time intervals and adding missing rows

Solution 1:

Related

Recent Posts