How do I duplicate and add rows between the values of two different columns?

I think tidyr::expand() and full_seq() can achieve what you want, grouping on stateabb and styear because some states have more than one start year.
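To see what full_seq() contributes on its own, here is a minimal sketch. The two years are taken from the USA rows shown further down; the variable name yrs is only for illustration.

```r
library(tidyr)

# full_seq() fills in every value between the minimum and maximum
# of its input, stepping by the given period (1 year here).
yrs <- full_seq(c(1898, 1903), 1)
print(yrs)
```

Inside expand(), this sequence is built once per group, which is why grouping by start year matters.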

Assuming your data frame is named mydata, something like this should work. I have kept the column of expanded years and named it filled_year, but you may want to remove it.

library(dplyr)
library(tidyr)

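new_data <- mydata %>% 
  # one group per spell, since some states have more than one start year
  group_by(stateabb, styear) %>% 
  # generate every year from styear to endyear within each group
  tidyr::expand(stateabb, full_seq(c(styear, endyear), 1)) %>% 
  # bring back the remaining columns of the matching original row
  inner_join(mydata, by = c("stateabb", "styear")) %>% 
  rename(filled_year = `full_seq(c(styear, endyear), 1)`) %>%
  ungroup()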

The top and bottom of the USA rows:

new_data %>% 
  filter(stateabb == "USA") %>% 
  head()

# A tibble: 6 x 10
  styear stateabb filled_year ccode stmonth stday endyear endmonth endday version
   <int> <chr>          <dbl> <int>   <int> <int>   <int>    <int>  <int>   <int>
1   1898 USA             1898     2       8    13    2016       12     31    2016
2   1898 USA             1899     2       8    13    2016       12     31    2016
3   1898 USA             1900     2       8    13    2016       12     31    2016
4   1898 USA             1901     2       8    13    2016       12     31    2016
5   1898 USA             1902     2       8    13    2016       12     31    2016
6   1898 USA             1903     2       8    13    2016       12     31    2016

new_data %>% 
  filter(stateabb == "USA") %>% 
  tail()

# A tibble: 6 x 10
  styear stateabb filled_year ccode stmonth stday endyear endmonth endday version
   <int> <chr>          <dbl> <int>   <int> <int>   <int>    <int>  <int>   <int>
1   1898 USA             2011     2       8    13    2016       12     31    2016
2   1898 USA             2012     2       8    13    2016       12     31    2016
3   1898 USA             2013     2       8    13    2016       12     31    2016
4   1898 USA             2014     2       8    13    2016       12     31    2016
5   1898 USA             2015     2       8    13    2016       12     31    2016
6   1898 USA             2016     2       8    13    2016       12     31    2016
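If you would rather avoid the join, a possible alternative is to build the year sequence for each row as a list column and then unnest it. This is only a sketch on a small made-up data frame (spells and its values are hypothetical, not your data), and it assumes endyear is never earlier than styear.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two spells for one state, one for another.
spells <- data.frame(
  stateabb = c("FRN", "FRN", "JPN"),
  styear   = c(1938L, 1945L, 1943L),
  endyear  = c(1940L, 1947L, 1945L)
)

expanded <- spells %>%
  # one integer vector of years per row, stored as a list column
  mutate(filled_year = Map(seq, styear, endyear)) %>%
  # one output row per year; the other columns are repeated
  unnest(filled_year)

# Sanity check: each spell should contribute endyear - styear + 1 rows.
expected_rows <- sum(spells$endyear - spells$styear + 1)
stopifnot(nrow(expanded) == expected_rows)
```

Because each row is expanded independently, no grouping is needed, and filled_year stays an integer column rather than a double.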

Your example data:

mydata <- structure(list(stateabb = c("USA", "UKG", "FRN", "FRN", "GMY", 
"GMY", "GMY", "AUH", "ITA", "RUS", "RUS", "CHN", "JPN", "JPN"
), ccode = c(2L, 200L, 220L, 220L, 255L, 255L, 255L, 300L, 325L, 
365L, 365L, 710L, 740L, 740L), styear = c(1898L, 1816L, 1816L, 
1945L, 1816L, 1925L, 1991L, 1816L, 1860L, 1816L, 1922L, 1950L, 
1895L, 1991L), stmonth = c(8L, 1L, 1L, 8L, 1L, 1L, 12L, 1L, 1L, 
1L, 1L, 1L, 4L, 12L), stday = c(13L, 1L, 1L, 15L, 1L, 1L, 11L, 
1L, 1L, 1L, 1L, 1L, 1L, 11L), endyear = c(2016L, 2016L, 1940L, 
2016L, 1918L, 1945L, 2016L, 1918L, 1943L, 1917L, 2016L, 2016L, 
1945L, 2016L), endmonth = c(12L, 12L, 6L, 12L, 11L, 5L, 12L, 
11L, 9L, 12L, 12L, 12L, 8L, 12L), endday = c(31L, 31L, 22L, 31L, 
11L, 7L, 31L, 3L, 2L, 5L, 31L, 31L, 14L, 31L), version = c(2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14"))
