Creating a conditional dummy variable column in R

I'm working with a cross-country panel dataset, one of my variables (cc_dummy) takes the value of 1 & 0 (there are also missing values indicated by NAs). I want to create a new column such that if cc_dummy takes the value 1 for three consecutive years for each country, I want only the first year of each three-year window to take the value 1 whereas the other years take the value 0. A snapshot of my data is given below. Any help would be appreciated. If possible, I would like to create the column using dplyr.

structure(list(country = c("Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil"), year = c(1975, 1976, 1977, 1978, 1979, 1980, 
1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 
1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 
2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2018, 2019, 2020, 1975, 1976, 1977, 1978, 
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 
2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021), 
    cc_dummy = c(NA, NA, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, NA, NA, 0, 0, 1, 0, 
    1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
    0, 0, 0)), row.names = c(NA, -93L), class = c("tbl_df", "tbl", 
"data.frame"))

Solution 1:

Here's a way to do it using data.table::rleid and dplyr grammar:

library(dplyr)
df %>% 
  group_by(country, m = data.table::rleid(cc_dummy)) %>% 
  mutate(newcol = ifelse(n() >= 3 & cumsum(cc_dummy) == 1, 1, 0))

# A tibble: 93 x 5
# Groups:   country, m [32]
   country    year cc_dummy     m newcol
   <chr>     <dbl>    <dbl> <int>  <dbl>
 1 Argentina  1975       NA     1      0
 2 Argentina  1976       NA     1      0
 3 Argentina  1977        0     2      0
 4 Argentina  1978        0     2      0
 5 Argentina  1979        0     2      0
 6 Argentina  1980        0     2      0
 7 Argentina  1981        1     3      0
 8 Argentina  1982        1     3      0
 9 Argentina  1983        0     4      0
10 Argentina  1984        1     5      0
11 Argentina  1985        0     6      0
12 Argentina  1986        0     6      0
13 Argentina  1987        1     7      1
14 Argentina  1988        1     7      0
15 Argentina  1989        1     7      0
16 Argentina  1990        0     8      0
17 Argentina  1991        0     8      0
18 Argentina  1992        0     8      0
19 Argentina  1993        0     8      0
20 Argentina  1994        0     8      0
21 Argentina  1995        0     8      0
22 Argentina  1996        0     8      0
23 Argentina  1997        0     8      0
24 Argentina  1998        0     8      0
25 Argentina  1999        0     8      0
26 Argentina  2000        0     8      0
27 Argentina  2001        0     8      0
28 Argentina  2002        1     9      0