Repeating variable in group by category in R
Solution 1:
One possible approach with tidyverse
, if I understand the logic correctly. After grouping by both id
and company
, you can see if both categories "a" and "b" are present; if so, mark those rows where category is "a" with "rp".
A more convoluted case_when
can consider your different rules, but leave as missing NA
situations where you need "p" with a sequence of numbers. A temporary column including a counter can be made based on these missing values to give you "p1", "p2", etc.
library(tidyverse)
dfx %>%
group_by(id, company) %>%
mutate(new_flag = case_when(
all(c("a", "b") %in% category) & category == "a" ~ "rp",
category == "a" ~ "nr",
TRUE ~ NA_character_)) %>%
group_by(id) %>%
mutate(new_flag = case_when(
category == "b" & new_flag[category == "a"][1] == "nr" ~ "p0",
category == "b" & new_flag[category == "a"][1] == "rp" &
company == company[category == "a"][1] ~ NA_character_,
category == "b" & new_flag[category == "a"][1] == "rp" &
company != company[category == "a"][1] ~ "p0",
TRUE ~ new_flag)) %>%
group_by(id, company) %>%
mutate(ctr = cumsum(is.na(new_flag) & date != lag(date, default = first(date[is.na(new_flag)])))) %>%
mutate(new_flag = ifelse(is.na(new_flag), paste0("p", ctr), new_flag)) %>%
select(-ctr)
Output
id date category company flag new_flag
<dbl> <date> <chr> <chr> <chr> <chr>
1 1 2001-01-04 a x rp rp
2 1 2007-09-23 b x p1 p1
3 1 2008-11-14 b x p2 p2
4 2 2009-11-13 a x nr nr
5 2 2012-07-21 b y p0 p0
6 2 2014-09-15 b y p0 p0
7 3 2000-04-01 a x rp rp
8 3 2008-07-14 b x p1 p1
9 3 2008-07-14 b x p1 p1
10 4 2001-03-21 a x nr nr
11 4 2019-05-23 b y p0 p0
12 4 2019-05-08 b z p0 p0
13 5 2004-07-06 a x rp rp
14 5 2007-08-12 a x rp rp
15 5 2011-09-20 b x p1 p1
16 5 2011-09-20 b x p1 p1
17 5 2014-08-15 b x p2 p2
18 5 2014-08-15 b y p0 p0