how to create a frequency variable of occurence of a value per name per year in R (dataframe)

I am struggling to create a new variable/colum in my dataframe, which contains the frequency of wi = 1 per name, per year.

Here is a test dataframe df = tibble::as_tibble(data.frame(Name=c("x","x","x","x","x", "y","y","y","y","y","y"), Year=c(2011,2011,2011,2012,2012,2011,2011,2012,2012,2012,2012), id=c(8,23, 1,5,7,25,83,6,2,9,10), wi =c(1,0,1,1,0,1,1,0,0,1,0)))

# A tibble: 11 × 4
   Name   Year    id    wi
   <chr> <dbl> <dbl> <dbl>
 1 x      2011     8     1
 2 x      2011    23     0
 3 x      2011     1     1
 4 x      2012     5     1
 5 x      2012     7     0
 6 y      2011    25     1
 7 y      2011    83     1
 8 y      2012     6     0
 9 y      2012     2     0
10 y      2012     9     1
11 y      2012    10     0

Ideally the dataframe would end up looking like this:


df
# A tibble: 11 × 5
   Name   Year    id    wi freq_wi
   <chr> <dbl> <dbl> <dbl>   <dbl>
 1 x      2011     8     1    0.66
 2 x      2011    23     0    0.66
 3 x      2011     1     1    0.66
 4 x      2012     5     1    0.5 
 5 x      2012     7     0    0.5 
 6 y      2011    25     1    1   
 7 y      2011    83     1    1   
 8 y      2012     6     0    0.25
 9 y      2012     2     0    0.25
10 y      2012     9     1    0.25
11 y      2012    10     0    0.25

All help is appreciated!!


Solution 1:

Here is another dplyr solution:

library(dplyr)
df %>% 
  group_by(Name, Year) %>% 
  mutate(Count = ifelse(wi == 1, sum(wi), sum(wi)),
         req_wi = Count/sum(Count)*Count) %>% 
  ungroup() %>% 
  select(-Count)
    Name   Year    id    wi req_wi
   <chr> <dbl> <dbl> <dbl>  <dbl>
 1 x      2011     8     1  0.667
 2 x      2011    23     0  0.667
 3 x      2011     1     1  0.667
 4 x      2012     5     1  0.5  
 5 x      2012     7     0  0.5  
 6 y      2011    25     1  1    
 7 y      2011    83     1  1    
 8 y      2012     6     0  0.25 
 9 y      2012     2     0  0.25 
10 y      2012     9     1  0.25 
11 y      2012    10     0  0.25 

Solution 2:

If wi is always 0 or 1 you can do the following (because mean(wi) is then equal to "frequency of wi")

library(dplyr)
    df %>% 
      group_by(Name, Year) %>% 
      summarise(freq_wi=mean(wi)) %>% 
      left_join(df, .)

       Name   Year    id    wi freq_wi
       <fct> <dbl> <dbl> <dbl>   <dbl>
     1 x      2011     8     1   0.667
     2 x      2011    23     0   0.667
     3 x      2011     1     1   0.667
     4 x      2012     5     1   0.5  
     5 x      2012     7     0   0.5  
     6 y      2011    25     1   1    
     7 y      2011    83     1   1    
     8 y      2012     6     0   0.25 
     9 y      2012     2     0   0.25 
    10 y      2012     9     1   0.25 
    11 y      2012    10     0   0.25