how to create a frequency variable of occurence of a value per name per year in R (dataframe)
I am struggling to create a new variable/colum in my dataframe, which contains the frequency of wi = 1 per name, per year.
Here is a test dataframe
df = tibble::as_tibble(data.frame(Name=c("x","x","x","x","x", "y","y","y","y","y","y"), Year=c(2011,2011,2011,2012,2012,2011,2011,2012,2012,2012,2012), id=c(8,23, 1,5,7,25,83,6,2,9,10), wi =c(1,0,1,1,0,1,1,0,0,1,0)))
# A tibble: 11 × 4
Name Year id wi
<chr> <dbl> <dbl> <dbl>
1 x 2011 8 1
2 x 2011 23 0
3 x 2011 1 1
4 x 2012 5 1
5 x 2012 7 0
6 y 2011 25 1
7 y 2011 83 1
8 y 2012 6 0
9 y 2012 2 0
10 y 2012 9 1
11 y 2012 10 0
Ideally the dataframe would end up looking like this:
df
# A tibble: 11 × 5
Name Year id wi freq_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.66
2 x 2011 23 0 0.66
3 x 2011 1 1 0.66
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
All help is appreciated!!
Solution 1:
Here is another dplyr
solution:
library(dplyr)
df %>%
group_by(Name, Year) %>%
mutate(Count = ifelse(wi == 1, sum(wi), sum(wi)),
req_wi = Count/sum(Count)*Count) %>%
ungroup() %>%
select(-Count)
Name Year id wi req_wi
<chr> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25
Solution 2:
If wi is always 0 or 1 you can do the following (because mean(wi) is then equal to "frequency of wi")
library(dplyr)
df %>%
group_by(Name, Year) %>%
summarise(freq_wi=mean(wi)) %>%
left_join(df, .)
Name Year id wi freq_wi
<fct> <dbl> <dbl> <dbl> <dbl>
1 x 2011 8 1 0.667
2 x 2011 23 0 0.667
3 x 2011 1 1 0.667
4 x 2012 5 1 0.5
5 x 2012 7 0 0.5
6 y 2011 25 1 1
7 y 2011 83 1 1
8 y 2012 6 0 0.25
9 y 2012 2 0 0.25
10 y 2012 9 1 0.25
11 y 2012 10 0 0.25