How to calculate share rate for each group in Dplyr?
I have a data set that shows the number of products for each group and shop.
df <- tribble(
~shop_id, ~group, ~group_2, ~products,
'1', 'A', 'Z', 10,
'2', 'B', 'Y', 20,
'3', 'A', 'X', 30,
'4', 'B', 'X', 40,
'5', 'A', 'R', 10
)
I now want to see the share of products for each shop id and group. But I want to exclude the group 2 column in the data. For instance, there are 50 products in group A, so the share for shop 1 should be 0.2. Here is the desired output:
df <- tribble(
~shop_id, ~group, ~products, ~share_products,
'1', 'A', 10, 0.2,
'2', 'B', 20, 0.33,
'3', 'A', 30, 0.6,
'4', 'B', 40, 0.66,
'5', 'A', 10, 0.2
)
How can I do this?
After grouping, divide by the sum
of 'products'
library(dplyr)
df1 <- df %>%
select(-group_2) %>%
group_by(group) %>%
mutate(share_products = products/sum(products)) %>%
ungroup
-output
df1
# A tibble: 5 × 4
shop_id group products share_products
<chr> <chr> <dbl> <dbl>
1 1 A 10 0.2
2 2 B 20 0.333
3 3 A 30 0.6
4 4 B 40 0.667
5 5 A 10 0.2
If there are several 'group' columns and want to loop over those
library(purrr)
grp_cols <- grep("group", names(df), value = TRUE)
map(grp_cols, ~ df %>%
select(shop_id, all_of(.x), products) %>%
group_by(across(all_of(.x))) %>%
mutate(share_products = products/sum(products)) %>%
ungroup)
[[1]]
# A tibble: 5 × 4
shop_id group products share_products
<chr> <chr> <dbl> <dbl>
1 1 A 10 0.2
2 2 B 20 0.333
3 3 A 30 0.6
4 4 B 40 0.667
5 5 A 10 0.2
[[2]]
# A tibble: 5 × 4
shop_id group_2 products share_products
<chr> <chr> <dbl> <dbl>
1 1 Z 10 1
2 2 Y 20 1
3 3 X 30 0.429
4 4 X 40 0.571
5 5 R 10 1
We could use prop.table
after grouping
df %>%
select(-group_2) %>%
group_by(group) %>%
mutate(prop = prop.table(products))
shop_id group products prop
<chr> <chr> <dbl> <dbl>
1 1 A 10 0.2
2 2 B 20 0.333
3 3 A 30 0.6
4 4 B 40 0.667
5 5 A 10 0.2