Count number of duplicates value within groups and subgroups in R [duplicate]
I have a dataframe:
COL1 COL2 COL3
A Canis lup
A Canis lupu
A Canis lupus
B Canis lupus
C Canis rattus
C Canis lupys
A Lottus po
B Lottus pi
B Lottus pe
and I would like to count for each COL1,COL2
, the number of duplicate COL1
values and get:
COL1 COL2 COL3 Ndup
A Canis lup 3
A Canis lupu 3
A Canis lupus 3
B Canis lupus 1
C Canis rattus 2
C Canis lupys 2
A Lottus po 1
B Lottus pi 2
B Lottus pe 2
Here is the dput format if it can helps:
structure(list(COL1 = c("A", "A", "A", "B", "C", "C", "A", "B",
"B"), COL2 = c("Canis", "Canis", "Canis", "Canis", "Canis", "Canis",
"Lottus", "Lottus", "Lottus"), COL3 = c("lup", "lupu", "lupus",
"lupus", "rattus", "lupys", "po", "pi", "pe")), row.names = c(NA,
-9L), class = "data.frame")
Note : COL3
is just here to illustrate that I have other columns in the dataframe.
We can just easily use dplyr::add_count
.
library(dplyr)
df %>%
add_count(COL1, COL2, name = "Ndup")
#> # A tibble: 9 × 4
#> # Groups: COL1, COL2 [5]
#> COL1 COL2 COL3 Ndup
#> <chr> <chr> <chr> <int>
#> 1 A Canis lup 3
#> 2 A Canis lupu 3
#> 3 A Canis lupus 3
#> 4 B Canis lupus 1
#> 5 C Canis rattus 2
#> 6 C Canis lupys 2
#> 7 A Lottus po 1
#> 8 B Lottus pi 2
#> 9 B Lottus pe 2