Extract elements common in all column groups

I have a R dataset x as below:

  ID Month
1   1   Jan
2   3   Jan
3   4   Jan
4   6   Jan
5   6   Jan
6   9   Jan
7   2   Feb
8   4   Feb
9   6   Feb
10  8   Feb
11  9   Feb
12 10   Feb
13  1   Mar
14  3   Mar
15  4   Mar
16  6   Mar
17  7   Mar
18  9   Mar
19  2   Apr
20  4   Apr
21  6   Apr
22  7   Apr
23  8   Apr
24 10   Apr
25  1   May
26  2   May
27  4   May
28  6   May
29  7   May
30  8   May
31  2   Jun
32  4   Jun
33  5   Jun
34  6   Jun
35  9   Jun
36 10   Jun

I am trying to figure out a R function/code to identify all IDs that exist atleast once in every month. In the above case, ID 4 & 6 are present in all months.

Thanks


First, split the df$ID by Month and use intersect to find elements common in each sub-group.

Reduce(intersect, split(df$ID, df$Month))
#[1] 4 6

If you want to subset the corresponding data.frame, do

df[df$ID %in% Reduce(intersect, split(df$ID, df$Month)),]

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'ID', get the row index (.I) where the number of unique 'Months' are equal to the number of unique 'Months' in the whole dataset and subset the data based on this

library(data.table)
setDT(df1)[df1[, .I[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1]
#    ID Month
# 1:  4   Jan
# 2:  4   Feb
# 3:  4   Mar
# 4:  4   Apr
# 5:  4   May
# 6:  4   Jun
# 7:  6   Jan
# 8:  6   Jan
# 9:  6   Feb
#10:  6   Mar
#11:  6   Apr
#12:  6   May
#13:  6   Jun

To extract the 'ID's

setDT(df1)[, ID[uniqueN(Month) == uniqueN(df1$Month)], ID]$V1
#[1] 4 6

Or with base R

1) Using table with rowSums

v1 <- rowSums(table(df1) > 0)
names(v1)[v1==max(v1)]
#[1] "4" "6"

This info can be used for subsetting the data

subset(df1, ID %in% names(v1)[v1 == max(v1)])

2) Using tapply

lst <- with(df1, tapply(Month, ID, FUN = unique))
names(which(lengths(lst) == length(unique(df1$Month))))
#[1] "4" "6"

Or using dplyr

library(dplyr)
df1 %>%
     group_by(ID) %>%
     filter(n_distinct(Month)== n_distinct(df1$Month)) %>%
     .$ID %>%
     unique
#[1] 4 6

or if we need to get the rows

df1 %>%
     group_by(ID) %>%
     filter(n_distinct(Month)== n_distinct(df1$Month))
# A tibble: 13 x 2
# Groups:   ID [2]
#      ID Month
#   <int> <chr>
# 1     4   Jan
# 2     6   Jan
# 3     6   Jan
# 4     4   Feb
# 5     6   Feb
# 6     4   Mar
# 7     6   Mar
# 8     4   Apr
# 9     6   Apr
#10     4   May
#11     6   May
#12     4   Jun
#13     6   Jun

An alternative solution using dplyr and purrr:

tib %>%
  dplyr::group_by(Month) %>%
  dplyr::group_split(.keep = F) %>%
  purrr::reduce(intersect)

# A tibble: 2 x 1
#      ID
#   <dbl>
# 1     4
# 2     6

returns the desired IDs, where tib is a tibble containing the input data.