How to delete groups containing less than 3 rows of data in R? [duplicate]
I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).
I want to get rid of groups made up of fewer than 3 rows. For example, in the following sample I would like to remove the rows for brood '2'. I have a lot of data, so while I could painstakingly do this by hand, it would be very helpful to automate it in R.
Year Site Brood Parents
1996 A 1 1
1996 A 1 1
1996 A 1 0
1996 A 1 0
1996 A 2 1
1996 A 2 0
1996 A 3 1
1996 A 3 1
1996 A 3 1
1996 A 3 0
1996 A 3 1
I hope this makes sense and thank you very much in advance for your help! I'm new to R and stackoverflow so apologies if the way I've worded this question isn't very good! Let me know if I need to provide any other information.
Solution 1:
One way to do it is to use the magic n() function within filter:
library(dplyr)
my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))
my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3)
The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
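To see what n() is counting, it can help to look at the group sizes before filtering. Here is a minimal sketch using the sample from the question; the names broods, group_sizes and kept are just placeholders I made up, and count() is used only as a quick way to inspect the sizes.

```r
library(dplyr)

# Sample data copied from the question (11 rows, broods 1 to 3).
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# One row per Year/Site/Brood group, with the group's row count in column n.
group_sizes <- broods %>%
  count(Year, Site, Brood)

# Keep only rows from groups with at least 3 rows, then drop the grouping.
kept <- broods %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%
  ungroup()
```

Brood 2 has only two rows, so it should be the only group dropped. The ungroup() at the end is optional; it just stops later steps from being applied per group by accident.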
Solution 2:
Throwing the data.table approach here to join the party:
library(data.table)
setDT(my_data)
my_data[ , if (.N >= 3L) .SD, by = .(Year, Site, Brood)]
Solution 3:
You can also do this using base R:
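# folder is assumed to hold the path (with trailing separator) to the directory containing test.csv
temp <- read.csv(paste0(folder, "test.csv"), header = TRUE)
# count the rows in each Year/Site/Brood group
matches <- aggregate(Parents ~ Year + Site + Brood, temp, FUN = length)
# attach the group size to every row; the original column becomes Parents.x, the count Parents.y
temp <- merge(temp, matches, by = c("Year", "Site", "Brood"))
# keep rows from groups with at least 3 rows and drop the count column
temp <- temp[temp$Parents.y >= 3, c(1, 2, 3, 4)]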
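For what it's worth, the count-and-merge steps can probably be collapsed into a single subsetting step with ave(). This is only a sketch of an alternative, not the approach used above; broods, group_size and kept are placeholder names, and the data is the sample from the question.

```r
# Sample data copied from the question.
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# ave() applies length() within each Year/Site/Brood combination and returns
# the result aligned with the original rows, i.e. each row's group size.
group_size <- ave(seq_len(nrow(broods)),
                  broods$Year, broods$Site, broods$Brood,
                  FUN = length)

# Keep rows whose group has at least 3 rows; column names stay unchanged.
kept <- broods[group_size >= 3, ]
```

Because nothing is merged, the Parents column keeps its original name and the row order is preserved.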