Return df with a column's values that occur more than once [duplicate]
Solution 1:
Here is a dplyr solution (using mrFlick's data.frame):
library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1) # keep groups of b with more than one row
newd
# a b
# 1 1 1
# 2 2 1
# 3 5 4
# 4 6 4
# 5 7 4
# 6 9 6
# 7 10 6
Or, using data.table
library(data.table)
setDT(dd)[, if (.N > 1) .SD, by = b]
Or using base R
dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
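The base R one-liner can be unpacked step by step. This is only a sketch, using the same dd that is defined in Solution 3; the intermediate names dup_vals, keep and keep2 are made up for illustration.

```r
# Same example data as in Solution 3
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# duplicated() flags only the 2nd and later occurrences of each value,
# so collect the distinct values that were flagged at least once
dup_vals <- unique(dd$b[duplicated(dd$b)])
dup_vals
# [1] 1 4 6

# Keep every row whose b is one of those values, first occurrence included
keep <- dd$b %in% dup_vals
dd[keep, ]

# Equivalent mask: flag duplicates scanning from both ends of the column
keep2 <- duplicated(dd$b) | duplicated(dd$b, fromLast = TRUE)
```

Both masks select rows 1, 2, 5, 6, 7, 9 and 10, matching the output shown above.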
Solution 2:
May I suggest an alternative, faster way to do this with data.table? The first form returns the values of B that occur more than once:
require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B
Or you can combine .I (another special variable, see ?data.table), which gives the corresponding row numbers in df, with .N to get the matching rows as follows:
setDT(df)[df[, .I[.N > 1L], by=B]$V1]
Or have a look at @mnel's answer for another variation (using yet another special variable, .SD).
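To see what the two data.table expressions return, here is a small sketch. The answer does not show its df, so the data below is an assumption that mirrors dd from Solution 3 with upper-case column names; dup_b, idx and res are illustrative names only.

```r
library(data.table)

# Hypothetical data (assumption): not shown in the answer itself
df <- data.frame(A = 1:10, B = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))
setDT(df)

# Count rows per value of B, then keep the values seen more than once
dup_b <- df[, .N, by = B][N > 1L]$B

# Within each group, .I holds the row numbers in df; the length-one
# condition .N > 1L keeps all of them or none of them
idx <- df[, .I[.N > 1L], by = B]$V1

# Subset the original table by those row numbers
res <- df[idx]
res
```

With this data, dup_b holds the values 1, 4 and 6, and res contains the seven rows of df that carry those values.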
Solution 3:
Using table() isn't ideal because you then have to join the counts back to the original rows of the data.frame. The ave function makes it easier to calculate row-level values for different groups. For example:
dd<-data.frame(
a=1:10,
b=c(1,1,2,3,4,4,4,5,6, 6)
)
dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1) #same thing
a b
1 1 1
2 2 1
5 5 4
6 6 4
7 7 4
9 9 6
10 10 6
Here, for each level of b, ave computes the length of b, which is really just the number of rows with that value of b, and returns that count to every row in the group. Then we use that to subset.
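Looking at the intermediate vector makes this easier to follow. A minimal sketch with the same dd; the name counts is only for illustration.

```r
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# One count per row: how many times that row's b value appears in the column
counts <- ave(dd$b, dd$b, FUN = length)
counts
# [1] 2 2 1 1 3 3 3 1 2 2

# Rows whose b value appears more than once
dd[counts > 1, ]
```

Because counts has one entry per row of dd, it can be used directly as a logical mask, with no join back to the original data.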