Find complement of a data frame (anti - join)
I have two data frames(df and df1). df1 is subset of df. I want to get a data frame which is complement of df1 in df, i.e. return rows of the first data set which are not matched in the second. For example let,
data frame df:
heads
row1
row2
row3
row4
row5
data frame df1:
heads
row3
row5
Then the desired output df2 is:
heads
row1
row2
row4
Solution 1:
You could also do some type of anti join with data.table
s binary join
library(data.table)
setkey(setDT(df), heads)[!df1]
# heads
# 1: row1
# 2: row2
# 3: row4
EDIT: Starting data.table v1.9.6+ we can join data.tables without setting keys while using on
setDT(df)[!df1, on = "heads"]
EDIT2: Starting data.table v1.9.8+ fsetdiff
was introduced which is basically a variation of the solution above, just over all the column names of the x
data.table, e.g. x[!y, on = names(x)]
. If all
set to FALSE
(the default behavior), then only unique rows in x
will be returned. For the case of only one column in each data.table the following will be equivalent to the previous solutions
fsetdiff(df, df1, all = TRUE)
Solution 2:
Try anti_join
from dplyr
library(dplyr)
anti_join(df, df1, by='heads')
Solution 3:
Try the %in%
command and reverse it with !
df[!df$heads %in% df1$heads,]