Pyspark filter dataframe by columns of another dataframe
Solution 1:
Left anti join is what you're looking for:
df1.join(df2, ["userid", "group"], "leftanti")
but the same thing can be done with left outer join:
(df1
.join(df2, ["userid", "group"], "leftouter")
.where(df2["pick"].isNull())
.drop(df2["pick"]))