PySpark DataFrame - Join on multiple columns dynamically
Solution 1:
Why not use a simple comprehension:
from pyspark.sql.functions import col

firstdf.join(
    seconddf,
    # one equality condition per pair of column names
    [col(f) == col(s) for (f, s) in zip(columnsFirstDf, columnsSecondDf)],
    "inner"
)
Since the conditions are combined with a logical AND, it is enough to pass a list of conditions; you don't need to chain them with the & operator yourself.
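For context, here is a minimal runnable sketch; the dataframes, column lists, and values are hypothetical and only illustrate the pattern:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# hypothetical dataframes with differently named key columns
firstdf = spark.createDataFrame([(1, "a"), (2, "b")], ["f_id", "f_val"])
seconddf = spark.createDataFrame([(1, "x"), (3, "y")], ["s_id", "s_val"])

columnsFirstDf = ["f_id"]
columnsSecondDf = ["s_id"]

joined = firstdf.join(
    seconddf,
    [col(f) == col(s) for (f, s) in zip(columnsFirstDf, columnsSecondDf)],
    "inner"
)
joined.show()  # only the row with f_id == s_id == 1 survives the inner join

Note that col(f) and col(s) resolve unambiguously here because the key columns have different names in the two dataframes; if the names overlap, reference them as firstdf[f] == seconddf[s] instead.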
Solution 2:
@Mohan, sorry, I don't have the reputation to add a comment. When the join columns have the same names in both dataframes, build a list of those column names and pass it to the join:

col_list = ["id", "column1", "column2"]
firstdf.join(seconddf, col_list, "inner")
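As a side note, joining on a list of column names (rather than column expressions) keeps only one copy of each key column in the result. A small hypothetical example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# hypothetical dataframes sharing the key columns "id", "column1", "column2"
firstdf = spark.createDataFrame([(1, "a", "x", 10)], ["id", "column1", "column2", "val1"])
seconddf = spark.createDataFrame([(1, "a", "x", 20)], ["id", "column1", "column2", "val2"])

col_list = ["id", "column1", "column2"]
joined = firstdf.join(seconddf, col_list, "inner")
print(joined.columns)  # ['id', 'column1', 'column2', 'val1', 'val2'] -- key columns appear once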