How to join a Spark DataFrame twice with different ID types

Solution 1:

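You can do this in two passes. First, an inner join on Id picks up the users whose Id matches an event directly. Then a left anti join finds the users without such a match, and a second join matches their Cel against the event Id. Finally, union the two results: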

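# users and events are the two input DataFrames
# Users whose Id matches an event Id directly
with_id = (users.join(events, on=users["Id"] == events["Id"], how="inner")
                .select(events["Id"], events["Name"], "Cel", "Date", "EventType"))

# Users with no event for their Id (left anti join), matched again on Cel == events.Id
incorrect_id = (users.join(events, on=users["Id"] == events["Id"], how="leftanti")
                     .join(events, on=users["Cel"] == events["Id"], how="inner")
                     .select(users["Id"], events["Name"], "Cel", "Date", "EventType"))

# union() (unionAll() is an alias) matches columns by position,
# so both selects list the columns in the same order
result = with_id.union(incorrect_id)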

The result:

result.show()
+---+--------+----------+----------+---------+
| Id|    Name|       Cel|      Date|EventType|
+---+--------+----------+----------+---------+
|324|  Daniel|5511737379|2022-01-15| purchase|
|350|    Jack|3247623817|2022-01-16| purchase|
|380|Michelle|3247623322|2022-01-10|    claim|
+---+--------+----------+----------+---------+
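To make the three steps concrete without a Spark session, here is a minimal plain-Python sketch of the same logic: inner join on Id, left anti join, second join on Cel, then union. It only illustrates the semantics and says nothing about how Spark executes the joins. The function name join_twice, the list-of-dicts rows and the sample values are all hypothetical, and it assumes Ids and phone numbers are stored as comparable strings (the real column types aren't shown).

```python
from typing import Dict, List

Row = Dict[str, str]


def join_twice(users: List[Row], events: List[Row]) -> List[Row]:
    """Hypothetical sketch of inner join + left anti join + union on plain rows."""
    # Index events by their Id so each lookup is a dict access.
    events_by_id: Dict[str, List[Row]] = {}
    for e in events:
        events_by_id.setdefault(e["Id"], []).append(e)

    # Step 1 (inner join on Id): users whose Id appears directly as an event Id.
    with_id = [
        {"Id": e["Id"], "Name": e["Name"], "Cel": u["Cel"],
         "Date": e["Date"], "EventType": e["EventType"]}
        for u in users
        for e in events_by_id.get(u["Id"], [])
    ]

    # Step 2a (left anti join): keep only users with no event matching their Id.
    unmatched = [u for u in users if u["Id"] not in events_by_id]

    # Step 2b (inner join on Cel): match those users' Cel against the event Id.
    incorrect_id = [
        {"Id": u["Id"], "Name": e["Name"], "Cel": u["Cel"],
         "Date": e["Date"], "EventType": e["EventType"]}
        for u in unmatched
        for e in events_by_id.get(u["Cel"], [])
    ]

    # Step 3 (union): concatenate both results, keeping duplicates like UNION ALL.
    return with_id + incorrect_id


# Hypothetical sample data, not the data from the question.
users = [
    {"Id": "1", "Cel": "5550001"},
    {"Id": "2", "Cel": "5550002"},
    {"Id": "3", "Cel": "5550003"},
]
events = [
    {"Id": "1", "Name": "Ana", "Date": "2022-01-01", "EventType": "purchase"},
    {"Id": "5550002", "Name": "Ben", "Date": "2022-01-02", "EventType": "claim"},
]

rows = join_twice(users, events)
for row in rows:
    print(row)
```

User 1 is matched by Id, user 2 only through its Cel, and user 3 matches neither way, so it is dropped, just as the inner second join drops it in the Spark version.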