How to join a Spark DataFrame twice with different ID types
Solution 1:
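You can do this with two joins. First match users to events on Id. Then take the users that had no match (a left anti join) and match them again, this time comparing Cel to the event Id. Finally, union the two results: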
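# Users whose Id matches an event Id directly.
with_id = (
    users.join(events, on=users["Id"] == events["Id"], how="inner")
    .select(events["Id"], events["Name"], "Cel", "Date", "EventType")
)

# Users with no event under their Id: keep only those (left anti join),
# then match them to events whose Id holds the user's Cel instead.
incorrect_id = (
    users.join(events, on=users["Id"] == events["Id"], how="leftanti")
    .join(events, on=users["Cel"] == events["Id"], how="inner")
    .select(users["Id"], events["Name"], "Cel", "Date", "EventType")
)

# unionAll keeps duplicates and matches columns by position, so both
# selects must list their columns in the same order.
result = with_id.unionAll(incorrect_id)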
This gives the following result:
result.show()
+---+--------+----------+----------+---------+
| Id|    Name|       Cel|      Date|EventType|
+---+--------+----------+----------+---------+
|324|  Daniel|5511737379|2022-01-15| purchase|
|350|    Jack|3247623817|2022-01-16| purchase|
|380|Michelle|3247623322|2022-01-10|    claim|
+---+--------+----------+----------+---------+
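
If it helps to see what the two joins compute, here is a rough sketch of the same logic in plain Python (no Spark needed). The input rows are made up for illustration, since the original users and events data are not shown here. In particular, which events are keyed by Id and which by Cel is an assumption, and join_twice and make_row are hypothetical helper names.

```python
# Hypothetical sample rows; the real users/events DataFrames are not shown above.
users = [
    {"Id": 324, "Name": "Daniel", "Cel": 5511737379},
    {"Id": 350, "Name": "Jack", "Cel": 3247623817},
    {"Id": 380, "Name": "Michelle", "Cel": 3247623322},
]
events = [
    # Assumed: this event is keyed by the user's Id.
    {"Id": 324, "Name": "Daniel", "Date": "2022-01-15", "EventType": "purchase"},
    # Assumed: these two events are keyed by the user's Cel instead.
    {"Id": 3247623817, "Name": "Jack", "Date": "2022-01-16", "EventType": "purchase"},
    {"Id": 3247623322, "Name": "Michelle", "Date": "2022-01-10", "EventType": "claim"},
]


def make_row(user, event):
    """Build one output row with the same columns as the Spark select."""
    return {
        "Id": user["Id"],
        "Name": event["Name"],
        "Cel": user["Cel"],
        "Date": event["Date"],
        "EventType": event["EventType"],
    }


def join_twice(users, events):
    event_ids = {e["Id"] for e in events}

    # Pass 1: inner join on users.Id == events.Id.
    with_id = [make_row(u, e) for u in users for e in events if u["Id"] == e["Id"]]

    # Pass 2: left anti join (users with no event under their Id),
    # then inner join on users.Cel == events.Id.
    unmatched = [u for u in users if u["Id"] not in event_ids]
    incorrect_id = [
        make_row(u, e) for u in unmatched for e in events if u["Cel"] == e["Id"]
    ]

    # Equivalent of unionAll: concatenate, keeping duplicates.
    return with_id + incorrect_id


result = join_twice(users, events)
for row in result:
    print(row)
```

The anti join in the second pass is what stops a user from being matched twice: only users with no event under their own Id are retried against Cel.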