Is it possible to use a Python "if condition" on PySpark columns? [duplicate]
Yes, the built-in when function in pyspark.sql.functions, together with the otherwise column method, does exactly that.
Given the following DataFrame:
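(A minimal sketch of how this DataFrame could be built from the data shown below, assuming an active SparkSession named spark:)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build the sample data with the same id/team/game values as the output below.
df = spark.createDataFrame(
    [(1, "A", "Home"), (2, "A", "Away"),
     (3, "B", "Home"), (4, "B", "Away"),
     (5, "C", "Home"), (6, "C", "Away"),
     (7, "D", "Home"), (8, "D", "Away")],
    ["id", "team", "game"],
)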
df.show()
+---+----+----+
| id|team|game|
+---+----+----+
| 1| A|Home|
| 2| A|Away|
| 3| B|Home|
| 4| B|Away|
| 5| C|Home|
| 6| C|Away|
| 7| D|Home|
| 8| D|Away|
+---+----+----+
You can chain when conditions and supply a default with otherwise in the following way.
from pyspark.sql import functions

# Each when() branch is checked in order; otherwise() supplies
# the default for rows that match none of the conditions.
df = (df.withColumn("result",
      functions.when((df["team"] == "A") & (df["game"] == "Home"), "WIN")
      .when((df["team"] == "B") & (df["game"] == "Away"), "WIN")
      .when((df["team"] == "D") & (df["game"] == "Home"), "WIN")
      .when((df["team"] == "D") & (df["game"] == "Away"), "WIN")
      .otherwise("LOSS")))
df.show()
+---+----+----+------+
| id|team|game|result|
+---+----+----+------+
| 1| A|Home| WIN|
| 2| A|Away| LOSS|
| 3| B|Home| LOSS|
| 4| B|Away| WIN|
| 5| C|Home| LOSS|
| 6| C|Away| LOSS|
| 7| D|Home| WIN|
| 8| D|Away| WIN|
+---+----+----+------+
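Note that when branches are evaluated top to bottom and the first match wins; since team D wins both Home and Away, its two branches could be collapsed into a single df["team"] == "D" condition. The same logic can also be written as a SQL CASE expression via functions.expr, if you prefer that style. A minimal sketch over the same columns:

# Equivalent CASE WHEN expression producing the same result column.
df = df.withColumn(
    "result",
    functions.expr(
        "CASE WHEN (team = 'A' AND game = 'Home') "
        "OR (team = 'B' AND game = 'Away') "
        "OR team = 'D' THEN 'WIN' ELSE 'LOSS' END"
    ),
)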