PySpark - how to replace null array in JSON file

Solution 1:

I'm not sure which Spark version you are using. I checked with Spark 3.1.2 and 2.4.5, and in both versions empty array fields are kept when the JSON is read, not ignored.

You can use the line below to get the desired result:

from pyspark.sql.functions import array, col, lit, size, when
df.withColumn('a', when(size('a') == 0, array(lit('-'))).otherwise(col('a'))).show()

+---+------+--------+
|  a|     b|       c|
+---+------+--------+
|[-]|[1, 2]|a string|
+---+------+--------+