Total zero count across all columns in a PySpark dataframe

I need to find the percentage of zeros across all columns in a PySpark dataframe. How do I find the count of zeros in each column of the dataframe?

P.S.: I have tried converting the dataframe into a pandas dataframe and using value_counts. But inspecting its output is not feasible for a large dataset.


Solution 1:

"How to find the count of zero across each columns in the dataframe?"

First, count the zeros in each column:

import pyspark.sql.functions as F

# F.when(df[c] == 0, c) returns c for zero values and null otherwise,
# and F.count counts only the non-null values, i.e. the zeros.
df_zero = df.select([F.count(F.when(df[c] == 0, c)).alias(c) for c in df.columns])
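
For example, on a small hypothetical dataframe (the data below is made up purely for illustration):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data: column "a" holds two zeros, column "b" holds one
df = spark.createDataFrame([(0, 1.0), (2, 0.0), (0, 3.5)], ["a", "b"])

df.select([F.count(F.when(df[c] == 0, c)).alias(c) for c in df.columns]).show()
# +---+---+
# |  a|  b|
# +---+---+
# |  2|  1|
# +---+---+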

Second, you can then inspect the counts (compared to .show(), this gives a cleaner view, and the speed is not much different):

df_zero.limit(2).toPandas().head()
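
Since the question also asks for the percentage of zeros, one way to get it is to divide each per-column zero count by the total row count. This is a minimal sketch, assuming you are willing to pay for the extra Spark action that df.count() triggers:

import pyspark.sql.functions as F

total = df.count()  # total number of rows (runs as a separate Spark action)

# Same zero-count expression as above, scaled to a percentage per column
df_zero_pct = df.select([
    (F.count(F.when(df[c] == 0, c)) / total * 100).alias(c)
    for c in df.columns
])
df_zero_pct.limit(2).toPandas().head()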

Enjoy! :)