How to find the size or shape of a DataFrame in PySpark?
I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this.
In Python (with pandas), I can do this:
data.shape
Is there a similar function in PySpark? This is my current solution, but I am looking for a more elegant one:
row_number = data.count()
column_number = len(data.dtypes)
Computing the number of columns this way does not feel ideal...
You can get its shape with:
print((df.count(), len(df.columns)))
Here, df.count() gives the number of rows and len(df.columns) gives the number of columns.
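If you want something closer to pandas' data.shape, you can wrap those two calls in a small helper. This is only a minimal sketch; the SparkSession setup, the sample data, and the shape helper name are illustrative assumptions, not part of the PySpark API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shape-example").getOrCreate()

# Hypothetical sample DataFrame, only for illustration
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])

def shape(sdf):
    # count() runs a Spark job over the data; len(sdf.columns) is read from the schema
    return (sdf.count(), len(sdf.columns))

print(shape(df))  # (3, 2)

Keep in mind that count() triggers an action over the whole DataFrame, so it can be expensive on large data, whereas the column count comes straight from the schema and costs nothing.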