How to get the name of a DataFrame column in PySpark?
Solution 1:
You can get the column names from the schema:
spark_df.schema.names
Printing the schema is also a handy way to visualize it:
spark_df.printSchema()
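For reference, here is a minimal sketch of the schema-based approach. The helper name column_names, the demo function, and the sample columns "id" and "label" are made up for illustration. It assumes a local Spark installation when demo() is actually called.

```python
def column_names(df):
    """Return the top-level column names of a PySpark DataFrame as a list."""
    # df.schema is a StructType; .names lists its field names in column order.
    return list(df.schema.names)


def demo():
    # Imported lazily so the helper above can be used without starting Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("column-names-demo")  # hypothetical app name
        .getOrCreate()
    )
    # Hypothetical sample data with two columns.
    spark_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

    print(column_names(spark_df))  # names taken from the schema
    print(spark_df.columns)        # shortcut property with the same names
    print(spark_df.dtypes)         # (name, type string) pairs
    spark_df.printSchema()         # tree view of names and types

    spark.stop()
```

spark_df.columns is usually the shortest spelling if you only need the names; the schema is more useful when you also want the types.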
Solution 2:
To get the name from a Column object itself, the only way is to go one level down to the underlying JVM object:
df.col._jc.toString().encode('utf8')
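This is also how a Column is converted to a str in the PySpark code itself. Note that the .encode('utf8') call dates from Python 2; on Python 3 it returns bytes, so drop it if you want a str.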
From pyspark/sql/column.py:
def __repr__(self):
    return 'Column<%s>' % self._jc.toString().encode('utf8')
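If you want to wrap this up, a small helper along these lines might do. It is only a sketch: column_name is a hypothetical name, and _jc is a private attribute of Column, so this assumes a classic JVM-backed PySpark session and could break between versions.

```python
def column_name(col):
    """Return the string form of the JVM expression behind a PySpark Column."""
    # _jc is the private handle to the JVM Column object (an internal detail,
    # not a public API, so treat this as an assumption about your version).
    name = col._jc.toString()
    # Normalise to str in case a bytes value comes back from older code paths.
    if isinstance(name, bytes):
        name = name.decode("utf8")
    return name
```

Usage would be something like column_name(df.col). For a plain column reference this gives the column name; for an aliased or computed column you get the expression text rather than a bare name.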