Convert a Spark DataFrame to a pandas DataFrame
The following should work:
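# Assumes an active SparkContext `sc` with a SparkSession, as in the pyspark shell.
some_df = sc.parallelize([
    ("A", "no"),
    ("B", "yes"),
    ("B", "yes"),
    ("B", "no"),
]).toDF(["user_id", "phone_number"])
# toPandas() collects all rows to the driver and returns a pandas DataFrame.
pandas_df = some_df.toPandas()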
In my case, the following conversion from a Spark DataFrame to a pandas DataFrame worked:
pandas_df = spark_df.select("*").toPandas()
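Converting a Spark DataFrame to pandas can be slow when the DataFrame is large, because toPandas() collects every row to the driver. Enabling Apache Arrow speeds up the transfer. Note that since Spark 3.0 this setting is named spark.sql.execution.arrow.pyspark.enabled, and the key used here is the older, deprecated name: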
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
pd_df = df_spark.toPandas()
I have tested this on Databricks.
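Since the Arrow setting was renamed in Spark 3.0, a small helper can pick the key from the running version. This is only a sketch: `arrow_config_key` and `enable_arrow` are hypothetical names, not part of PySpark, and the helper assumes it is given a SparkSession (or any object with a `version` string and a `conf.set` method).

```python
def arrow_config_key(spark_version):
    """Return the Arrow config key for a version string such as "2.4.8" or "3.5.1"."""
    major = int(spark_version.split(".")[0])
    if major >= 3:
        # Name used from Spark 3.0 onward.
        return "spark.sql.execution.arrow.pyspark.enabled"
    # Older name, deprecated since Spark 3.0.
    return "spark.sql.execution.arrow.enabled"


def enable_arrow(spark):
    """Enable Arrow-based conversion on a SparkSession-like object; return the key set."""
    key = arrow_config_key(spark.version)
    spark.conf.set(key, "true")
    return key


# Usage, assuming `spark` is a SparkSession and `df_spark` is a Spark DataFrame:
#   enable_arrow(spark)
#   pd_df = df_spark.toPandas()
```

Arrow only speeds up the transfer; the whole result still has to fit in the driver's memory.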