New posts in apache-spark-sql

Why is Apache Spark (Python) so slow locally compared to pandas?

Removing duplicate columns after a DF join in Spark

More than one hour to execute pyspark.sql.DataFrame.take(4)

Filling gaps in timeseries Spark

Spark Scala: column-level mismatches between two DataFrames

How do I convert an array (i.e. list) column to a Vector?

Reshaping/Pivoting data in Spark RDD and/or Spark DataFrames

How to loop through each row of a DataFrame in PySpark

How to join on multiple columns in PySpark?

How does createOrReplaceTempView work in Spark?

Create Spark DataFrame. Can not infer schema for type: <type 'float'>

How to use a Scala class inside PySpark

What is the difference between Apache Spark SQLContext vs HiveContext?

Joining Spark dataframes on the key

Casting string type column percentage to a decimal

Cannot find col function in PySpark

PySpark DataFrame: filter or include rows based on a list

How to exclude multiple columns from a Spark DataFrame in Python

How to save/insert each DStream into a permanent table

Dividing complex rows of a DataFrame into simple rows in PySpark