New posts in apache-spark-sql

Split Spark Dataframe string column into multiple columns

apache-spark pyspark apache-spark-sql

How to define partitioning of DataFrame?

scala apache-spark dataframe apache-spark-sql partitioning

How to use list comprehension on a column with array in PySpark?

python dataframe apache-spark pyspark apache-spark-sql

How to define and use a User-Defined Aggregate Function in Spark SQL?

scala apache-spark apache-spark-sql aggregate-functions user-defined-functions

Spark SQL replacement for MySQL's GROUP_CONCAT aggregate function

apache-spark aggregate-functions apache-spark-sql

How to use JDBC source to write and read data in (Py)Spark?

python scala apache-spark apache-spark-sql pyspark

How to run independent transformations in parallel using PySpark?

python-2.7 apache-spark pyspark apache-spark-sql python-multiprocessing

How to get unique key from Dataset Spark [duplicate]

dataframe scala apache-spark apache-spark-sql

DataFrame / Dataset groupBy behaviour/optimization

performance apache-spark dataframe apache-spark-sql apache-spark-dataset

Spark - load CSV file as DataFrame?

scala apache-spark hadoop apache-spark-sql hdfs

Group by and save the max value with overlapping columns in Scala Spark

scala apache-spark apache-spark-sql

Spark 2.0 Dataset vs DataFrame

scala apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

Schema comparison of two DataFrames in Scala

scala apache-spark-sql schema

How to change DataFrame column names in PySpark?

python apache-spark pyspark apache-spark-sql

PySpark DataFrame - Join on multiple columns dynamically

python apache-spark dataframe pyspark apache-spark-sql

How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?

apache-spark hive apache-spark-sql

How to convert RDD object to DataFrame in Spark

scala apache-spark apache-spark-sql rdd

How to zip two array columns in Spark SQL

python pandas apache-spark pyspark apache-spark-sql

How to split Vector into columns - using PySpark

python apache-spark pyspark apache-spark-sql apache-spark-ml

Concatenate columns in Apache Spark DataFrame

sql apache-spark dataframe apache-spark-sql