New posts in apache-spark-sql

Automatically and Elegantly flatten DataFrame in Spark SQL

scala apache-spark apache-spark-sql

How do I call a UDF on a Spark DataFrame using JAVA?

java apache-spark apache-spark-sql user-defined-functions

Apache Spark -- Assign the result of UDF to multiple dataframe columns

python apache-spark pyspark apache-spark-sql user-defined-functions

PySpark: withColumn() with two conditions and three outcomes

apache-spark hive pyspark apache-spark-sql hiveql

aggregate function Count usage with groupBy in Spark

java scala apache-spark pyspark apache-spark-sql

Apache Spark: Get number of records per partition

scala apache-spark hadoop apache-spark-sql partitioning

PySpark - rename more than one column using withColumnRenamed

apache-spark pyspark apache-spark-sql rename

Perform a typed join in Scala with Spark Datasets

scala apache-spark join apache-spark-sql apache-spark-dataset

Create new Dataframe with empty/null field values

scala apache-spark dataframe apache-spark-sql

Pyspark: Filter dataframe based on multiple conditions

sql filter pyspark apache-spark-sql pyspark-sql

How to compare two DataFrames and print the columns that differ in Scala

scala apache-spark apache-spark-sql compare

Median / quantiles within PySpark groupBy

apache-spark pyspark apache-spark-sql pyspark-sql

PySpark: multiple conditions in when clause

python apache-spark dataframe pyspark apache-spark-sql

Generate a Spark StructType / Schema from a case class

apache-spark apache-spark-sql

How to flatten a struct in a Spark dataframe?

java apache-spark pyspark apache-spark-sql

Why does using a UDF in a SQL query lead to a Cartesian product?

sql apache-spark apache-spark-sql

Apache Spark how to append new column from list/array to Spark dataframe

scala apache-spark dataframe apache-spark-sql

What are possible reasons for receiving TimeoutException: Futures timed out after [n seconds] when working with Spark

scala apache-spark apache-spark-sql spark-dataframe

PySpark: how to resample frequencies

apache-spark pyspark apache-spark-sql time-series

Take n rows from a spark dataframe and pass to toPandas()

python apache-spark-sql spark-dataframe