New posts in apache-spark-sql

Convert a Spark DataFrame to a pandas DataFrame

pandas apache-spark apache-spark-sql

Filter Spark DataFrame on string contains

scala apache-spark dataframe apache-spark-sql

What should be the optimal value for spark.sql.shuffle.partitions or how do we increase partitions when using Spark SQL?

apache-spark apache-spark-sql

Derive multiple columns from a single column in a Spark DataFrame

scala apache-spark dataframe apache-spark-sql user-defined-functions

Spark DataFrame: collect() vs select()

dataframe apache-spark apache-spark-sql

Custom month range with current date in window function

apache-spark pyspark apache-spark-sql

DataFrame partitionBy to a single Parquet file (per partition)

apache-spark apache-spark-sql

Spark DataFrames when udf functions do not accept large enough input variables

scala apache-spark dataframe apache-spark-sql apache-spark-mllib

Spark load data and add filename as dataframe column

apache-spark pyspark apache-spark-sql

How to count unique ID after groupBy in pyspark

python pyspark apache-spark-sql

'PipelinedRDD' object has no attribute 'toDF' in PySpark

python apache-spark pyspark apache-spark-sql rdd

Convert date from String to Date format in Dataframes

apache-spark apache-spark-sql

How to group by common element in array?

apache-spark apache-spark-sql

PySpark: forward fill with last observation for a DataFrame

apache-spark pyspark apache-spark-sql spark-dataframe

Why does format("kafka") fail with "Failed to find data source: kafka." (even with uber-jar)?

apache-spark apache-spark-sql spark-structured-streaming uberjar

Apache Spark Python Cosine Similarity over DataFrames

python apache-spark pyspark apache-spark-sql cosine-similarity

Why does Spark think this is a cross / Cartesian join

apache-spark dataframe pyspark apache-spark-sql

Apache Spark, add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

scala apache-spark dataframe apache-spark-sql

Convert null values to empty array in Spark DataFrame

apache-spark dataframe apache-spark-sql apache-spark-1.5

Unpacking a list to select multiple columns from a Spark DataFrame

apache-spark apache-spark-sql spark-dataframe