New posts in apache-spark

SparkR vs sparklyr [closed]

r apache-spark sparkr sparklyr

Filter spark DataFrame on string contains

scala apache-spark dataframe apache-spark-sql

What should be the optimal value for spark.sql.shuffle.partitions or how do we increase partitions when using Spark SQL?

apache-spark apache-spark-sql

pyspark: rolling average using timeseries data

apache-spark pyspark window-functions moving-average

Derive multiple columns from a single column in a Spark DataFrame

scala apache-spark dataframe apache-spark-sql user-defined-functions

Spark DataFrame: collect() vs select()

dataframe apache-spark apache-spark-sql

Custom month range with current date in window function

apache-spark pyspark apache-spark-sql

Under what conditions should cluster deploy mode be used instead of client mode?

When are accumulators truly reliable?

View RDD contents in Python Spark?

python apache-spark

Why does my simple Spark code not print anything?

scala apache-spark

Type arguments do not conform to trait type parameter bounds

scala apache-spark amazon-deequ

PySpark DataFrame: convert country names to ISO codes with country-converter

python apache-spark

Scala: get the file size of each individual JSON file in a directory

json scala apache-spark

PySpark: efficiently have partitionBy write the same total number of partitions as the original table

apache-spark pyspark

How does Spark read a large file (petabytes) when the file cannot fit in Spark's main memory?

apache-spark rdd partition

Calculating the average for each key in a pair (K, V) RDD in Spark with Python

python apache-spark aggregate average rdd

How to use spark.DataFrameReader from Foundry Transforms

apache-spark pyspark palantir-foundry foundry-code-repositories

DataFrame partitionBy to a single Parquet file (per partition)

apache-spark apache-spark-sql

Explain the aggregate functionality in Spark (with Python and Scala)

python scala apache-spark aggregate rdd