New posts in apache-spark

Spark: how to get the number of written rows?

PySpark: forward fill with last observation for a DataFrame

Importing spark.implicits._ in Scala

Difference in Used, Committed and Max Heap Memory

Why does format("kafka") fail with "Failed to find data source: kafka." (even with uber-jar)?

Why does a job fail with "No space left on device", but df says otherwise?

Difference between == and === in Scala, Spark

Apache Spark: What is the equivalent implementation of RDD.groupByKey() using RDD.aggregateByKey()?

TaskSchedulerImpl: Initial job has not accepted any resources;

Apache Spark Python Cosine Similarity over DataFrames

Replace null values in Spark DataFrame

PySpark groupByKey returning pyspark.resultiterable.ResultIterable

How do I install pyspark for use in standalone scripts?

Couldn't run PySpark on Windows cmd and Conda cmd

Why does Spark think this is a cross / Cartesian join

How to run multiple jobs in one SparkContext from separate threads in PySpark?

Apache Spark, add a "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

How to create a Spark DataFrame from a Map[String, Any] in Scala?

Convert null values to empty array in Spark DataFrame

Apache Spark: dealing with case statements