New posts in apache-spark

get TopN of all groups after group by using Spark DataFrame

sql scala apache-spark apache-spark-sql

Apache Spark vs Akka [closed]

apache-spark parallel-processing akka distributed-computing

How to convert unix timestamp to date in Spark

scala datetime apache-spark timestamp nscala-time

How to pass whole Row to UDF - Spark DataFrame filter

How to get stats from database tables in PySpark?

python apache-spark pyspark apache-spark-sql

What is going wrong with `unionAll` of Spark `DataFrame`?

scala apache-spark dataframe apache-spark-sql

How can I force Spark to execute code?

java scala hadoop apache-spark

Spark groupByKey alternative

python apache-spark pyspark rdd reduce

Modify collection inside a Spark RDD foreach

scala apache-spark rdd

How do I get a SQL row_number equivalent for a Spark RDD?

sql apache-spark row-number rdd

Pyspark: get list of files/directories on HDFS path

hadoop apache-spark pyspark

Spark 2.0.x dump a csv file from a dataframe containing one array of type string

arrays csv apache-spark

Spark Driver in Apache Spark

Spark: produce RDD[(X, X)] of all possible combinations from RDD[X]

scala apache-spark

Spark: how to run a Spark file from the Spark shell

scala apache-spark cloudera-cdh cloudera-manager

Spark add new column to dataframe with value from previous row

python apache-spark dataframe pyspark apache-spark-sql

Read SAS file to get meta information

python apache-spark sas pyspark databricks

How to split a list to multiple columns in Pyspark?

apache-spark pyspark apache-spark-sql

How do I add a persistent column of row ids to Spark DataFrame?

apache-spark dataframe apache-spark-sql

Is gzip format supported in Spark?

java scala mapreduce gzip apache-spark