New posts in apache-spark

Is there a reason not to use SparkContext.getOrCreate when writing a Spark job?

Spark DataFrames when udf functions do not accept large enough input variables

Amazon s3a returns 400 Bad Request with Spark

Saving dataframe to local file system results in empty results

Spark load data and add filename as dataframe column

Dealing with a large gzipped file in Spark

Encode an ADT / sealed trait hierarchy into Spark Dataset column

Joining many large files on AWS

Why does Spark fail with java.lang.OutOfMemoryError: GC overhead limit exceeded?

Operate on neighbor elements in RDD in Spark

How to control preferred locations of RDD partitions?

What is the difference between Apache Mahout and Apache Spark's MLlib?

Why does sortBy transformation trigger a Spark job?

'PipelinedRDD' object has no attribute 'toDF' in PySpark

Why does partition parameter of SparkContext.textFile not take effect?

Convert date from String to Date format in Dataframes

PySpark in IPython notebook raises Py4JJavaError when using count() and first()

How to group by common element in array?

Spark: Transpose DataFrame Without Aggregating

ALS model - how to generate full_u * v^t * v?