New posts in apache-spark

"Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used" on an EMR cluster with 75GB of memory

PySpark: java.lang.OutOfMemoryError: Java heap space

How to join a Spark DataFrame twice with different id types

How to enable Debug log in spark driver without enabling in the executors, in yarn cluster mode? [duplicate]

Spark: subtract two DataFrames

How to flatten a struct in a Spark dataframe?

Is Spark's KMeans unable to handle big data?

How to transform data with a sliding window over time series data in PySpark

How to convert a DataFrame back to a normal RDD in PySpark?

Parsing multiline records in Scala

How to list all Cassandra tables

Why does using a UDF in a SQL query lead to a Cartesian product?

Using a UDF ignores the condition in when

Apache Spark: how to append a new column from a list/array to a Spark DataFrame

Multiple Spark applications with HiveContext

Calling distinct and map together throws an NPE in the Spark library

A list as a key for PySpark's reduceByKey

What are possible reasons for receiving TimeoutException: Futures timed out after [n seconds] when working with Spark [duplicate]

PySpark: how to resample frequencies

How to convert a column with string type to int in a PySpark DataFrame?