New posts in apache-spark
"Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used" on an EMR cluster with 75GB of memory
apache-spark
emr
amazon-emr
bigdata
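The container limit here is executor memory plus off-heap overhead, and it is usually the overhead (Python workers, shuffle buffers) that tips it over. A minimal PySpark sketch of the usual mitigation; the sizes are illustrative assumptions, and on older Spark/EMR releases the key is spark.yarn.executor.memoryOverhead:

    from pyspark.sql import SparkSession

    # Illustrative sizes only; the YARN limit is executor memory + overhead,
    # so either lower the former or raise the latter for your cluster.
    spark = (
        SparkSession.builder
        .appName("memory-overhead-example")
        .config("spark.executor.memory", "8g")
        .config("spark.executor.memoryOverhead", "2g")
        .getOrCreate()
    )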
PySpark: java.lang.OutOfMemoryError: Java heap space
Tags: java, apache-spark, out-of-memory, heap-memory, pyspark
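Driver-side heap errors in PySpark most often come from pulling large results to the driver (collect(), toPandas()). A minimal sketch of raising driver memory; "4g" is an assumed value, and in client mode it must be supplied before the JVM starts (spark-submit --driver-memory or spark-defaults.conf) rather than in the builder:

    from pyspark.sql import SparkSession

    # Assumed value; avoiding collect() on large DataFrames is usually a
    # better fix than simply raising the ceiling.
    spark = (
        SparkSession.builder
        .appName("driver-memory-example")
        .config("spark.driver.memory", "4g")
        .getOrCreate()
    )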
How to join a Spark DataFrame twice with different ID types
Tags: join, pyspark, apache-spark
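The question's exact schema is not reproduced here; when the same lookup table has to be joined twice (for example on two different ID columns), the usual PySpark pattern is to alias it. A sketch with hypothetical column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    events = spark.createDataFrame([(1, 10, 20)],
                                   ["event_id", "buyer_id", "seller_id"])
    users = spark.createDataFrame([(10, "alice"), (20, "bob")], ["id", "name"])

    # Alias the lookup table so the two joins do not collide on column names.
    buyers, sellers = users.alias("buyers"), users.alias("sellers")
    result = (
        events
        .join(buyers, F.col("buyer_id") == F.col("buyers.id"), "left")
        .join(sellers, F.col("seller_id") == F.col("sellers.id"), "left")
        .select("event_id",
                F.col("buyers.name").alias("buyer_name"),
                F.col("sellers.name").alias("seller_name"))
    )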
How to enable debug logging in the Spark driver without enabling it in the executors, in YARN cluster mode? [duplicate]
Tags: logging, log4j, apache-spark, hadoop-yarn
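One common approach, sketched below: SparkContext.setLogLevel changes the log4j root level only in the driver JVM, so executors keep whatever their own log4j configuration specifies. (A per-role log4j.properties passed via spark.driver.extraJavaOptions is the other usual route.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("driver-debug-logging").getOrCreate()

    # Raises the root log level to DEBUG in the driver JVM only; executor
    # JVMs keep the level from their own log4j configuration.
    spark.sparkContext.setLogLevel("DEBUG")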
Spark: subtract two DataFrames
Tags: apache-spark, dataframe, rdd
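A quick sketch of the two built-in set-difference operations, assuming two DataFrames with matching schemas:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df1 = spark.createDataFrame([(1,), (2,), (2,), (3,)], ["id"])
    df2 = spark.createDataFrame([(2,)], ["id"])

    # subtract() is a deduplicating set difference; exceptAll() (Spark 2.4+)
    # keeps duplicates that are not matched in the other DataFrame.
    diff = df1.subtract(df2)       # ids 1 and 3
    diff_all = df1.exceptAll(df2)  # ids 1, 2, 3 (one of the 2s is removed)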
How to flatten a struct in a Spark dataframe?
Tags: java, apache-spark, pyspark, apache-spark-sql
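A minimal sketch of the usual answer: selecting "struct_col.*" expands a struct's fields into top-level columns (the struct and column names below are assumptions):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame([(1, "alice", 30)], ["id", "name", "age"])
              .select("id", F.struct("name", "age").alias("person")))

    # "person.*" expands every field of the struct; a single field can be
    # pulled out with F.col("person.name").
    flat = df.select("id", "person.*")  # columns: id, name, age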
Is Spark's KMeans unable to handle bigdata?
Tags: python, apache-spark, k-means, apache-spark-mllib, bigdata
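A frequently cited culprit is the default k-means|| initialization, which can dominate runtime for large k; switching to random initialization is one commonly suggested workaround, sketched here on toy data:

    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)], ["x", "y"])
    features = VectorAssembler(inputCols=["x", "y"],
                               outputCol="features").transform(df)

    # initMode="random" skips the (potentially expensive) k-means|| seeding;
    # k and seed are illustrative values.
    model = KMeans(k=2, initMode="random", seed=1).fit(features)
    print(model.clusterCenters())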
How to transform data with a sliding window over time series data in PySpark
Tags: python, apache-spark, time-series, pyspark
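For time-based sliding windows, F.window with a slide shorter than the window length produces overlapping buckets. A sketch with assumed column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame(
              [("2024-01-01 00:00:00", 1.0),
               ("2024-01-01 00:07:00", 2.0),
               ("2024-01-01 00:12:00", 3.0)],
              ["ts", "value"])
          .withColumn("ts", F.to_timestamp("ts")))

    # 10-minute windows that start every 5 minutes, so each row can fall
    # into two overlapping windows.
    agg = (df.groupBy(F.window("ts", "10 minutes", "5 minutes"))
             .agg(F.avg("value").alias("avg_value")))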
How to convert a DataFrame back to a normal RDD in PySpark?
Tags: python, apache-spark, pyspark
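A short sketch: the .rdd property gives an RDD of Row objects, and mapping to tuple recovers plain Python tuples if needed:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

    row_rdd = df.rdd               # RDD of Row objects
    tuple_rdd = df.rdd.map(tuple)  # RDD of plain Python tuples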
Parsing multiline records in Scala
Tags: scala, apache-spark, rdd
How to list all Cassandra tables
Tags: scala, apache-spark, cassandra, spark-cassandra-connector
Why does using a UDF in a SQL query lead to a Cartesian product?
Tags: sql, apache-spark, apache-spark-sql
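A small PySpark illustration of the underlying cause (the SQL-string version behaves the same way): the optimizer cannot see inside a UDF, so a join condition wrapped in one stops being an equi-join and the plan degrades to a Cartesian / nested-loop join:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    a = spark.createDataFrame([("x",)], ["k"])
    b = spark.createDataFrame([("X",)], ["j"])

    # Wrapping the join keys in a UDF hides the equality from the optimizer.
    norm = F.udf(lambda s: s.lower(), StringType())
    a.join(b, norm(a["k"]) == norm(b["j"])).explain()
    # expect a CartesianProduct / BroadcastNestedLoopJoin in the plan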
Using a UDF ignores the condition in when()
Tags: python, apache-spark, pyspark, spark-dataframe, user-defined-functions
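Spark does not guarantee that when()/otherwise() short-circuits, so the UDF can still be invoked on rows the condition was meant to exclude; making the UDF itself tolerate those rows is the robust fix. A sketch with an assumed parsing UDF:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("3",), (None,)], ["s"])

    # Guard inside the UDF rather than relying on when() to keep bad rows out.
    safe_parse = F.udf(lambda s: int(s) if s is not None else None,
                       IntegerType())
    df = df.withColumn(
        "n",
        F.when(F.col("s").isNotNull(), safe_parse(F.col("s"))).otherwise(None))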
Apache Spark: how to append a new column from a list/array to a Spark DataFrame
Tags: scala, apache-spark, dataframe, apache-spark-sql
Multiple Spark applications with HiveContext
Tags: apache-spark, hive, pyspark
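In Spark 2+ the HiveContext role is played by a SparkSession built with enableHiveSupport(); several applications can use Hive tables concurrently as long as the shared metastore is a real database service (the default embedded Derby metastore accepts only one connection at a time). A minimal sketch:

    from pyspark.sql import SparkSession

    # Each application builds its own Hive-enabled session; concurrency
    # problems usually trace back to an embedded Derby metastore.
    spark = (
        SparkSession.builder
        .appName("hive-enabled-app")
        .enableHiveSupport()
        .getOrCreate()
    )
    spark.sql("SHOW DATABASES").show()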
Calling distinct and map together throws an NPE in the Spark library
Tags: scala, nullpointerexception, apache-spark
A list as a key for PySpark's reduceByKey
Tags: python, apache-spark, rdd, pyspark
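reduceByKey hashes its keys and Python lists are unhashable, so the standard fix is to turn each list key into a tuple first:

    from operator import add
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize([(["a", "b"], 1), (["a", "b"], 2)])
    # Convert the unhashable list keys to tuples before reducing.
    summed = rdd.map(lambda kv: (tuple(kv[0]), kv[1])).reduceByKey(add)
    print(summed.collect())  # [(('a', 'b'), 3)]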
What are possible reasons for receiving TimeoutException: Futures timed out after [n seconds] when working with Spark [duplicate]
Tags: scala, apache-spark, apache-spark-sql, spark-dataframe
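A frequent cause is a broadcast join whose build side is not ready within spark.sql.broadcastTimeout (300 s by default). Raising the timeout or disabling automatic broadcast joins are two common mitigations; the values below are illustrative, shown as a PySpark configuration sketch even though the question is Scala-tagged:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.sql.broadcastTimeout", "1200")           # seconds
        .config("spark.sql.autoBroadcastJoinThreshold", "-1")   # disable auto-broadcast
        .getOrCreate()
    )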
PySpark: how to resample frequencies
Tags: apache-spark, pyspark, apache-spark-sql, time-series
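Downsampling is straightforward with a tumbling-window aggregation; upsampling needs the missing timestamps generated first (e.g. sequence() plus explode()) and joined back. A downsampling sketch with assumed column names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame(
              [("2024-01-01 00:01:00", 1.0), ("2024-01-01 00:04:00", 2.0)],
              ["ts", "value"])
          .withColumn("ts", F.to_timestamp("ts")))

    # Resample to 5-minute buckets by grouping on a tumbling window.
    resampled = (df.groupBy(F.window("ts", "5 minutes"))
                   .agg(F.mean("value").alias("mean_value")))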
How to convert a column of string type to int in a PySpark DataFrame?
Tags: python, dataframe, apache-spark, pyspark, apache-spark-sql
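A short sketch of the usual answer, cast(); note that strings that do not parse as integers become null rather than raising an error:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("1",), ("2",), ("oops",)], ["n_str"])

    # cast("int") is equivalent to cast(IntegerType()); "oops" becomes null.
    df = df.withColumn("n", F.col("n_str").cast(IntegerType()))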