New posts in apache-spark

Spark 2.1 hangs while reading a huge dataset

apache-spark hive apache-spark-sql

Efficient Count Distinct with Apache Spark

distinct apache-spark

Joining two dataframes without a common column

scala apache-spark

Databricks spark.readstream format differences

apache-spark databricks spark-structured-streaming

What is an RDD in Spark?

scala hadoop apache-spark rdd

Array Intersection in Spark SQL

apache-spark apache-spark-sql spark-dataframe hiveql apache-spark-dataset

Spark DataFrame: drop duplicates and keep first

dataframe apache-spark pyspark apache-spark-sql duplicates

How to use SQL query to define table in dbtable?

jdbc apache-spark apache-spark-sql

Why can't PySpark find py4j.java_gateway?

python python-2.7 apache-spark ipython py4j

FetchFailedException or MetadataFetchFailedException when processing big data set

apache-spark hadoop-yarn

Scala-Spark Dynamically call groupby and agg with parameter values

scala apache-spark group-by customization aggregate

How to remove parentheses around records when saveAsTextFile on RDD[(String, Int)]?

scala apache-spark

Spark: monotonically increasing ID not working as expected in DataFrame?

scala apache-spark apache-spark-sql

How to debug a Spark application locally?

Spark MLlib predicting weird numbers or NaN

python apache-spark pyspark gradient-descent apache-spark-mllib

Build a hierarchy from a relational data-set using PySpark

python apache-spark pyspark hierarchy graphframes

Fetching distinct values on a column using Spark DataFrame

scala apache-spark dataframe apache-spark-sql spark-dataframe

Spark - extracting single value from DataFrame

scala apache-spark apache-spark-sql

Spark ML VectorAssembler returns strange output

scala apache-spark apache-spark-mllib apache-spark-ml

How do I unit test PySpark programs?

python unit-testing apache-spark pyspark