How to flatten Array of WrappedArray?

I have tried various Stack Overflow questions but have been unable to reach the end goal: an array of distinct values from the DataFrame.

val df = Seq(
  List("Mandrin","Hindi","English"),
  List("French","English")
).toDF("languages")

df.collect.map(_.toSeq).flatten

Returns

Array[Any] = Array(WrappedArray(Mandrin, Hindi, English), WrappedArray(French, English))

The desired result is

Array(Mandrin, Hindi, English, French)

If I could get it to a flat array with duplicates, then I can call distinct.

Thanks.


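You don't need the additional map step. After collect you have an Array[Row], and each Row holds the Seq[String] for the languages column at index 0. Pull that sequence out with getSeq and flatten everything in one step with flatMap to get an Array of Strings:

val languagesArray: Array[String] = df.collect().flatMap(_.getSeq[String](0))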
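Once the arrays are on the driver, the flatten-then-distinct step is plain Scala and does not involve Spark at all. A minimal sketch, assuming the per-row arrays have already been extracted; the hypothetical collectedRows value below only stands in for what collect() plus getSeq would give you:

```scala
object FlattenDistinct {
  // Hypothetical stand-in for the per-row arrays obtained via collect() + getSeq
  val collectedRows: Array[Seq[String]] = Array(
    Seq("Mandrin", "Hindi", "English"),
    Seq("French", "English")
  )

  // flatten concatenates the inner sequences (duplicates included);
  // distinct then keeps only the first occurrence of each value, in order
  def distinctLanguages(rows: Array[Seq[String]]): Array[String] =
    rows.flatten.distinct

  def main(args: Array[String]): Unit = {
    println(distinctLanguages(collectedRows).mkString(", "))
  }
}
```

Because Array's distinct preserves first-occurrence order, this yields Mandrin, Hindi, English, French, matching the desired result in the question.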

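However, when working with large datasets it is usually not a good idea to collect the data, because collect pulls every row into the driver's memory. Consider using explode instead, which produces one row per array element:

  import org.apache.spark.sql.functions._
  import spark.implicits._ // for the $"..." column syntax (already in scope in spark-shell)
  df.select(explode($"languages")).show()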

This produces the following output:

+-------+
|    col|
+-------+
|Mandrin|
|  Hindi|
|English|
| French|
|English|
+-------+

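On either result you can then call distinct to get the unique languages: .distinct on the collected Array[String], or .distinct() on the exploded DataFrame, which keeps the de-duplication on the cluster.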
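Both routes can be put side by side in one small program. This is a sketch that assumes spark-sql is on the classpath and uses a local SparkSession purely for illustration; the object name DistinctLanguages, the run helper and the "language" column alias are hypothetical choices, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object DistinctLanguages {
  // Returns (driver-side result, cluster-side result) for the question's data
  def run(spark: SparkSession): (Array[String], Array[String]) = {
    import spark.implicits._

    val df = Seq(
      List("Mandrin", "Hindi", "English"),
      List("French", "English")
    ).toDF("languages")

    // Driver-side: collect the rows, flatten each row's array, then de-duplicate
    val viaCollect: Array[String] =
      df.collect().flatMap(_.getSeq[String](0)).distinct

    // Cluster-side: one row per element, de-duplicated before reaching the driver
    val viaExplode: Array[String] =
      df.select(explode($"languages").as("language"))
        .distinct()
        .as[String]
        .collect()

    (viaCollect, viaExplode)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only for trying this out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-languages")
      .getOrCreate()
    val (viaCollect, viaExplode) = run(spark)
    println(viaCollect.mkString(", "))
    // DataFrame distinct() does not guarantee row order, so sort before comparing
    println(viaExplode.sorted.mkString(", "))
    spark.stop()
  }
}
```

Note that the explode route returns the same set of languages but in no guaranteed order; sort (or orderBy before collecting) if order matters.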