Get the size/length of an array column
I'm new to Scala programming and have a question: how do I count the number of strings in each row? My DataFrame has a single column of type Array[String].
friendsDF: org.apache.spark.sql.DataFrame = [friends: array<string>]
Solution 1:
You can use the size function from org.apache.spark.sql.functions:
val df = Seq((Array("a","b","c"), 2), (Array("a"), 4)).toDF("friends", "id")
// df: org.apache.spark.sql.DataFrame = [friends: array<string>, id: int]
df.select(size($"friends").as("no_of_friends")).show
+-------------+
|no_of_friends|
+-------------+
|            3|
|            1|
+-------------+
To add as a new column:
df.withColumn("no_of_friends", size($"friends")).show
+---------+---+-------------+
|  friends| id|no_of_friends|
+---------+---+-------------+
|[a, b, c]|  2|            3|
|      [a]|  4|            1|
+---------+---+-------------+
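The snippets above rely on the imports that spark-shell provides automatically. Outside the shell, a minimal standalone sketch might look like the following. The object name `FriendCounts`, the helper `friendCounts`, the app name, and the `local[*]` master are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.size

// Hypothetical wrapper object; only size, toDF, and select come from the answer above.
object FriendCounts {
  // Builds the sample DataFrame and returns the per-row array sizes.
  def friendCounts(spark: SparkSession): Array[Int] = {
    import spark.implicits._ // enables toDF, the $"..." syntax, and as[Int]
    val df = Seq((Array("a", "b", "c"), 2), (Array("a"), 4)).toDF("friends", "id")
    df.select(size($"friends").as("no_of_friends")).as[Int].collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session settings, for illustration only.
    val spark = SparkSession.builder()
      .appName("friend-counts")
      .master("local[*]")
      .getOrCreate()
    println(friendCounts(spark).mkString(", "))
    spark.stop()
  }
}
```

The printed values should match the no_of_friends column shown above.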
Solution 2:
You can use the size function, which returns the number of elements in the array. There is one caveat, pointed out by @aloplop85: an array that looks empty but actually contains a single empty string has size 1. That is correct, because an empty string still counts as an element. If, for your use case, you want the size to be 0 when the array holds exactly one element and that element is an empty string, check for that case explicitly:
// Source data; the last row holds an array with a single empty string
val df = Seq((Array("a", "b", "c"), 2), (Array("a"), 4), (Array(""), 6)).toDF("friends", "id")
// If the array has exactly one element and that element is the empty string, use 0; otherwise use the array size
val df1 = df.withColumn(
  "no_of_friends",
  when(size(col("friends")) === 1 && col("friends")(0) === "", lit(0))
    .otherwise(size(col("friends")))
)
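The same rule can be checked on plain Scala arrays, without Spark. This is only a sketch: `EmptyStringRule` and `friendCount` are hypothetical names, and the `"".split(",")` case is an assumption about how such one-element arrays often arise, not something stated in the answer.

```scala
// Plain-Scala mirror of the when/otherwise rule; no Spark needed.
object EmptyStringRule {
  // An array holding exactly one empty string counts as 0;
  // every other array counts its elements.
  def friendCount(friends: Array[String]): Int =
    if (friends.length == 1 && friends(0) == "") 0 else friends.length

  def main(args: Array[String]): Unit = {
    // Splitting an empty string yields Array(""), which has length 1.
    val fromSplit = "".split(",")
    val rows = Seq(Array("a", "b", "c"), Array("a"), Array(""), fromSplit)
    rows.foreach(r => println(s"length=${r.length} adjusted=${friendCount(r)}"))
  }
}
```

A truly empty array already has length 0, so it needs no special handling.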
You can verify the output as below: