How do I store a SparkR result in an R object?

Still new to the world of Azure Databricks, I find the use of SparkR very obscure, even for very simple tasks.

It took me a very long time to figure out how to count distinct values, and I'm not sure this is the right way to go:

library(SparkR)
sparkR.session()

DW <- sql("select * from db.mytable")
nb.per <- head(summarize(DW, n_distinct(DW$PERIODE)))

I thought I had found it, but nb.per is not a plain value; it is still a data.frame:

class(nb.per)
[1] "data.frame"

I tried:

nb.per <- as.numeric(head(summarize(DW, n_distinct(DW$PERIODE))))

This seems to work, but I'm pretty sure there is a better way to achieve it?

Thanks !


Solution 1:

The SparkR::sql function returns a SparkDataFrame.

To use it in R as a regular data.frame, you can simply coerce it:

as.data.frame(sql("select * from db.mytable"))
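Note that coercing the whole table pulls every row to the driver, which can be expensive for a large table. For the original use case (a single distinct count), a sketch of an alternative is to aggregate on the cluster first and collect only the 1x1 result; this assumes the same db.mytable table and PERIODE column from the question:

```r
library(SparkR)
sparkR.session()

DW <- sql("select * from db.mytable")

# Aggregate in Spark, then collect only the single aggregated row.
agg_df <- agg(DW, n = countDistinct(DW$PERIODE))
nb.per <- collect(agg_df)$n  # plain numeric scalar in R
```

Here `collect()` converts the SparkDataFrame to an R data.frame (the same thing `as.data.frame` does under the hood), and `$n` extracts the scalar, so no extra `as.numeric(head(...))` step is needed.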