How to store SparkR result into an R object?
I'm still new to the world of Azure Databricks, and the use of SparkR remains very obscure to me, even for very simple tasks.
It took me a very long time to find out how to count distinct values, and I'm not sure it's the right way to go:
library(SparkR)
sparkR.session()
DW <- sql("select * from db.mytable")
nb.per <- head(summarize(DW, n_distinct(DW$VAR)))
I thought I had found it, but nb.per is not a single numeric value; it is still a data.frame:
class(nb.per)
[1] "data.frame"
I tried:
nb.per <- as.numeric(head(summarize(DW, n_distinct(DW$PERIODE))))
It seems to work, but I'm pretty sure there is a better way to achieve this?
Thanks !
Solution 1:
The SparkR::sql function returns a SparkDataFrame, not an R data.frame.
To use the result in R as a regular data.frame, you can simply coerce it:
as.data.frame(sql("select * from db.mytable"))
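For an aggregation like the distinct count above, coercing the whole table is unnecessary: you can run the aggregation on the Spark side and only bring back the single-row result with SparkR::collect, then pull the scalar out of that small data.frame. A minimal sketch (assuming the same db.mytable and PERIODE column from the question; the column name n is just an alias chosen for illustration):

```r
library(SparkR)
sparkR.session()

DW <- sql("select * from db.mytable")

# Aggregate in Spark, collect only the 1-row result into R
res <- collect(summarize(DW, n = n_distinct(DW$PERIODE)))

# Extract the scalar from the 1x1 data.frame
nb.per <- res$n[[1]]
class(nb.per)   # a plain numeric/integer, no longer a data.frame
```

This keeps the heavy computation distributed and only transfers one value to the driver, which matters once db.mytable is large.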