Read SAS file to get meta information
I am very new to data science technologies and am currently working on reading a SAS file (.sas7bdat).
I am able to read the file using:
from sas7bdat import SAS7BDAT

# Open the file and print every row of data
with SAS7BDAT('/dbfs/mnt/myMntScrum1/sasFile.sas7bdat') as f:
    for row in f:
        print(row)
Printing each row shows all the data.
When we view SAS files in a SAS viewer, we can also see metadata, e.g. label information and the variables (column names) used in the actual data.
How can I read this metadata in Spark (Databricks) using Python?
Solution 1:
Have you tried pyreadstat?
It reads the metadata directly and returns it alongside the data:
import pyreadstat
df, meta = pyreadstat.read_sas7bdat('/path/to/a/file.sas7bdat')
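To get at the labels the question asks about, the returned meta object can be inspected. The following is only a minimal sketch: the helper names column_labels and read_sas_metadata are hypothetical, and it assumes the metadata object exposes column_names and column_labels and that read_sas7bdat accepts metadataonly=True to skip loading the rows.

```python
def column_labels(meta):
    """Map each column name to its SAS label (None when a column has no label)."""
    # Assumes meta.column_names and meta.column_labels are parallel lists.
    return dict(zip(meta.column_names, meta.column_labels))


def read_sas_metadata(path):
    """Read only the metadata of a .sas7bdat file, without loading the rows."""
    import pyreadstat  # imported lazily so column_labels() works without it

    _, meta = pyreadstat.read_sas7bdat(path, metadataonly=True)
    return meta


# Hypothetical usage on Databricks (the path is taken from the question):
# meta = read_sas_metadata('/dbfs/mnt/myMntScrum1/sasFile.sas7bdat')
# print(column_labels(meta))
```

Reading with metadataonly=True should be much faster on large files, since only the header is parsed.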
Solution 2:
Most data analysis in Python is done with the pandas library, whose read_sas function reads the file into a DataFrame and keeps the column names. Unless you are required to use Spark, I strongly recommend pandas. Here is a set of instructions for SAS users: https://blog.dominodatalab.com/pandas-for-sas-users-part-1/
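A minimal sketch of the pandas route, under stated assumptions: the helper names read_sas_with_pandas and column_dtypes are hypothetical, and the encoding argument depends on how the file was written. As far as I know, the resulting DataFrame carries column names and dtypes but not the SAS variable labels, so pyreadstat may still be needed for those.

```python
def column_dtypes(df):
    """Return {column name: dtype as string} for a DataFrame-like object."""
    return {name: str(dtype) for name, dtype in df.dtypes.items()}


def read_sas_with_pandas(path, encoding=None):
    """Load a .sas7bdat file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so column_dtypes() works without it

    # encoding=None leaves text columns as raw bytes; pass e.g. "utf-8"
    # if that matches the file (an assumption, check your data).
    return pd.read_sas(path, format="sas7bdat", encoding=encoding)


# Hypothetical usage (the path is taken from the question):
# df = read_sas_with_pandas('/dbfs/mnt/myMntScrum1/sasFile.sas7bdat')
# print(column_dtypes(df))
```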