Data types are converted to string when transferring data with ADF to Databricks tables
You can use Parquet format instead of CSV in the sink of the ADF pipeline. Parquet stores the schema along with the data, so each column keeps the data type it has in the source, rather than every column becoming a string as with CSV. Parquet also helps in a couple of other ways:
- You can apply compression such as snappy to save some space in ADLS.
- It integrates easily with Spark/Databricks/Hive, as you mentioned in your question.
Here is a small comparison to illustrate. I tried both Parquet and CSV as the ADF pipeline sink, and you can see the difference in the resulting column types:
CSV sink (all columns as string)
Parquet sink (columns keep their equivalent types)
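
The CSV half of that comparison can be reproduced without ADF or Databricks. The sketch below uses only the Python standard library and a couple of made-up sample rows (the column names `id`, `price` and `active` are hypothetical, not from the pipeline above). It writes typed values to CSV and reads them back, showing that CSV carries no type information, which is why a reader has to either treat everything as a string or guess the types.

```python
import csv
import io

# Hypothetical sample rows with mixed types (illustration only).
rows = [
    {"id": 1, "price": 9.99, "active": True},
    {"id": 2, "price": 12.5, "active": False},
]

# Write the rows to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "active"])
writer.writeheader()
writer.writerows(rows)

# Read them back. CSV has no type metadata, so every value is a str.
buffer.seek(0)
round_tripped = list(csv.DictReader(buffer))

types_before = {k: type(v).__name__ for k, v in rows[0].items()}
types_after = {k: type(v).__name__ for k, v in round_tripped[0].items()}

print(types_before)  # {'id': 'int', 'price': 'float', 'active': 'bool'}
print(types_after)   # {'id': 'str', 'price': 'str', 'active': 'str'}
```

A Parquet file, by contrast, embeds the schema in the file itself, so a reader such as Spark does not need to infer or cast the column types.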