Spark DataFrame Schema Nullable Fields
Solution 1:
In general, Spark Datasets either inherit the nullable property from their parents or infer it from the external data types.
You can argue whether this is a good approach, but ultimately it is sensible. If the semantics of a data source don't support nullability constraints, then applying a schema cannot enforce them either. At the end of the day, it is always better to assume that things can be null than to fail at runtime if the opposite assumption turns out to be incorrect.