Why are columns automatically converted to nullable when writing Parquet files?
Spark documentation says:
"When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons."
Could you elaborate on the reasons for this choice?
Is this for a reason similar to Protobuf dropping "required" fields in proto3, given that both Protobuf and Parquet derive from the Dremel paper?
What risks motivated such a decision?
Nullability seems like a validation constraint, and I am still not convinced that enforcing it is the responsibility of the Parquet schema. Having too many constraints would make parsing and compression less efficient; imagine if we had dozens of numerical types.
I cannot find an answer on the mailing list, on Stack Overflow, or via Google. If this question has already been answered, feel free to redirect me to it.