Why when writing Parquet files, columns are converted to nullable?


Julien Benoit
Hi,

Spark documentation says:
"When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons."

Could you elaborate on the reasons for this choice?
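
For reference, here is a minimal sketch (Spark in Scala; the output path and column name are just examples of my own) of the behaviour I am asking about: a column declared as non-nullable comes back as nullable after a Parquet round trip.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("nullable-demo").getOrCreate()

// Declare the column as NOT nullable in the DataFrame schema.
val schema = StructType(Seq(StructField("id", IntegerType, nullable = false)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1), Row(2))),
  schema
)
df.printSchema()  // id: integer (nullable = false)

// /tmp/nullable_demo is only an example location.
df.write.mode("overwrite").parquet("/tmp/nullable_demo")

// After the round trip, the same column is reported as nullable = true.
spark.read.parquet("/tmp/nullable_demo").printSchema()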

Is this for a similar reason to Protobuf, which dropped "required" fields in proto3, given that both Protobuf and Parquet derive from the Dremel paper?

What risks does such a decision imply?

Nullability seems like a validation constraint, and I am still not convinced that enforcing it is the responsibility of the Parquet schema. Having too many constraints would make parsing and compression less efficient; imagine if we had dozens of numerical types.
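
To make concrete what I mean by a validation constraint, here is a hedged sketch (reusing the example path and column name from above, both assumptions of mine) of enforcing the non-null expectation as an explicit check in the application at read time, rather than relying on the storage format to guarantee it.

// Reusing the example path and the "id" column from the sketch above.
val readBack = spark.read.parquet("/tmp/nullable_demo")

// Fail fast if the column we expect to be non-null actually contains nulls.
val nullCount = readBack.filter(readBack("id").isNull).count()
require(nullCount == 0L, s"Column 'id' contains $nullCount null values")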

I cannot find an answer on this mailing list, on Stack Overflow, or via Google. If this question has already been answered, feel free to redirect me to it.


Thank you.

Julien.