Available since parquet encoder 1.11, parquet format 2.5.
It seems to improve I/O performance by an order of magnitude in certain scenarios, which is simply fantastic.
My questions are:
- Are there any plans to include it in upcoming Spark releases? Could you point me to an issue, if one exists?
- If not, could you suggest a way to at least write Parquet files in the new format now and worry about the optimized reads later? Would simply forcing the Parquet dependencies to the versions above be enough?