[Spark Core] Support for parquet column indexes

Kamil Krzysztof Krynicki

Column indexes were recently added to the Parquet file format.

See: https://stackoverflow.com/questions/26909543/index-in-parquet/40714337#40714337

They are available since parquet-mr 1.11 (parquet-format 2.5).

It seems to improve the IO performance by an order of magnitude in certain scenarios, which is simply fantastic.

My questions are:
- Are there any plans to include it in upcoming Spark releases? Could you direct me to an issue, if one exists?
- If not, could you suggest a way to at least write Parquet files in the new format and worry about the optimized reads later? Would simply forcing the Parquet dependencies to those versions be enough?
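
For concreteness, forcing the dependencies could look roughly like the sbt fragment below. This is only a minimal sketch: the choice of sbt, the artifact list and the 1.11.0 version string are assumptions, and it has not been tested against any Spark release.

```scala
// build.sbt (sketch): pin the Parquet artifacts that Spark pulls in transitively.
// Assumption: these four modules are the relevant ones; the list may be incomplete.
val parquetVersion = "1.11.0" // assumed: the parquet-mr release line that added column indexes

dependencyOverrides ++= Seq(
  "org.apache.parquet" % "parquet-hadoop"   % parquetVersion,
  "org.apache.parquet" % "parquet-column"   % parquetVersion,
  "org.apache.parquet" % "parquet-common"   % parquetVersion,
  "org.apache.parquet" % "parquet-encoding" % parquetVersion
)
```

Even if this resolves at build time, the Parquet jars shipped with the Spark distribution on the cluster may still take precedence at runtime, so whether the override alone is enough is exactly the open question.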

Thank you!

Kamil Krynicki