Disable parquet metadata for count

Gary Li
Hi all,

I am implementing a custom data source (Data Source V1) and would like to enforce a pushdown filter on every query. However, when I run a simple count query such as df.count(), Spark ignores the filter and instead uses the metadata in the Parquet footer, summing the row count of each block directly. This returns an unexpected result for my use case. Is there any way to skip the Parquet metadata and enforce the filter on every query?

Thank you,