Which predicate pushdowns work or do not work with Parquet?


Manuel Vonthron
Hi all,

I am trying to determine which predicate pushdowns work and which do not with Spark+Parquet (mostly for versions 2.1.0 and/or 2.2.0).

I've read a lot of messages in pull request comments, JIRA tickets, and even the comments in Parquet's source, but it's hard to get a clear picture of when a pushdown is honoured depending on
  - the data type (Int? String? Timestamp?)
  - the operator involved (isNull, >=, ...)
  - and even the column name (is there a "." in it or not?)

The only types I consistently got working, both in my tests and in what I've read, are "regular numbers", but support for Strings and Timestamps is crucial for my use case.

Do you have any "reference" on this subject?

Additionally, here is a test I've been running with its results:

There may be errors or misconfigurations on my side, but the TL;DR is: I only got INTs and BOOLs to reliably work with no weirdness :|
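For readers less familiar with what an honoured pushdown buys you at the Parquet level, here is a minimal sketch in plain Python. It is not Spark or Parquet code: `RowGroup`, `may_match_gte` and `scan_gte` are hypothetical names made up for illustration, and the model only assumes that each row group carries per-column min/max statistics that a reader can use to skip the group for a `>=` predicate.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration)."""
    rows: List[Dict[str, Any]]

    def stats(self, column: str) -> Tuple[Any, Any]:
        # Min/max over non-null values, mimicking column statistics.
        values = [r[column] for r in self.rows if r[column] is not None]
        return (min(values), max(values)) if values else (None, None)


def may_match_gte(group: RowGroup, column: str, bound: Any) -> bool:
    """Return False only when the statistics prove no row satisfies column >= bound."""
    _, hi = group.stats(column)
    return hi is not None and hi >= bound


def scan_gte(groups: List[RowGroup], column: str, bound: Any):
    """Evaluate column >= bound, skipping row groups ruled out by their statistics."""
    groups_read = 0
    matches = []
    for group in groups:
        if not may_match_gte(group, column, bound):
            continue  # pushdown honoured: the group is never read
        groups_read += 1
        # Rows still need filtering: statistics only say the group *may* match.
        matches.extend(
            r for r in group.rows
            if r[column] is not None and r[column] >= bound
        )
    return matches, groups_read


int_groups = [
    RowGroup([{"id": 1}, {"id": 5}]),
    RowGroup([{"id": 10}, {"id": 20}]),
    RowGroup([{"id": None}]),
]
str_groups = [
    RowGroup([{"name": "alice"}, {"name": "bob"}]),
    RowGroup([{"name": "carol"}, {"name": "dave"}]),
]

print(scan_gte(int_groups, "id", 10))
print(scan_gte(str_groups, "name", "c"))
```

The same comparison logic applies to strings in this toy model, which is why the question of whether a given type and operator is actually pushed down depends on the reader implementation rather than on the idea itself. In Spark, the `PushedFilters` entry of the physical plan printed by `df.explain()` lists the filters handed to the data source, although a filter listed there is not by itself proof that row groups were skipped.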

