Spark 3 - Predicate/Projection Pushdown Feature

Previous Topic Next Topic
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Spark 3 - Predicate/Projection Pushdown Feature

Pınar Ersoy

I am working as a Data Scientist and using Spark with Python while developing my models.

Our data model has multiple nested structures (Ex. For the first two-level, I can read them as the following with PySpark:

data.where(col('attributes.product') != "")

However, I cannot read the three-level nested structure:

data.where(col('attributes.product.feature') != "")

I was hoping to overcome this problem with Spark 3 with the advancements of Predicate & Projection Pushdown; however, after I tested them I still cannot read the third-level without flattening the data.  

I would like to ask whether there will be an improvement in reading nested data (JSON/Parquet) that has more than two levels in the upcoming versions of Spark 3. 

Or if I am missing something in the existing Spark versions released, please let me know how to proceed. 

Kindest Regards,
Pınar Ersoy