Arrow RecordBatches to Spark Dataframe

Tanveer Ahmad - EWI

Hi all,

I have a small question that I hope you can help me with.

In this code snippet, Jether converts a prdd (an RDD of pd.DataFrame objects) to Arrow RecordBatches (slices) and finally to a Spark DataFrame. Similarly, the Scala code converts a JavaRDD to a Spark DataFrame.

If I already have an ardd (an RDD of pa.RecordBatch objects, i.e. Arrow RecordBatches), how can I convert it directly to a Spark DataFrame in PySpark without going through Pandas? Thanks.

Tanveer Ahmad