Issue with pyspark query

Tzahi File
Hi,

This is a general question about moving a Spark SQL query to PySpark. If needed, I can add more details from the error log and the query syntax.
Specifically, I'm trying to run an existing Spark SQL query through PySpark.
The query syntax and the Spark configuration are the same in both versions.
For some reason, the query fails when run through PySpark with a Java heap space error.
In the Spark SQL version I'm using INSERT OVERWRITE with a PARTITION clause, while in PySpark I'm using a DataFrame to write the data to a specific location in S3.
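For illustration only, here is a minimal sketch of the two write paths being compared. It assumes the partition column is the last column of the source data (required for a dynamic-partition INSERT OVERWRITE), that the DataFrame output is Parquet, and that `spark` is an existing SparkSession. All table, view, column, and path names are hypothetical placeholders, not taken from the real query.

```python
# Hypothetical sketch: table, view, column, and S3 path names are placeholders.
# `spark` is assumed to be an existing pyspark.sql.SparkSession and `df` a
# pyspark.sql.DataFrame; no pyspark import is needed to define these helpers.


def write_with_sql(spark, source_view, target_table, partition_col):
    """Spark SQL path: overwrite only the partitions produced by the SELECT."""
    # Dynamic-partition insert: assumes partition_col is the LAST column
    # of source_view, which is how Spark matches it to PARTITION (...).
    query = (
        f"INSERT OVERWRITE TABLE {target_table} "
        f"PARTITION ({partition_col}) "
        f"SELECT * FROM {source_view}"
    )
    return spark.sql(query)


def write_with_dataframe(spark, df, s3_path, partition_col):
    """DataFrame path: write Parquet files directly under an S3 prefix."""
    # Without this setting (Spark 2.3+), mode("overwrite") replaces everything
    # under s3_path instead of only the partitions present in df.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    df.write.mode("overwrite").partitionBy(partition_col).parquet(s3_path)


# Usage (hypothetical names):
#   write_with_sql(spark, "staging_events", "db.events", "dt")
#   write_with_dataframe(spark, events_df, "s3://<bucket>/events/", "dt")
```

The two helpers are kept separate so the SQL and DataFrame versions of the same write can be run side by side against an identical configuration.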

Are there any configuration differences between the two setups that you think I need to change?


Thanks,

--
Tzahi
Data Engineer