Spark Mongodb connector hangs indefinitely, not working on Amazon EMR

Daniel Stojanov
When running a PySpark application on my local machine, I am able to save to and retrieve from the MongoDB server using the MongoDB Spark connector, and everything works properly. When I submit the exact same application to my Amazon EMR cluster, I can see that the package for the Spark driver is properly fetched from Maven when the job is submitted. However, the application does not work there.

From my Amazon EMR instance I can communicate with the database using PyMongo without problems. I can also load and save DataFrames when using PySpark interactively from the driver, but when I submit jobs via spark-submit to the YARN cluster, the application hangs.
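For illustration, a submission of the kind described above could be assembled roughly as follows. This is only a sketch under stated assumptions: the connector coordinates and version, the MongoDB URI, the script name, and the helper `build_submit_command` are hypothetical placeholders, not values from the actual setup. The `spark.mongodb.input.uri` and `spark.mongodb.output.uri` keys assume a 3.x-series connector.

```python
import shlex

# Illustrative placeholders only; none of these values come from the post.
CONNECTOR = "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1"  # assumed version
MONGO_URI = "mongodb://db.example.internal:27017/mydb.mycoll"     # hypothetical URI


def build_submit_command(script, uri=MONGO_URI, package=CONNECTOR, deploy_mode="client"):
    """Return the spark-submit argument list for a submission to YARN."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        # Ask Spark to resolve the connector from Maven at submit time.
        "--packages", package,
        # Connection settings read by the connector (3.x-style keys, assumed).
        "--conf", f"spark.mongodb.input.uri={uri}",
        "--conf", f"spark.mongodb.output.uri={uri}",
        script,
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(build_submit_command("app.py")))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the URI contains credentials or query parameters.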

There are no error messages; the driver and executors simply show zero activity. The PySpark application stalls until it is manually terminated.

Has anyone else used the MongoDB Spark connector from Amazon EMR?

