Java Spark job significantly slower than Python

Korb, Michael [USA]
Hi,

I'm experimenting with a Spark analytic on a 9-node cluster. The Python version runs in about 5 minutes, whereas the Java version, with the same SparkContext configuration and everything else being equal, takes 40+ minutes.

Does anyone know what may be causing this performance difference? What is PySpark doing differently?

Thanks,
Mike