Performance of PySpark 2.3.2 on Microsoft Windows

Wim Van Leuven

We are writing a lot of data processing pipelines for Spark using PySpark, and we add a lot of integration tests.

In our enterprise environment, many people run Windows PCs, and we notice that build times are really slow on Windows because of the integration tests. This is in comparison with the same builds on Mac (dev PCs) or Linux (our CI servers).

We cannot easily identify what is causing the slowdown, but it seems to be mostly PySpark communicating with Spark on the JVM.
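One way to narrow this down would be to time the phases of a test separately and compare the numbers across Windows, Mac and Linux. Below is a minimal sketch of such a timer. The `timed` and `report` helpers and the phase names are hypothetical (they are not part of PySpark), and the `time.sleep` calls are only stand-ins for real work.

```python
import time
from contextlib import contextmanager

# Hypothetical helper, not part of PySpark: accumulated wall-clock seconds per phase.
timings = {}


@contextmanager
def timed(phase):
    """Add the wall-clock duration of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed


def report():
    """Return (phase, seconds) pairs, slowest phase first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)


# Stand-in workloads. In a real integration test these blocks might wrap
# SparkSession start-up, a JVM-only action such as spark.range(n).count(),
# and an action that runs a Python UDF (which involves Python worker processes).
with timed("phase_a"):
    time.sleep(0.05)
with timed("phase_b"):
    time.sleep(0.005)

for phase, seconds in report():
    print(f"{phase}: {seconds:.3f}s")
```

If the JVM-only phase is comparable across platforms while the phase involving Python workers is not, that would point at the Python-to-JVM side rather than at Spark itself.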

Any pointers/clues to where to look for more information? 
Obviously, plain help in the matter is more than welcome as well.

Kind regards,