[PySpark] Sharing testing library and requesting feedback

Matt Hagy
We recently open-sourced mockrdd, a library for testing PySpark code: github.com/LiveRamp/mockrdd

The mockrdd.MockRDD class behaves much like pyspark.RDD, with the following extra benefits:
* Extensive sanity checks to identify invalid inputs
* More meaningful error messages for debugging issues
* Straightforward to run within pdb
* Removes Spark dependencies from development and testing environments
* No Spark overhead when running through a large test suite
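
To make the idea concrete, here is a rough sketch of the testing pattern the list above describes. It does not use mockrdd itself: TinyRDD is a hypothetical stand-in written only for this example, covering a handful of RDD-style methods, and word_count is a made-up job. The real MockRDD API and its checks may differ, so see the repository for actual usage.

```python
from collections import defaultdict
from functools import reduce


class TinyRDD:
    """Hypothetical in-memory stand-in for a few pyspark.RDD methods.

    Illustration only; this is not mockrdd's implementation.
    """

    def __init__(self, rows):
        self._rows = list(rows)

    def map(self, f):
        return TinyRDD(f(r) for r in self._rows)

    def flatMap(self, f):
        return TinyRDD(x for r in self._rows for x in f(r))

    def filter(self, f):
        return TinyRDD(r for r in self._rows if f(r))

    def reduceByKey(self, f):
        groups = defaultdict(list)
        for row in self._rows:
            # Eager sanity check: fail right away with a readable message.
            if not (isinstance(row, tuple) and len(row) == 2):
                raise ValueError(
                    "reduceByKey expects (key, value) pairs, got %r" % (row,))
            key, value = row
            groups[key].append(value)
        return TinyRDD((k, reduce(f, vs)) for k, vs in groups.items())

    def collect(self):
        return list(self._rows)


def word_count(lines):
    """Job under test: relies only on RDD-style methods, not on Spark itself."""
    return (lines.flatMap(lambda line: line.split())
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))


# Runs in plain Python, so it is easy to step through under pdb.
counts = dict(word_count(TinyRDD(["a b a", "b c"])).collect())
```

Because word_count only touches the RDD-style interface, the same function could in principle be handed a real pyspark.RDD in production and an in-memory object in tests.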

Would anyone find this useful? What other features would make this more useful? Are there benefits to using PySpark in local mode for testing that we're not considering?