Unit testing PySpark Code and doing assertion

Rahul Nandi
Hi,
I'm trying to unit test my PySpark DataFrame code. My goal is to assert on both the schema and the data of the DataFrames. Are there any known libraries I can use for this? Any library that can handle 10-15 records in a DataFrame is good enough for me.
As of now I'm using the unittest library and its assertCountEqual method to do the assertion. This works reasonably well, but it does not validate the schema, and the failure message is not easy to understand.
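
To make the gap concrete, here is a minimal sketch of the kind of check described above, extended with a separate schema assertion. The helper name assert_frames_equal is hypothetical and not part of any library. It assumes both DataFrames are small enough to collect() on the driver, and it relies only on the DataFrame.schema property and DataFrame.collect().

```python
import unittest


def assert_frames_equal(test_case, actual, expected):
    """Hypothetical helper: compare two DataFrames by schema, then by rows.

    Only uses .schema and .collect(), so it is intended for small
    frames (around 10-15 rows) that fit comfortably on the driver.
    """
    # Check the schema first so a name/type mismatch is reported on its own,
    # instead of being buried in a row-level diff.
    test_case.assertEqual(actual.schema, expected.schema, "schema mismatch")

    # Then compare rows ignoring order, which is what assertCountEqual does.
    # Rows are converted to plain tuples to keep the failure output compact.
    test_case.assertCountEqual(
        [tuple(row) for row in actual.collect()],
        [tuple(row) for row in expected.collect()],
        "row mismatch",
    )
```

Inside a unittest.TestCase method this would be called as assert_frames_equal(self, result_df, expected_df). Splitting the check in two means a failure message says whether the schema or the data differed, although the row-level diff is still the standard unittest output.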

If any of you are using any special techniques, let me know. Thanks in advance.

Regards,
Rahul