Need Unit test complete reference for Pyspark



Sachit Murarka
Hi Users,

I have to write Unit Test cases for PySpark. 
I think pytest-spark and "spark testing base" are good test libraries.

Can anyone please point me to a full reference for writing test cases in Python using these?

Kind Regards,
Sachit Murarka

Re: Need Unit test complete reference for Pyspark

Marco Mistroni
Hey,
 they are good libraries to get you started; I have used both of them. Unfortunately, as far as I saw when I started using them, only a few people maintain them.
But you can get pointers from them for writing tests. The code below can get you started.
What you'll need is:

- a method to create a dataframe on the fly, perhaps from a string. You can have a look at pandas, it has methods for this
- a method to test dataframe equality. You can use df1.subtract(df2)

I am assuming you are into dataframes rather than RDDs, for which the two packages you mention should have everything you need.
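
For the first point, here is a rough sketch of building test rows from a string using only the standard library. The helper name rows_from_text and the SAMPLE text are made up for illustration, and it assumes the first line of the string is the header. The resulting rows and header can then be handed to createDataFrame.

```python
import csv
import io


def rows_from_text(text, sep=","):
    """Parse delimited text into (column_names, rows); the first line is the header.

    Hypothetical helper, not part of pytest-spark or spark-testing-base.
    """
    reader = csv.reader(io.StringIO(text.strip()), delimiter=sep)
    # Skip empty lines and trim whitespace around each cell
    parsed = [[cell.strip() for cell in line] for line in reader if line]
    header = parsed[0]
    rows = [tuple(line) for line in parsed[1:]]
    return header, rows


SAMPLE = """
first,second
one,two
three,four
"""

header, rows = rows_from_text(SAMPLE)

# Inside a test that receives a SparkSession fixture you could then call:
#   df = spark_session.createDataFrame(rows, header)
# Every value is a string at this point; cast columns afterwards if needed.
```

Keeping the test data as a small block of text next to the test makes it easy to see what the dataframe contains without opening a separate file.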

hth
 marco


import pytest
from pyspark.sql import SparkSession


@pytest.fixture
def spark_session():
    # Local single-threaded session, enough for unit tests
    return SparkSession.builder \
        .master('local[1]') \
        .appName('SparkByExamples.com') \
        .getOrCreate()


def test_create_table(spark_session):
    df = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])
    df.show()  # show() prints the rows itself and returns None

    df2 = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])

    # subtract() keeps the rows of df that are not in df2; none are expected here
    assert df.subtract(df2).count() == 0
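
One caveat on the subtract() check: it only looks in one direction and works on distinct rows (it behaves like SQL EXCEPT DISTINCT), so it still passes when df2 has extra rows or when duplicate counts differ. A possible stricter check, sketched below, compares the collected rows as multisets. The helper name rows_match is made up for illustration, and it assumes the test data is small enough to collect() to the driver.

```python
from collections import Counter


def rows_match(left_rows, right_rows):
    """Compare two row collections, ignoring order but respecting duplicates.

    Hypothetical helper; rows can be tuples, lists, or pyspark Row objects
    (Row is a tuple subclass), as long as the values are hashable.
    """
    left = Counter(tuple(row) for row in left_rows)
    right = Counter(tuple(row) for row in right_rows)
    return left == right


# Plain tuples stand in here for what df.collect() would return
expected = [("one", "two"), ("three", "four")]
reordered = [("three", "four"), ("one", "two")]
with_duplicate = [("one", "two"), ("one", "two"), ("three", "four")]

same_content = rows_match(expected, reordered)
differs_on_duplicates = rows_match(expected, with_duplicate)

# In a Spark test this could be used as:
#   assert rows_match(df.collect(), df2.collect())
```

For larger dataframes, checking subtract() in both directions plus a count comparison avoids collecting everything to the driver, at the cost of a weaker guarantee on duplicates.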


