testing frameworks


testing frameworks

Steve Pruitt

Hi,

 

Can anyone recommend testing frameworks suitable for Spark jobs? Something that can be integrated into a CI tool would be great.

 

Thanks.

 


Re: testing frameworks

Holden Karau
So I’m biased as the author of spark-testing-base, but I think it’s pretty OK. Are you looking for unit tests, integration tests, or something else?

On Mon, May 21, 2018 at 5:24 AM Steve Pruitt <[hidden email]> wrote:


RE: [EXTERNAL] - Re: testing frameworks

Steve Pruitt

Something more along the lines of integration testing, I believe: run one or more Spark jobs and verify the output results, if that makes sense.
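As a minimal sketch of the "verify the output results" step, assuming the job writes plain-text output (Spark's saveAsTextFile typically produces one part-* file per partition plus a _SUCCESS marker), a test can gather every output line, sort the lines so partition order doesn't matter, and compare them with a golden file. JobOutputCheck and its methods are hypothetical names, not part of any testing library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: collects the lines a Spark job wrote as text output
 * (assumed to be one "part-*" file per partition) and compares them with
 * golden lines. Sorting removes any dependence on partition order.
 */
final class JobOutputCheck {

    /** Reads every part-* file in outputDir (ignoring _SUCCESS, .crc, etc.) and returns all lines sorted. */
    static List<String> readSortedOutput(Path outputDir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
            }
        }
        Collections.sort(lines);
        return lines;
    }

    /** True when the job output contains exactly the golden lines, in any order. */
    static boolean matchesGolden(Path outputDir, List<String> goldenLines) throws IOException {
        List<String> expected = new ArrayList<>(goldenLines);
        Collections.sort(expected);
        return readSortedOutput(outputDir).equals(expected);
    }
}
```

In a CI run, the test would point outputDir at the directory the job under test wrote to and load goldenLines from a file checked in next to the test.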

 

I am very new to the world of Spark. We want to include pipeline testing from the get-go. I will check out spark-testing-base.

 

 

Thanks.

 

From: Holden Karau [mailto:[hidden email]]
Sent: Monday, May 21, 2018 11:32 AM
To: Steve Pruitt <[hidden email]>
Cc: [hidden email]
Subject: [EXTERNAL] - Re: testing frameworks

 


Re: [EXTERNAL] - Re: testing frameworks

Joel D
We’ve developed our own testing framework covering different areas of checking, sometimes by providing expected data and comparing it with the resulting data from the data object.
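One way to sketch the expected-versus-actual comparison described above (this is not the framework from this reply, which isn't shown) is a multiset difference that reports which records are missing and which are unexpected, rather than a bare pass/fail. It assumes the records are small enough to collect to the driver and that the record type implements equals/hashCode; ExpectedVsActual is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: treats expected and actual records as multisets
 * (order ignored, duplicates counted) and reports the differences.
 */
final class ExpectedVsActual {

    /** Records in expected that have no matching copy in actual. */
    static <T> List<T> missing(List<T> expected, List<T> actual) {
        return subtract(expected, actual);
    }

    /** Records in actual that have no matching copy in expected. */
    static <T> List<T> unexpected(List<T> expected, List<T> actual) {
        return subtract(actual, expected);
    }

    // Multiset difference a - b; relies on T implementing equals/hashCode.
    private static <T> List<T> subtract(List<T> a, List<T> b) {
        Map<T, Integer> remaining = new HashMap<>();
        for (T item : b) {
            remaining.merge(item, 1, Integer::sum);
        }
        List<T> result = new ArrayList<>();
        for (T item : a) {
            Integer count = remaining.get(item);
            if (count == null || count == 0) {
                result.add(item); // no unmatched copy of this record left in b
            } else {
                remaining.put(item, count - 1);
            }
        }
        return result;
    }
}
```

Reporting both lists makes a failing CI run much easier to diagnose than a single boolean assertion.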

Cheers.

On Tue, May 22, 2018 at 1:48 PM Steve Pruitt <[hidden email]> wrote:


Re: testing frameworks

umargeek
In reply to this post by Steve Pruitt
Hi Steve,

You can try out the pytest-spark plugin if you're writing programs using PySpark; please find the link below for reference.

https://github.com/malexer/pytest-spark

Thanks,
Umar



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: testing frameworks

spicoflorin
Hello!
  I'm also looking for a way to unit test Spark Java applications. I've seen the great work done in spark-testing-base, but it seemed to me that I could not use it for Spark Java applications.
Are only Spark Scala applications supported?
Thanks.
Regards,
 Florin

On Wed, May 23, 2018 at 8:07 AM, umargeek <[hidden email]> wrote:

Re: testing frameworks

Holden Karau
So Jessie has an excellent blog post on how to use it with Java applications - 

On Wed, May 30, 2018 at 4:14 AM Spico Florin <[hidden email]> wrote:

Re: testing frameworks

spicoflorin
Hello!
  Thank you very much for your helpful answer and for the very good work on spark-testing-base. I managed to perform unit testing with the spark-testing-base library by following the provided article, and I also got inspired by


I had some concerns about how to compare an RDD that comes from a DataFrame with one that comes from the jsc().parallelize() method.
 
My test workflow is as follows:
1. Get the data from a Parquet file as a DataFrame.
2. Convert the DataFrame to a JavaRDD with toJavaRDD().
3. Perform some mapping on the JavaRDD.
4. Check whether the resulting mapped RDD is equal to the expected one (retrieved from a text file).

I performed the above test with the following code snippet:

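import com.holdenkarau.spark.testing.JavaRDDComparisons;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();

// Read the Parquet test data as a DataFrame and convert it to an RDD of rows
JavaRDD<Row> input =
        spark.read().parquet("src/test/resources/test_data.parquet").toJavaRDD();

// expected: JavaRDD<MyCustomer> built with jsc().parallelize(...) from a text file (not shown)
JavaRDD<MyCustomer> result = MyDriver.convertToMyCustomerData(input);
JavaRDDComparisons.assertRDDEquals(expected, result);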

The above test failed, even though the data is the same. While debugging, I observed that the data that came from the DataFrame didn't have the same order as the data that came from jsc().parallelize(text_file).

So I suppose the issue comes from the fact that the SparkSession and jsc() don't share the same SparkContext (there is a warning about this when running the program).

Therefore, my solution was to use the same jsc() for both the expected and the result RDDs. With this solution, the assertion succeeded as expected.

import java.util.List;

// Collect the Parquet rows to the driver, then redistribute them with the same jsc()
List<Row> df = spark.read().parquet("src/test/resources/test_data.parquet").toJavaRDD().collect();
JavaRDD<Row> input = jsc().parallelize(df);

JavaRDD<MyCustomer> result = MyDriver.convertToMyCustomerData(input);
JavaRDDComparisons.assertRDDEquals(expected, result);


My questions are:
1. What is the best way to compare RDDs when one is built from a DataFrame and the other is obtained via jsc().parallelize()?
2. Is the above solution a suitable one?
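For question 1, one possible approach (a sketch, not spark-testing-base's own mechanism) is to make the check independent of order altogether: collect both results, sort copies with the same comparator, and compare them element by element. It assumes the data is small enough to collect() to the driver and that the comparator orders on every field equals uses (otherwise tied elements may sort differently and cause false failures). Customer is a hypothetical stand-in for MyCustomer; in a real test the two lists would come from expected.collect() and result.collect().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/**
 * Order-insensitive comparison of two collected results (hypothetical helper).
 */
final class SortedComparison {

    /** Hypothetical stand-in for MyCustomer; equals/hashCode must be implemented. */
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Customer)) return false;
            Customer other = (Customer) o;
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    /** Orders on every field that equals uses, so ties never hide differences. */
    static final Comparator<Customer> BY_ID_THEN_NAME =
            Comparator.comparingLong((Customer c) -> c.id)
                      .thenComparing(c -> c.name);

    /** Sorts copies of both lists with the same comparator, then compares element by element. */
    static <T> boolean equalIgnoringOrder(List<T> expected, List<T> actual, Comparator<? super T> order) {
        if (expected.size() != actual.size()) {
            return false;
        }
        List<T> e = new ArrayList<>(expected);
        List<T> a = new ArrayList<>(actual);
        e.sort(order);
        a.sort(order);
        return e.equals(a);
    }
}
```

Because both sides are sorted the same way, it no longer matters which SparkContext or partitioning produced each RDD.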

I look forward to your answers.

Regards,
  Florin
   






On Wed, May 30, 2018 at 3:11 PM, Holden Karau <[hidden email]> wrote:

Re: testing frameworks

Lars Albertsson
In reply to this post by Steve Pruitt
Hi,

I wrote this answer to the same question a couple of years ago:
https://www.mail-archive.com/user%40spark.apache.org/msg48032.html

I have made a couple of presentations on the subject. Slides and video
are linked on this page: http://www.mapflat.com/presentations/

You can find more material in this list of resources:
http://www.mapflat.com/lands/resources/reading-list

Happy testing!

Regards,



Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: http://www.mapflat.com/calendar


On Mon, May 21, 2018 at 2:24 PM, Steve Pruitt <[hidden email]> wrote:


Re: testing frameworks

Ryan Adams
We use spark-testing-base for unit testing. These tests run on a very small amount of data that covers all (or at least most) of the paths the code can take.


For integration testing we use automated routines to ensure that aggregate values match an aggregate baseline.
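A minimal sketch of such an aggregate-baseline check, assuming the job's aggregates (for example a row count or a column sum) have already been computed into a map and a baseline map is stored alongside the tests. The relative tolerance and all names here are illustrative assumptions, not the routines described in this reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical aggregate-baseline check: every named aggregate must be
 * present and within a relative tolerance of its stored baseline value.
 */
final class AggregateBaseline {

    /** True if actual is within relTol (e.g. 0.01 = 1%) of baseline. */
    static boolean withinTolerance(double baseline, double actual, double relTol) {
        double scale = Math.max(Math.abs(baseline), Math.abs(actual));
        return Math.abs(actual - baseline) <= relTol * scale;
    }

    /** Names of baseline aggregates that are missing from actual or drift beyond relTol. */
    static List<String> mismatches(Map<String, Double> baseline, Map<String, Double> actual, double relTol) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double value = actual.get(entry.getKey());
            if (value == null || !withinTolerance(entry.getValue(), value, relTol)) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```

A relative tolerance keeps one threshold meaningful for both small counts and large sums; exact-match aggregates are just the special case relTol = 0.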

Ryan

Ryan Adams
[hidden email]

On Tue, Jun 12, 2018 at 11:51 AM, Lars Albertsson <[hidden email]> wrote: