Testing Apache Spark applications

Testing Apache Spark applications

Omer.Ozsakarya

Hi all,

How are you testing your Spark applications?

We are writing features using Cucumber, which tests the application's behaviour. Is this called functional testing or integration testing?

We are also planning to write unit tests.

For instance, we have a class like the one below. It has one method, and this method does several things: DataFrame operations, saving a DataFrame into a database table, and insert, update and delete statements.

Our classes generally contain 2 or 3 methods. Each method covers a lot of tasks in a single function definition (like the function below).

So I am not sure how I can write unit tests for these classes and methods.

Do you have any suggestions?



import org.apache.spark.sql.DataFrame

class CustomerOperations {

  def doJob(inputDataFrame: DataFrame): Unit = {
    // definitions (values/variables)
    // Spark context, session, etc.

    // filter and cleanse inputDataFrame, keeping the result in a new DataFrame
    // insert the new DataFrame into a database table
    // several insert/update/delete statements on the database tables
  }
}


Re: Testing Apache Spark applications

☼ R Nair (रविशंकर नायर)
Sparklens from Qubole is a good source. Other tests are to be handled by the developer.

Best, 
Ravi

On Thu, Nov 15, 2018, 12:45 PM <[hidden email]> wrote:
> How are you testing your Spark applications?

Re: Testing Apache Spark applications

Vitaliy Pisarev

Hard to answer in a succinct manner but I'll give it a shot.

Cucumber is a tool for writing behaviour-driven tests (closely related to behaviour-driven development, BDD).
BDD is not merely a technical approach to testing but a mindset, a way of working, and a different way to structure communication between product and R&D (different; whether it is better is a matter of controversy).

I will not elaborate further, as there is plenty of material out there if you want to educate yourself. Just bear in mind that BDD is riddled with misconceptions. More often than not I see people just using Cucumber, but not doing actual BDD.

Regarding unit testing, I do not consider the code you showed to be a good candidate for it. There is very little procedural logic there, and there is a good chance that if you go about unit testing it you will end up with lots and lots of mocks that are overly bound to the implementation details of the code under test, rendering the tests brittle and unmaintainable.

I would argue that unit tests are more appropriate for code that is algorithmic in nature, that has few or no dependencies, and where you have an absolute oracle of truth regarding your expectations of it.
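
To make that concrete, here is a minimal sketch of the kind of code that does suit unit testing: a pure function with no Spark session and no database behind it, so the expected output is known exactly. The Customer type, its fields and the CustomerRules object are hypothetical names made up for illustration; nothing like them appears in the original post.

```scala
// Hypothetical record type; the field names are assumptions for illustration.
final case class Customer(id: Long, email: String, country: String)

object CustomerRules {
  // Pure function: trims and lower-cases an email, rejecting obviously invalid input.
  // The validity check is deliberately simplistic (exactly one '@', not at either end).
  def normalizeEmail(raw: String): Option[String] = {
    val trimmed = raw.trim.toLowerCase
    val at = trimmed.indexOf('@')
    if (at > 0 && at == trimmed.lastIndexOf('@') && at < trimmed.length - 1) Some(trimmed)
    else None
  }

  // Keeps only customers with a usable email, with the email normalised.
  def cleanse(customers: Seq[Customer]): Seq[Customer] =
    customers.flatMap(c => normalizeEmail(c.email).map(e => c.copy(email = e)))
}
```

Tests for code like this need no mocks at all: you pass in a few values and compare the result against what you know to be correct.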

I think that in your situation, going for integration tests (on small-scale data) and regression tests would give you the best ROI.
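
As a rough sketch of what a small-scale integration test could look like, the example below runs a DataFrame transformation against a local SparkSession with three rows of input. It assumes the filtering/cleansing step of doJob has been split out into its own DataFrame-in, DataFrame-out function; CustomerTransformations, cleanse, the "email" column and the check object are all hypothetical names, and the database writes are left out entirely.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower, trim}

object CustomerTransformations {
  // Hypothetical transformation split out of doJob: no I/O, only DataFrame operations.
  def cleanse(input: DataFrame): DataFrame =
    input
      .filter(col("email").isNotNull)                  // drop rows without an email
      .withColumn("email", lower(trim(col("email"))))  // normalise the email
      .dropDuplicates("email")                         // keep one row per email
}

object CustomerTransformationsCheck {
  def main(args: Array[String]): Unit = {
    // Local session with two threads; no cluster is needed for small-scale data.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("customer-transformations-check")
      .getOrCreate()
    import spark.implicits._

    try {
      val input = Seq(
        (1L, " Alice@Example.com "),
        (2L, "alice@example.com"),
        (3L, null: String)
      ).toDF("id", "email")

      val emails = CustomerTransformations.cleanse(input)
        .select("email").as[String].collect().toSet

      // The two spellings collapse into one row and the null email is dropped.
      assert(emails == Set("alice@example.com"))
    } finally {
      spark.stop()
    }
  }
}
```

The same input and expected output can later be kept as a regression test; the insert/update/delete statements would still need a test against a real or embedded database, which this sketch does not cover.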






On Thu, Nov 15, 2018 at 8:43 PM ☼ R Nair <[hidden email]> wrote:
> Sparklens from qubole is a good source. Other tests are to be handled by developer.

Re: Testing Apache Spark applications

Lars Albertsson
In reply to this post by Omer.Ozsakarya
My previous answers to this question can be found in the archives, along with some other responses:


I have made a couple of presentations on the subject. Slides and video 
are linked on this page: http://www.mapflat.com/presentations/

You can find more material in this list of resources: 

Happy testing! 

Regards, 


Lars Albertsson
Data engineering entrepreneur


On Thu, Nov 15, 2018 at 6:45 PM <[hidden email]> wrote:
> How are you testing your Spark applications?