Integration testing Framework Spark SQL Scala

Integration testing Framework Spark SQL Scala

Ruijing Li
Hi all,

I’m interested in hearing the community’s thoughts on best practices for integration testing of Spark SQL jobs. We run many of our jobs on cloud infrastructure and HDFS, which makes debugging a challenge, especially for problems that don’t show up when simply initializing a SparkSession locally or testing with spark-shell. Ideally, we’d like some sort of Docker container that emulates HDFS and Spark cluster mode and can be run locally.

Any test framework, tips, or examples people can share? Thanks!
--
Cheers,
Ruijing Li

Re: Integration testing Framework Spark SQL Scala

Ruijing Li
Just wanted to follow up on this. If anyone has any advice, I’d be interested in learning more!

On Thu, Feb 20, 2020 at 6:09 PM Ruijing Li <[hidden email]> wrote:
--
Cheers,
Ruijing Li

Re: Integration testing Framework Spark SQL Scala

Lars Albertsson
Hi,

Sorry for the very slow reply - I am far behind in my mailing list
subscriptions.

You'll find a few slides covering the topic in this presentation:
https://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458

Video here: https://vimeo.com/192429554

Regards,

Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109

On Tue, Feb 25, 2020 at 7:46 PM Ruijing Li <[hidden email]> wrote:

