Spark structured streaming - performance tuning

Srinivas V
Hello, 
Can someone point me to a good video or document that talks about performance tuning for a Structured Streaming app? 
I am especially interested in consuming Kafka topics, say 5 topics with 100 partitions each.
I am trying to figure out the best cluster size and the number of executors and cores required. 

Regards 
Srini

Re: Spark structured streaming - performance tuning

Alex Ott
http://shop.oreilly.com/product/0636920047568.do has quite good information
on it. For Kafka, start with the approximation that processing each
partition is a separate task that needs to be executed, so you need to
plan the number of cores accordingly.
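A back-of-the-envelope version of this rule of thumb, as a minimal Python sketch. It assumes one task per Kafka topic-partition per micro-batch, one task per core at a time, and tasks of roughly equal duration; real batches are rarely that uniform. The helper names and the cluster shapes tried below are hypothetical, chosen only for illustration. For the 5 topics × 100 partitions case, each micro-batch has 500 tasks, and the number of scheduling rounds ("waves") is the task count divided by the total number of cores, rounded up.

```python
import math


def tasks_per_batch(num_topics: int, partitions_per_topic: int) -> int:
    """Assumes one Spark task per Kafka topic-partition in each micro-batch."""
    return num_topics * partitions_per_topic


def waves(num_tasks: int, executors: int, cores_per_executor: int) -> int:
    """Rounds of tasks needed if each core runs one task at a time and
    all tasks take about the same time (a simplifying assumption)."""
    total_cores = executors * cores_per_executor
    if total_cores <= 0:
        raise ValueError("need at least one core")
    return math.ceil(num_tasks / total_cores)


if __name__ == "__main__":
    tasks = tasks_per_batch(5, 100)  # 5 topics x 100 partitions
    # Hypothetical cluster shapes, for illustration only.
    for executors, cores in [(10, 5), (20, 5), (25, 4), (50, 10)]:
        print(f"{executors} executors x {cores} cores -> "
              f"{waves(tasks, executors, cores)} wave(s) for {tasks} tasks")
```

Under these assumptions, 500 cores would let a micro-batch finish in a single wave, while 50 cores would need ten; whether ten waves fit inside the trigger interval depends on how long each task actually takes, which only measurement can tell.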

Srinivas V  at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:


--
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark structured streaming - performance tuning

Srinivas V
Thank you, Alex. I will check it out and let you know if I have any questions.

On Fri, Apr 17, 2020 at 11:36 PM Alex Ott <[hidden email]> wrote:

Re: Spark structured streaming - performance tuning

Alex Ott
Just to clarify, since I didn't write this explicitly in my answer: when you're
working with Kafka, every Kafka partition is mapped to a Spark partition,
and in Spark every partition is mapped to a task. But you can use `coalesce`
to decrease the number of Spark partitions, so you'll have fewer tasks.
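A simplified model of what `coalesce` does to the task count, as a minimal Python sketch. It is an illustration under stated assumptions, not Spark's implementation: it packs input partitions into contiguous, evenly sized groups and ignores the data-locality preferences that Spark's coalescer takes into account. Like Spark's `coalesce`, it only reduces the number of partitions and never increases it. The function name `coalesce_model` is hypothetical.

```python
from typing import List


def coalesce_model(num_input: int, num_output: int) -> List[List[int]]:
    """Hypothetical model of a no-shuffle coalesce: pack input partition ids
    0..num_input-1 into at most num_output contiguous groups. Each group
    becomes one task. Data locality, which Spark also considers, is ignored."""
    if num_input < 0 or num_output < 1:
        raise ValueError("num_input must be >= 0 and num_output >= 1")
    # Coalesce can only shrink the partition count, never grow it.
    n_groups = min(num_input, num_output)
    groups: List[List[int]] = [[] for _ in range(n_groups)]
    for pid in range(num_input):
        # Spread partitions as evenly as possible across the groups.
        groups[pid * n_groups // num_input].append(pid)
    return groups


if __name__ == "__main__":
    kafka_partitions = 5 * 100  # 5 topics x 100 partitions -> 500 Spark partitions
    for target in (500, 100, 50, 1000):
        groups = coalesce_model(kafka_partitions, target)
        sizes = [len(g) for g in groups]
        print(f"coalesce({target}): {len(groups)} tasks, "
              f"{min(sizes)}-{max(sizes)} Kafka partitions per task")
```

Under this model, coalescing the 500 Kafka-derived partitions to 50 yields 50 tasks that each read 10 Kafka partitions one after another: fewer cores are needed, but each task does more work, so micro-batch latency may grow.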

Srinivas V  at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:

Re: Spark structured streaming - performance tuning

Srinivas V
Hi Alex, I read the book. It is a good one, but I don't see the things I most want to understand.
You are right about the partitions and tasks. 
1. How do I use coalesce with Spark Structured Streaming? 

Also, I want to ask a few more questions:
2. How do I restrict the number of executors in Structured Streaming? Is --num-executors the minimum? 
To cap the maximum, can I use spark.dynamicAllocation.maxExecutors? 

3. Do the other streaming properties apply to Structured Streaming, 
like spark.streaming.dynamicAllocation.enabled?
If not, which ones does it take into consideration?

4. Does Structured Streaming 2.4.5 allow dynamic allocation of executors/cores? In the case of a Kafka consumer, when the cluster has to scale down, does it reconfigure the mapping of executor cores to Kafka partitions?  

5. Why is the Spark Structured Streaming web UI (SQL tab) not as informative as the Streaming tab of Spark Streaming? 

It would be great if these questions could be answered; otherwise, the only option left would be to go through the Spark code and figure it out.

On Sat, Apr 18, 2020 at 1:09 PM Alex Ott <[hidden email]> wrote:

Re: Spark structured streaming - performance tuning

Srinivas V
Can anyone else answer my earlier questions on performance tuning for Structured Streaming?
@Jacek?

On Sun, May 3, 2020 at 12:07 AM Srinivas V <[hidden email]> wrote: