Spark Structured streaming - Kafka - slowness with query 0


khajaasmath786
Hi,

I have started using Spark Structured Streaming to read data from Kafka, and the job is very slow. The number of output rows in query 0 keeps increasing and the job runs forever. Any suggestions, please?

image.png

Thanks,
Asmath
Re: Spark Structured streaming - Kafka - slowness with query 0

Lalwani, Jayesh

Are you getting any output? Streaming jobs typically run forever and keep processing data as it arrives at the input. A streaming job that is working well typically generates output at a regular cadence.
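One way to check whether the query is making progress is to look at the report it publishes after each micro-batch. The sketch below is only illustrative: summarize_progress is a hypothetical helper, it assumes a dict shaped like the one PySpark's query.lastProgress returns (fields batchId, numInputRows, inputRowsPerSecond, processedRowsPerSecond), and the sample values are made up.

```python
def summarize_progress(progress):
    """Summarize one streaming progress report (hypothetical helper).

    `progress` is assumed to be a dict shaped like PySpark's
    `query.lastProgress`; missing fields default to 0.
    """
    if progress is None:
        # No progress report yet: no micro-batch has completed.
        return {"batch_id": None, "rows_in_batch": 0, "falling_behind": False}
    rate_in = progress.get("inputRowsPerSecond", 0.0)
    rate_out = progress.get("processedRowsPerSecond", 0.0)
    return {
        "batch_id": progress.get("batchId"),
        "rows_in_batch": progress.get("numInputRows", 0),
        # Input arriving faster than it is processed means the lag is growing.
        "falling_behind": rate_in > rate_out,
    }


# Made-up sample report, for illustration only.
sample = {
    "batchId": 3,
    "numInputRows": 5000,
    "inputRowsPerSecond": 1000.0,
    "processedRowsPerSecond": 400.0,
}
summary = summarize_progress(sample)
print(summary)
```

In a running job this would be called as summarize_progress(query.lastProgress), where query is the object returned by writeStream.start(). A batch id that keeps advancing suggests the job is healthy even though it never terminates.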

 

From: KhajaAsmath Mohammed <[hidden email]>
Date: Tuesday, October 20, 2020 at 1:23 PM
To: "user @spark" <[hidden email]>
Subject: [EXTERNAL] Spark Structured streaming - Kafka - slowness with query 0

 


Re: Spark Structured streaming - Kafka - slowness with query 0

lec ssmi
Did you start your application from the earliest Kafka offsets, so that it first has to catch up on old data?

On Wednesday, Oct 21, 2020, at 2:19 AM, Lalwani, Jayesh <[hidden email]> wrote:


Re: Spark Structured streaming - Kafka - slowness with query 0

khajaasmath786
Yes. Changing back to latest worked, but the job is still slow compared to Flume.
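For readers following along, "earliest" and "latest" here are values of the Kafka source's startingOffsets option. A minimal sketch of setting it from PySpark follows. The helper names, broker address, and topic name are placeholders and not taken from the original job. Note that startingOffsets only applies when a new query starts; a query resumed from a checkpoint continues from its checkpointed offsets.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build reader options for Spark's Kafka source (hypothetical helper).

    The real option also accepts a JSON string of per-partition offsets,
    which this sketch does not cover.
    """
    if starting_offsets not in ("earliest", "latest"):
        raise ValueError("expected 'earliest' or 'latest'")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder broker and topic names.
opts = kafka_source_options("broker1:9092", "events", starting_offsets="latest")
print(opts)
```

With "earliest", the first micro-batch has to read everything retained in the topic, which would be consistent with the long-running query 0 described at the start of the thread.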


On Oct 20, 2020, at 10:21 PM, lec ssmi <[hidden email]> wrote:



Re: Spark Structured streaming - Kafka - slowness with query 0

lec ssmi
Structured Streaming also uses a micro-batch mechanism under the hood. The first batch seems to be slower than the later ones; I often run into this problem as well. It seems to be related to how the data is divided into batches.
On the other hand, Spark's batch size is usually bigger than Flume's transaction batch size.
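To make the point about batch division concrete, here is a toy model of how a backlog is split into micro-batches. It is a deliberately simplified sketch and not Spark's actual scheduler: simulate_batches is a hypothetical function, each trigger is assumed to read everything pending (optionally capped), and all numbers are invented.

```python
def simulate_batches(backlog, arrivals_per_trigger, num_batches, max_per_trigger=None):
    """Return the number of records each micro-batch would read.

    Simplified model: each trigger reads everything pending, optionally
    capped at max_per_trigger; new records arrive between triggers.
    """
    pending = backlog
    sizes = []
    for _ in range(num_batches):
        take = pending if max_per_trigger is None else min(pending, max_per_trigger)
        sizes.append(take)
        pending = pending - take + arrivals_per_trigger
    return sizes


# Invented numbers: 1,000,000 records waiting, 2,000 new records per trigger.
uncapped = simulate_batches(1_000_000, 2_000, 4)
capped = simulate_batches(1_000_000, 2_000, 4, max_per_trigger=10_000)
print(uncapped)  # [1000000, 2000, 2000, 2000]
print(capped)    # [10000, 10000, 10000, 10000]
```

In this model the uncapped first batch swallows the whole backlog, which is one plausible reason for a slow first batch; with a cap, every batch is small but the backlog takes many triggers to drain.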


On Wednesday, Oct 21, 2020, at 12:19 PM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Spark Structured streaming - Kafka - slowness with query 0

khajaasmath786
Thanks. Is there an option to limit the number of records per batch, for example to process only 10,000, or whatever value we pass in a property? That way we can control how much data each batch handles.


On Oct 21, 2020, at 12:11 AM, lec ssmi <[hidden email]> wrote:



Re: Spark Structured streaming - Kafka - slowness with query 0

"Yuri Oleynikov (‫יורי אולייניקוב‬‎)"
I think the maxOffsetsPerTrigger option described in the Spark + Kafka integration docs would meet your requirement.
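A minimal sketch of what that could look like in PySpark, assuming a limit of 10,000 as in the question. The helper names, broker address, and topic name are placeholders. As far as I know, the limit applies to the total number of offsets per trigger and is split proportionally across the topic's partitions, so it is not a per-partition cap.

```python
def rate_limited_kafka_options(bootstrap_servers, topic, max_offsets_per_trigger):
    """Build Kafka source options with a per-trigger cap (hypothetical helper)."""
    if max_offsets_per_trigger <= 0:
        raise ValueError("max_offsets_per_trigger must be positive")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Total offsets read per trigger, across all partitions.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def read_kafka_stream(spark, options):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder broker and topic names; 10000 comes from the question above.
limited = rate_limited_kafka_options("broker1:9092", "events", 10000)
print(limited)
```

Passing the resulting dict to read_kafka_stream(spark, limited) would give a stream whose micro-batches read at most about 10,000 records each, at the cost of taking longer to work through any backlog.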


On Oct 21, 2020, at 12:36, KhajaAsmath Mohammed <[hidden email]> wrote:
