How to programmatically pause and resume Spark/Kafka structured streaming?

kant kodali
Hi All,

I am trying to see if there is a way to pause a Spark stream that processes data from Kafka, so that my application can take some actions while the stream is paused and resume the stream once those actions complete.

Thanks!
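
As far as I know, the Structured Streaming query handle exposes stop() but no pause(), so one way to approximate the behaviour asked for here is to stop the query and later start it again with the same checkpoint location. The sketch below is only an illustration under that assumption: PausableStream and start_kafka_query are hypothetical names, and the broker address, topic, and paths are placeholders.

```python
from typing import Callable, Optional, Protocol


class QueryLike(Protocol):
    """The small part of pyspark's StreamingQuery that this sketch relies on."""

    def stop(self) -> None: ...


class PausableStream:
    """Emulates pause/resume by stopping and restarting a streaming query."""

    def __init__(self, start_query: Callable[[], QueryLike]) -> None:
        # start_query must always use the same checkpointLocation so that a
        # restarted query continues from the offsets recorded there.
        self._start_query = start_query
        self._query: Optional[QueryLike] = None

    @property
    def running(self) -> bool:
        return self._query is not None

    def resume(self) -> None:
        # Start a fresh query only if none is currently running.
        if self._query is None:
            self._query = self._start_query()

    def pause(self) -> None:
        # Stop the running query, if any; progress is kept in the checkpoint.
        if self._query is not None:
            self._query.stop()
            self._query = None


def start_kafka_query(spark):
    """Hypothetical factory; servers, topic, and paths are placeholders."""
    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
    )
    return (
        source.writeStream.format("parquet")
        .option("path", "/tmp/events-out")
        .option("checkpointLocation", "/tmp/events-checkpoint")
        .start()
    )
```

Given an existing SparkSession named spark, usage would look like: stream = PausableStream(lambda: start_kafka_query(spark)); stream.resume(); then stream.pause(), run the application's actions, and call stream.resume() again.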

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

Gourav Sengupta
Hi,

Exactly my question. I was also looking for ways to gracefully exit Spark Structured Streaming.


Regards,
Gourav
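
One best-effort way to approach a graceful exit is to poll the query status and call stop() only when no micro-batch is being processed. The sketch below assumes the PySpark StreamingQuery.status dictionary and its "isTriggerActive" key. stop_when_idle is a hypothetical helper, and there is still a small window in which a new batch can begin between the check and the stop.

```python
import time
from typing import Callable


def stop_when_idle(query, timeout_s: float = 60.0, poll_s: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Best-effort graceful stop: wait for a gap between micro-batches.

    Returns True if no trigger was active when stop() was called, and False
    if the timeout expired and the query was stopped anyway.
    """
    waited = 0.0
    while waited < timeout_s:
        # In PySpark, query.status is a dict that includes "isTriggerActive".
        if not query.status["isTriggerActive"]:
            query.stop()
            return True
        sleep(poll_s)
        waited += poll_s
    # Timed out while a batch was still running; stop regardless.
    query.stop()
    return False
```

A False return value signals that a batch was probably interrupted, so the restarted query should be expected to run that batch again.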

On Tue, Aug 6, 2019 at 3:43 AM kant kodali <[hidden email]> wrote:
Hi All,

I am trying to see if there is a way to pause a Spark stream that processes data from Kafka, so that my application can take some actions while the stream is paused and resume the stream once those actions complete.

Thanks!

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

kant kodali
If I stop and restart the query while a batch is being processed, what will happen? Will that batch be canceled and then reprocessed when I start the query again? Does that mean I need to worry about duplicates downstream? Kafka consumers have pause and resume methods, and they work just fine, so I am not sure why Spark doesn't expose that.
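
On the duplicates question: the Structured Streaming guide describes foreachBatch as at-least-once by default and suggests using the batch id to deduplicate, since a batch that did not finish before a stop is run again after restart with the same id. A minimal sketch of that idea follows. make_idempotent_writer is a hypothetical helper, and the in-memory set is only a stand-in for a durable record of committed batch ids, which in practice should be updated together with the data.

```python
from typing import Callable, Set


def make_idempotent_writer(
    write_batch: Callable[[object, int], None],
    committed: Set[int],
) -> Callable[[object, int], None]:
    """Wrap a batch writer so that a replayed batch id is written only once.

    `committed` stands in for durable storage of batch ids already written;
    an in-memory set does not survive a driver restart.
    """

    def process(batch_df, batch_id: int) -> None:
        if batch_id in committed:
            return  # replay of a batch that was already written; skip it
        write_batch(batch_df, batch_id)
        committed.add(batch_id)

    return process
```

The returned function has the (batch_df, batch_id) shape that writeStream.foreachBatch expects, so it could be passed there together with a fixed checkpointLocation.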


On Mon, Aug 5, 2019 at 10:54 PM Gourav Sengupta <[hidden email]> wrote:
Hi,

Exactly my question. I was also looking for ways to gracefully exit Spark Structured Streaming.


Regards,
Gourav


Re: How to programmatically pause and resume Spark/Kafka structured streaming?

Gourav Sengupta
Hi,
There is a way to make a Spark streaming query iterate only once. I use it for reading files with streaming. Maybe you can try that.
Regards,
Gourav
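
The message does not name the method, but it presumably refers to the run-once trigger, which in PySpark is written trigger(once=True). Under that assumption, the sketch below runs one pass at a time and hands control back to the application between passes. run_in_passes and start_kafka_pass are hypothetical names, and the broker address, topic, and paths are placeholders.

```python
from typing import Callable


def run_in_passes(start_once_query: Callable[[], object],
                  between_passes: Callable[[int], bool]) -> int:
    """Run run-once queries back to back until between_passes returns False.

    between_passes receives the number of completed passes; it is where the
    application can take its own actions while no query is running.
    """
    completed = 0
    while True:
        query = start_once_query()
        query.awaitTermination()  # returns once the single pass has finished
        completed += 1
        if not between_passes(completed):
            return completed


def start_kafka_pass(spark):
    """Hypothetical factory; servers, topic, and paths are placeholders."""
    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
    )
    return (
        source.writeStream.format("parquet")
        .option("path", "/tmp/events-out")
        .option("checkpointLocation", "/tmp/events-checkpoint")
        .trigger(once=True)  # process what is available, then stop
        .start()
    )
```

Because every pass reuses the same checkpoint location, each one should pick up where the previous pass ended. Newer Spark releases also offer trigger(availableNow=True) as an alternative to the run-once trigger.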

On Tue, 6 Aug 2019, 21:50 kant kodali, <[hidden email]> wrote:
If I stop and restart the query while a batch is being processed, what will happen? Will that batch be canceled and then reprocessed when I start the query again? Does that mean I need to worry about duplicates downstream? Kafka consumers have pause and resume methods, and they work just fine, so I am not sure why Spark doesn't expose that.

