Spark structured streaming -Kafka - deployment / monitor and restart

khajaasmath786
Hi,

We are trying to migrate the code of an old application we built a few years ago from Spark DStreams to Structured Streaming.

A Structured Streaming job doesn't have a Streaming tab in the Spark UI. Is there a way to monitor the Structured Streaming jobs we submit? Since the job runs on every trigger, how can we kill the job and restart it if needed?

Any suggestions on this, please?

Thanks,
Asmath



---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Spark structured streaming -Kafka - deployment / monitor and restart

Gabor Somogyi
In Spark 3.0, the community just added it (a Structured Streaming tab in the Spark UI).
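Besides the UI tab, a running query can also be watched from the driver. In PySpark 3.x, `StreamingQuery.lastProgress` returns the most recent progress report as a dict (or `None` before any progress has been reported). The sketch below assumes that dict layout, plus a Kafka source whose `endOffset` has the shape `{"topic": {"partition": offset}}`, and pulls out a few fields worth logging or alerting on. `summarize_progress` is a hypothetical helper name, not a Spark API.

```python
def summarize_progress(progress):
    """Extract a few health indicators from one streaming progress report.

    `progress` is assumed to be the dict PySpark 3.x returns from
    StreamingQuery.lastProgress, or None before any progress is reported.
    """
    if progress is None:
        return None
    end_offsets = {}
    for source in progress.get("sources", []):
        end = source.get("endOffset")
        # Kafka sources report {"topic": {"partition": offset}};
        # offsets of any other shape are skipped in this sketch.
        if not isinstance(end, dict):
            continue
        for topic, partitions in end.items():
            if not isinstance(partitions, dict):
                continue
            for partition, offset in partitions.items():
                if str(partition).isdigit():
                    end_offsets[(topic, int(partition))] = offset
    return {
        "batchId": progress.get("batchId"),
        "numInputRows": progress.get("numInputRows", 0),
        "processedRowsPerSecond": progress.get("processedRowsPerSecond", 0.0),
        "endOffsets": end_offsets,
    }
```

In a real job you would call it periodically as `summarize_progress(query.lastProgress)`, where `query` is the handle returned by `writeStream.start()`. `query.stop()` stops the query, and starting it again with the same `checkpointLocation` resumes where it left off.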

On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed, <[hidden email]> wrote:

Re: Spark structured streaming -Kafka - deployment / monitor and restart

Jungtaek Lim-2
There are sections in the SS programming guide that answer exactly these questions:


Also, for the Kafka data source, there's a third-party project (DISCLAIMER: I'm the author) that helps you commit the offsets to Kafka with a specific group ID.


After that, you can also leverage the Kafka ecosystem to monitor progress from Kafka's point of view, especially the gap between the highest offset and the committed offset.
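That gap is what Kafka tooling reports as consumer lag. For example, `kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group>` prints it in its LAG column. This only works once offsets are committed to Kafka under a group ID, since Structured Streaming itself does not commit them. As a minimal sketch of the arithmetic only, assuming you already have the latest (log-end) offsets and the group's committed offsets as dicts keyed by `(topic, partition)` (the helper names are hypothetical):

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Per-partition lag: latest (log-end) offset minus committed offset.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is counted from 0, which is an assumption of this sketch
    (retention may have deleted early offsets, so real tools may differ).
    """
    lag = {}
    for tp, latest in latest_offsets.items():
        committed = committed_offsets.get(tp, 0)
        # Clamp at 0 so a stale snapshot never yields a negative lag.
        lag[tp] = max(latest - committed, 0)
    return lag


def total_lag(lag):
    """Sum of per-partition lag: a single number to graph or alert on."""
    return sum(lag.values())
```

For instance, a partition whose log end is at 500 with 480 committed has a lag of 20. A total that keeps growing across checks means the query is falling behind the producers.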

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)


On Mon, Jul 6, 2020 at 2:53 AM Gabor Somogyi <[hidden email]> wrote:

Re: Spark structured streaming -Kafka - deployment / monitor and restart

khajaasmath786
Thanks, Lim, this is really helpful. I have a few questions.

Our earlier approach used a low-level consumer to read offsets from a database and used that information to read the data with Spark Streaming (DStreams). We saved the offsets back once processing had finished, and this way we never lost data.

With your library, will the application automatically resume from the last processed offset after it has been stopped or killed for some time?

Thanks,
Asmath

On Sun, Jul 5, 2020 at 6:22 PM Jungtaek Lim <[hidden email]> wrote:

Re: Spark structured streaming -Kafka - deployment / monitor and restart

Jungtaek Lim-2
In SS, checkpointing is part of running each micro-batch and is supported natively. (To be clear, my library doesn't deal with the native checkpointing behavior.)

In other words, it can't be customized the way you have been doing with your database. You probably don't need to do that with SS, but it still depends on what you did with the offsets in the database.
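With SS, the offsets live in the query's checkpoint directory rather than in an external database. To see where a restarted query would pick up, a sketch like the following reads the checkpoint's offset log. It assumes the internal, undocumented layout used by Spark 2.x/3.x: `offsets/<N>` holds the end offsets planned for batch N, and `commits/<N>` marks batch N as finished. That layout may change between versions, so treat this as read-only inspection and never edit those files. `resume_offsets` is a hypothetical helper name.

```python
import json
import os


def _batch_ids(log_dir):
    """Batch ids present in a checkpoint log directory, ascending.

    Only purely numeric file names count; checksum files such as ".1.crc"
    and temporary files are ignored.
    """
    if not os.path.isdir(log_dir):
        return []
    return sorted(int(name) for name in os.listdir(log_dir) if name.isdigit())


def resume_offsets(checkpoint_dir):
    """Return (batch_id, source_offsets) for the last committed batch, or None.

    Assumes the internal Spark 2.x/3.x layout: commits/<N> marks batch N as
    finished, and offsets/<N> contains a version line ("v1"), a metadata JSON
    line, then one line per source with that batch's end offset ("-" if none).
    """
    committed = _batch_ids(os.path.join(checkpoint_dir, "commits"))
    if not committed:
        return None  # no batch has completed yet
    last = committed[-1]
    with open(os.path.join(checkpoint_dir, "offsets", str(last))) as f:
        lines = f.read().splitlines()
    # Skip the version and metadata lines; parse each source's offset JSON.
    offsets = [None if line == "-" else json.loads(line) for line in lines[2:]]
    return last, offsets
```

On restart, Spark re-runs any batch that has an `offsets` entry but no `commits` entry and then continues from there. For the Kafka source, the `startingOffsets` option is only consulted when the query starts with a fresh checkpoint.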

On Tue, Jul 7, 2020 at 1:40 AM KhajaAsmath Mohammed <[hidden email]> wrote: