How to sleep Spark job

5 messages

How to sleep Spark job

Soheil Pourbafrani
Hi,

I want to submit a job to a YARN cluster that reads data from Cassandra and writes it to HDFS every hour, for example.

Is it possible to make the Spark application sleep in a while-true loop and wake up every hour to process data?

Re: How to sleep Spark job

kevin.r.mellott
I’d recommend using a scheduler of some kind to trigger your job each hour, and have the Spark job exit when it completes. Spark is not meant to run in any type of “sleep mode”, unless you want to run a Structured Streaming job and create a separate process to pull data from Cassandra and publish it to your streaming endpoint. That decision really depends more on your use case.
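For instance, a plain cron entry on an edge node can do this. A minimal sketch (the spark-submit path, main class, jar, and log path below are placeholders, not details from this thread):

```shell
# crontab entry: minute hour day-of-month month day-of-week  command
# Fires at minute 0 of every hour; crontab entries must be one line.
0 * * * * /opt/spark/bin/spark-submit --master yarn --deploy-mode cluster --class com.example.CassandraToHdfs /opt/jobs/cassandra-to-hdfs.jar >> /var/log/cassandra-to-hdfs.log 2>&1
```

Because the application exits after each run, YARN resources are freed between runs instead of being held by a sleeping driver.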


Re: How to sleep Spark job

Moein Hosseini
In reply to this post by Soheil Pourbafrani
Hi Soheil,

Yes, it's possible to make your application sleep after each job:

do {
   // your Spark job goes here
   Thread.sleep(3600000); // 3,600,000 ms = 1 hour; in Java, InterruptedException must be handled
} while (true);

But Airflow may be a better option if you need a scheduler for your Spark job.
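For reference, an hourly Airflow DAG that shells out to spark-submit could be sketched as below (Airflow 2.x style; the DAG id, class name, and jar path are placeholder assumptions, not details from this thread):

```python
# Sketch of an hourly Airflow DAG; names and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cassandra_to_hdfs_hourly",   # placeholder DAG id
    start_date=datetime(2019, 1, 23),
    schedule_interval="@hourly",         # trigger at the top of every hour
    catchup=False,                       # don't backfill missed intervals
) as dag:
    submit_job = BashOperator(
        task_id="spark_submit",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "--class com.example.CassandraToHdfs "    # placeholder class
            "/opt/jobs/cassandra-to-hdfs.jar"         # placeholder jar
        ),
    )
```

Airflow then handles retries, alerting, and run history, which a bare sleep loop or cron entry does not give you.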




--

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: [hidden email]
linkedin
twitter


Re: How to sleep Spark job

Moein Hosseini
In reply to this post by kevin.r.mellott
In this approach, your application creates distinct jobs each time: on the first iteration the driver builds the DAG and executes it with the help of the executors, then the job finishes and the driver/application goes to sleep. When it wakes up, it creates a new job and DAG, and so on.
It's somewhat the same as creating a cron job that submits your single application to the cluster each time.



Re: How to sleep Spark job

Soheil Pourbafrani
Thanks for the tip! 
