What are all the things to Monitor to keep the spark jobs from failure

3 messages

What are all the things to Monitor to keep the spark jobs from failure

Akhil Das
Hi

I have a Spark project running on a 4-core, 16 GB instance (master and worker on the same machine). Can anyone tell me what I should keep monitoring so that my cluster/jobs never go down?

I have put together a small list of items; please extend it if you know more:

1. Monitor the Spark master/worker processes for failure
2. Monitor HDFS for filling up or going down
3. Monitor network connectivity between master and worker
4. Monitor Spark jobs for getting killed
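For item 1, one lightweight approach (a sketch, not a fixed recipe) is to poll the standalone master's JSON status endpoint (the master web UI serves `/json` on port 8080 by default) and alert when the master or its workers stop reporting ALIVE. The host name and worker threshold below are assumptions for illustration:

```python
import json
from urllib.request import urlopen

def cluster_is_healthy(status: dict, min_workers: int = 1) -> bool:
    """Check a Spark standalone master's JSON status payload."""
    # The payload reports the master's own state plus one entry
    # per registered worker.
    if status.get("status") != "ALIVE":
        return False
    alive = [w for w in status.get("workers", []) if w.get("state") == "ALIVE"]
    return len(alive) >= min_workers

# In a cron job you would fetch the real payload, e.g.:
#   status = json.load(urlopen("http://spark-master:8080/json"))
# and raise an alert when cluster_is_healthy(status) is False.
sample = {"status": "ALIVE",
          "workers": [{"state": "ALIVE"}, {"state": "DEAD"}]}
print(cluster_is_healthy(sample))  # prints True: one live worker remains
```

A check like this catches both a dead master (connection refused) and workers silently dropping out of the cluster.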


--
Thanks
Best Regards

Re: What are all the things to Monitor to keep the spark jobs from failure

Mayur Rustagi
There is some discussion around things to monitor; I don't think it's automated yet.
There is also some discussion of how to use CloudWatch with Spark:

One comprehensive doc here would be really helpful, especially for streaming, where latency is important.
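As a sketch of the CloudWatch idea (the namespace, metric name, and dimension below are illustrative assumptions, not an established convention), a streaming job could report its batch latency as a custom metric via boto3:

```python
# Build the metric payload separately so it can be inspected and tested
# without AWS credentials or network access.
def build_latency_metric(batch_ms: float, app_name: str) -> dict:
    return {
        "MetricName": "StreamingBatchLatency",
        "Dimensions": [{"Name": "ApplicationName", "Value": app_name}],
        "Unit": "Milliseconds",
        "Value": batch_ms,
    }

# With AWS credentials configured, the actual publish would look like:
#   import boto3
#   cw = boto3.client("cloudwatch")
#   cw.put_metric_data(Namespace="Spark",
#                      MetricData=[build_latency_metric(850.0, "my-streaming-app")])
print(build_latency_metric(850.0, "my-streaming-app")["Unit"])  # prints Milliseconds
```

Once the metric is in CloudWatch, an alarm on it can page you when batch latency starts trending above the batch interval.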



On Fri, Feb 14, 2014 at 8:41 AM, Akhil Das <[hidden email]> wrote:


Re: What are all the things to Monitor to keep the spark jobs from failure

Akhil Das
Thank you, Mayur.


On Tue, Feb 18, 2014 at 10:33 AM, Mayur Rustagi <[hidden email]> wrote:
Mayur Rustagi
Ph: +919632149971






--
Thanks
Best Regards