SparkLauncher reliability and scalability

mhd wrk
We are using SparkLauncher and SparkAppHandle.Listener to launch Spark applications from a Java web application and to listen for their state changes. We observe that as the number of concurrent jobs grows, some state changes are sometimes not reported (e.g. some applications never report a final state even when the corresponding Spark job is marked FINISHED in the YARN UI). Are there any guidelines or limits on launching a potentially large number of long-running, concurrent Spark jobs?

Thanks,

Re: SparkLauncher reliability and scalability

Proust (Feng Guizhou) [Travel Search & Discovery]
How about Apache Livy? Its purpose is similar to that of SparkLauncher, but it launches Spark jobs through a REST API.
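For illustration, a minimal sketch of what submitting a job through Livy's batch API might look like from Java 11+. The `/batches` and `/batches/{id}/state` endpoints and the `file`/`className` fields are part of Livy's REST API. The host, jar path, class name, and the `LivyBatchSketch` helper itself are hypothetical placeholders, and this is not tested against any particular cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LivyBatchSketch {
    // Assumption: Livy runs locally on its default port 8998.
    public static final String LIVY_URL = "http://localhost:8998";

    // Builds the JSON body for POST /batches.
    // No escaping is done here, so the inputs must not contain quotes or backslashes.
    public static String batchBody(String file, String className) {
        return "{\"file\": \"" + file + "\", \"className\": \"" + className + "\"}";
    }

    // Request that submits a new batch (one Spark application).
    public static HttpRequest submitRequest(String baseUrl, String file, String className) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(batchBody(file, className)))
                .build();
    }

    // Request that fetches the current state of an existing batch.
    public static HttpRequest stateRequest(String baseUrl, int batchId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/batches/" + batchId + "/state"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        if (args.length == 0) {
            // Hypothetical jar and main class; the response JSON contains the batch id.
            HttpResponse<String> submitted = client.send(
                    submitRequest(LIVY_URL, "hdfs:///apps/my-job.jar", "com.example.MyJob"),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(submitted.body());
        } else {
            // Poll the state of the batch whose id is passed as the first argument.
            HttpResponse<String> state = client.send(
                    stateRequest(LIVY_URL, Integer.parseInt(args[0])),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(state.body());
        }
    }
}
```

With this approach the web application asks for the state on demand rather than waiting for a listener callback, so a final state that was missed once can still be picked up on a later poll.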


From: mhd wrk <[hidden email]>
Sent: Monday, April 27, 2020 11:38 PM
To: [hidden email] <[hidden email]>
Subject: SparkLauncher reliability and scalability
 
We are using SparkLauncher and SparkAppHandle.Listener to launch Spark applications from a Java web application and to listen for their state changes. We observe that as the number of concurrent jobs grows, some state changes are sometimes not reported (e.g. some applications never report a final state even when the corresponding Spark job is marked FINISHED in the YARN UI). Are there any guidelines or limits on launching a potentially large number of long-running, concurrent Spark jobs?

Thanks,