Submit many spark applications

12 messages

Submit many spark applications

Shiyuan
Hi Spark-users, 
 I want to submit as many Spark applications as the resources permit. I am using cluster mode on a YARN cluster. YARN can queue and launch these applications without problems. The problem lies in spark-submit itself: spark-submit starts a JVM, and if many spark-submit JVMs are running, they can fail due to insufficient memory on the machine where I run spark-submit. Any suggestions on how to solve this problem? Thank you!

Re: Submit many spark applications

Marcelo Vanzin
You can either:

- set spark.yarn.submit.waitAppCompletion=false, which will make
spark-submit go away once the app starts in cluster mode.
- use the (new in 2.3) InProcessLauncher class + some custom Java code
to submit all the apps from the same "launcher" process.
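
The first option is a one-line change to each submission. A minimal sketch (the jar name and main class are placeholders):

```shell
# Fire-and-forget submission: with waitAppCompletion=false, spark-submit
# exits once YARN accepts the application instead of lingering for its
# whole lifetime, so launcher JVMs do not pile up on the client machine.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --class com.example.MyApp \
  app.jar
```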


--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Submit many spark applications

ayan guha
How about using Livy to submit jobs? 
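
Livy turns each submission into a lightweight HTTP call, so no per-application JVM is forked on the client. A sketch of its batch endpoint (the Livy host/port, jar path, and class name are placeholders):

```shell
# Submit one batch application through Apache Livy's REST API
# instead of invoking spark-submit locally.
curl -X POST http://livy-host:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///jars/app.jar",
        "className": "com.example.MyApp"
      }'
```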


--
Best Regards,
Ayan Guha

Re: Submit many spark applications

raksja
In reply to this post by Marcelo Vanzin
Hi Marcelo,

I'm facing the same issue when making spark-submit calls from an EC2
instance and hitting the native memory limit sooner. We already use #1,
but we are still on Spark 2.1.0, so I couldn't try #2.

Since InProcessLauncher wouldn't use native memory, will it overload
the memory of the parent process?

Is there any way we can overcome this?




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/



Re: Submit many spark applications

Marcelo Vanzin
On Wed, May 23, 2018 at 12:04 PM, raksja <[hidden email]> wrote:
> Since InProcessLauncher wouldn't use native memory, will it overload
> the memory of the parent process?

It will still use native memory (since the parent process still uses
memory), just less of it. But yes, it will use more memory in the
parent process.

> Is there any way we can overcome this?

Try to launch fewer applications concurrently.
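
That cap on concurrency can be enforced with a bounded thread pool. A minimal JDK-only sketch, where each Runnable stands in for one submission (with Spark 2.3+ the task body could call InProcessLauncher.startApplication(); that substitution is an assumption about your build, not shown here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Caps how many submissions run at once instead of forking one
// spark-submit JVM per application.
class ThrottledSubmitter {
    // Runs the given submission tasks with at most maxConcurrent in
    // flight and returns how many completed within the wait window.
    static int submitAll(List<Runnable> submissions, int maxConcurrent)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable s : submissions) {
            futures.add(pool.submit(s));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        int done = 0;
        for (Future<?> f : futures) {
            if (f.isDone()) done++;
        }
        return done;
    }
}
```

Queued tasks simply wait for a pool slot, so the client machine never hosts more than maxConcurrent submissions at a time.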




Re: Submit many spark applications

raksja
Thanks for the reply.

Have you tried submitting a Spark job directly to YARN using YarnClient?
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html

I'm not sure whether it's performant and scalable.





Re: Submit many spark applications

Marcelo Vanzin
That's what Spark uses.



Re: Submit many spark applications

raksja
When you say Spark uses it, did you mean this?
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

InProcessLauncher would just start a subprocess, as you mentioned earlier.
What about this client: does it make a REST API call to YARN?

Given my case, where I have several concurrent jobs, would you recommend
the Spark YARN client (mentioned above) over InProcessLauncher?


Re: Submit many spark applications

Marcelo Vanzin
On Fri, May 25, 2018 at 10:18 AM, raksja <[hidden email]> wrote:
> InProcessLauncher would just start a subprocess as you mentioned earlier.

No. As the name says, it runs things in the same process.



Re: Submit many spark applications

raksja
ok, when to use what?
do you have any recommendation?





Re: Submit many spark applications

Marcelo Vanzin
I already gave my recommendation in my very first reply to this thread...



Re: Submit many spark applications

yncxcw
In reply to this post by Shiyuan
Hi,

Please try reducing the default heap size of the JVM on the machine you
use to submit applications, for example:

    export _JAVA_OPTIONS="-Xmx512M"

The submitter, which is itself a JVM, does not need to reserve much memory.
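
If a machine-wide _JAVA_OPTIONS is too blunt, the heap can be capped per invocation instead. A sketch, assuming your Spark distribution's bin scripts honor SPARK_SUBMIT_OPTS (the class and jar names are placeholders):

```shell
# Cap the heap of just this spark-submit JVM, leaving other Java
# processes on the machine unaffected.
SPARK_SUBMIT_OPTS="-Xmx512m" spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  app.jar
```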


Wei