Purpose of spark-submit?


Purpose of spark-submit?

srobertjames
What is the purpose of spark-submit? Does it do anything outside of
the standard val conf = new SparkConf ... val sc = new SparkContext
... ?

Re: Purpose of spark-submit?

Patrick Wendell
It fulfills a few different functions. The main one is giving users a
way to inject Spark as a runtime dependency separately from their
program and make sure they get exactly the right version of Spark. So
a user can bundle an application and then use spark-submit to send it
to different types of clusters (or using different versions of Spark).

It also unifies the way you bundle and submit an app for YARN, Mesos,
etc. This had become very fragmented over time before spark-submit
was added.

Another feature is allowing users to set configuration values
dynamically rather than compiling them into their program. That's
the one you mention here. This feature is optional: if you know your
configs are not going to change, you don't need to set them with
spark-submit.
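
For illustration, the two styles contrasted here look roughly like this. This is a sketch against the Spark 1.x Scala API; the app name, master URL, and memory setting are made-up examples, not values from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Style 1: configuration baked into the program at compile time.
val fixedConf = new SparkConf()
  .setAppName("my-app")
  .setMaster("spark://master-host:7077")   // cluster hard-coded here
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(fixedConf)

// Style 2: leave the master and settings out of the code and let
// spark-submit inject them at launch time, e.g.
//   spark-submit --master yarn --conf spark.executor.memory=2g my-app.jar
val dynamicConf = new SparkConf().setAppName("my-app")
// new SparkContext(dynamicConf) then picks up whatever spark-submit provided
```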



Re: Purpose of spark-submit?

Koert Kuipers

Not sure I understand why unifying how you submit an app for different platforms, and dynamic configuration, cannot be part of SparkConf and SparkContext?

For the classpath, a simple script similar to "hadoop classpath" that shows what needs to be added should be sufficient.

On Spark standalone I can launch a program just fine with just SparkConf and SparkContext. Not on YARN, so the spark-submit script must be doing a few extra things there that I am missing... which makes things more difficult, because I am not sure it's realistic to expect every application that needs to run something on Spark to be launched using spark-submit.
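
What works on standalone without spark-submit is roughly the following sketch (Spark 1.x API; the host and jar path are hypothetical). `setJars` ships the application jar to the executors, which is the part standalone handles directly:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Programmatic launch against a standalone master, no spark-submit involved.
val conf = new SparkConf()
  .setAppName("embedded-app")
  .setMaster("spark://master-host:7077")   // standalone master URL (hypothetical host)
  .setJars(Seq("/path/to/my-app.jar"))     // shipped to executors by Spark itself
val sc = new SparkContext(conf)
try {
  println(sc.parallelize(1 to 100).sum())
} finally {
  sc.stop()
}
```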


Re: Purpose of spark-submit?

Surendranauth Hiraman
Are there any gaps beyond convenience and code/config separation in using spark-submit versus SparkConf/SparkContext if you are willing to set your own config?

If there are any gaps, +1 on having parity within SparkConf/SparkContext where possible. In my use case, we launch our jobs programmatically. In theory, we could shell out to spark-submit but it's not the best option for us.

So far, we are only using Standalone Cluster mode, so I'm not knowledgeable on the complexities of other modes, though.

-Suren






--
                                                            
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: [hidden email]elos.io
W: www.velos.io


Re: Purpose of spark-submit?

srobertjames
+1 to be able to do anything via SparkConf/SparkContext.  Our app
worked fine in Spark 0.9, but, after several days of wrestling with
uber jars and spark-submit, and so far failing to get Spark 1.0
working, we'd like to go back to doing it ourselves with SparkConf.

As the previous poster said, a few scripts should be able to give us
the classpath and any other params we need, and be a lot more
transparent and debuggable.


Re: Purpose of spark-submit?

Jerry Lam
+1 as well for being able to submit jobs programmatically without using a shell script.

We also experienced issues submitting jobs programmatically without using spark-submit. In fact, even in the Hadoop world, I rarely used "hadoop jar" to submit jobs from the shell.





Re: Purpose of spark-submit?

Andrei
Another +1. For me it's a question of embedding. With SparkConf/SparkContext I can easily build larger projects with Spark as a separate service (just like MySQL and JDBC, for example). With spark-submit I'm bound to Spark as the main framework that defines what my application should look like. In my humble opinion, using Spark as an embeddable library rather than as the main framework and runtime is much easier.
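
A sketch of this embedding style (all names hypothetical, Spark 1.x API): a service class that owns a SparkContext the way a DAO might own a JDBC connection pool.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical service that treats Spark as just another backend library.
class AnalyticsService(master: String) {
  private val sc = new SparkContext(
    new SparkConf().setAppName("analytics-service").setMaster(master))

  // Example operation exposed to the rest of the application.
  def wordCount(lines: Seq[String]): Map[String, Long] =
    sc.parallelize(lines)
      .flatMap(_.split("\\s+"))
      .map((_, 1L))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  def shutdown(): Unit = sc.stop()
}

// Usage, e.g. with a local master for testing:
//   val svc = new AnalyticsService("local[*]")
//   val counts = svc.wordCount(Seq("a b a"))
//   svc.shutdown()
```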






Re: Purpose of spark-submit?

Sandy Ryza
Spark still supports the ability to submit jobs programmatically without shell scripts.

Koert,
The main reason that the unification can't be a part of SparkContext is that YARN and standalone support deploy modes where the driver runs in a managed process on the cluster.  In this case, the SparkContext is created on a remote node well after the application is launched.






Re: Purpose of spark-submit?

Koert Kuipers
Sandy, that makes sense. However, I had trouble with programmatic execution on YARN in client mode as well. The ApplicationMaster came up in YARN but then bombed because it was looking for jars that don't exist (it was looking at the original file paths on the driver side, which are not available on the YARN node). My guess is that spark-submit changes some settings (perhaps preparing the distributed cache and modifying settings accordingly), which makes it harder to run things programmatically. I could be wrong, however. I gave up debugging and resorted to using spark-submit for now.
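
To make that guess concrete: one thing spark-submit handles on YARN is telling the cluster where to find the Spark assembly and the application jar. A hedged sketch of doing the same by hand (Spark 1.x property names and master URL; all paths hypothetical, and this may well not be the whole story):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Attempting YARN client mode programmatically. spark-submit normally
// arranges for these jars to be reachable from the cluster; doing it by
// hand means pointing at cluster-visible locations.
val conf = new SparkConf()
  .setAppName("yarn-client-app")
  .setMaster("yarn-client")                                  // Spark 1.x master URL for YARN client mode
  .set("spark.yarn.jar", "hdfs:///libs/spark-assembly.jar")  // assembly visible to all nodes
  .setJars(Seq("/local/path/my-app.jar"))                    // app jar for the executors
val sc = new SparkContext(conf)
```

If driver-side file paths are the problem, the relevant difference would be that these locations resolve from the YARN nodes, not just from the submitting machine.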








Re: Purpose of spark-submit?

Jerry Lam
Sandy, I experienced similar behavior to what Koert just mentioned. I don't understand why there is a difference between using spark-submit and programmatic execution. Maybe there is something else we need to add to the SparkConf/SparkContext in order to launch Spark jobs programmatically that wasn't needed before?



On Wed, Jul 9, 2014 at 12:14 PM, Koert Kuipers <[hidden email]> wrote:
sandy, that makes sense. however i had trouble doing programmatic execution on yarn in client mode as well. the application-master in yarn came up but then bombed because it was looking for jars that dont exist (it was looking in the original file paths on the driver side, which are not available on the yarn node). my guess is that spark-submit is changing some settings (perhaps preparing the distributed cache and modifying settings accordingly), which makes it harder to run things programmatically. i could be wrong however. i gave up debugging and resorted to using spark-submit for now.



On Wed, Jul 9, 2014 at 12:05 PM, Sandy Ryza <[hidden email]> wrote:
Spark still supports the ability to submit jobs programmatically without shell scripts.

Koert,
The main reason that the unification can't be a part of SparkContext is that YARN and standalone support deploy modes where the driver runs in a managed process on the cluster.  In this case, the SparkContext is created on a remote node well after the application is launched.


On Wed, Jul 9, 2014 at 8:34 AM, Andrei <[hidden email]> wrote:
One another +1. For me it's a question of embedding. With SparkConf/SparkContext I can easily create larger projects with Spark as a separate service (just like MySQL and JDBC, for example). With spark-submit I'm bound to Spark as a main framework that defines how my application should look like. In my humble opinion, using Spark as embeddable library rather than main framework and runtime is much easier.




On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam <[hidden email]> wrote:
+1 as well for being able to submit jobs programmatically without using shell script.

we also experience issues of submitting jobs programmatically without using spark-submit. In fact, even in the Hadoop World, I rarely used "hadoop jar" to submit jobs in shell. 



On Wed, Jul 9, 2014 at 9:47 AM, Robert James <[hidden email]> wrote:
+1 to be able to do anything via SparkConf/SparkContext.  Our app
worked fine in Spark 0.9, but, after several days of wrestling with
uber jars and spark-submit, and so far failing to get Spark 1.0
working, we'd like to go back to doing it ourself with SparkConf.

As the previous poster said, a few scripts should be able to give us
the classpath and any other params we need, and be a lot more
transparent and debuggable.

On 7/9/14, Surendranauth Hiraman <[hidden email]> wrote:
> Are there any gaps beyond convenience and code/config separation in using
> spark-submit versus SparkConf/SparkContext if you are willing to set your
> own config?
>
> If there are any gaps, +1 on having parity within SparkConf/SparkContext
> where possible. In my use case, we launch our jobs programmatically. In
> theory, we could shell out to spark-submit but it's not the best option for
> us.
>
> So far, we are only using Standalone Cluster mode, so I'm not knowledgeable
> on the complexities of other modes, though.
>
> -Suren
>
>
>
> On Wed, Jul 9, 2014 at 8:20 AM, Koert Kuipers <[hidden email]> wrote:
>
>> not sure I understand why unifying how you submit app for different
>> platforms and dynamic configuration cannot be part of SparkConf and
>> SparkContext?
>>
>> for classpath a simple script similar to "hadoop classpath" that shows
>> what needs to be added should be sufficient.
>>
>> on spark standalone I can launch a program just fine with just SparkConf
>> and SparkContext. not on yarn, so the spark-launch script must be doing a
>> few things extra there I am missing... which makes things more difficult
>> because I am not sure its realistic to expect every application that
>> needs
>> to run something on spark to be launched using spark-submit.
>>  On Jul 9, 2014 3:45 AM, "Patrick Wendell" <[hidden email]> wrote:
>>
>>> It fulfills a few different functions. The main one is giving users a
>>> way to inject Spark as a runtime dependency separately from their
>>> program and make sure they get exactly the right version of Spark. So
>>> a user can bundle an application and then use spark-submit to send it
>>> to different types of clusters (or using different versions of Spark).
>>>
>>> It also unifies the way you bundle and submit an app for Yarn, Mesos,
>>> etc... this was something that became very fragmented over time before
>>> this was added.
>>>
>>> Another feature is allowing users to set configuration values
>>> dynamically rather than compile them inside of their program. That's
>>> the one you mention here. You can choose to use this feature or not.
>>> If you know your configs are not going to change, then you don't need
>>> to set them with spark-submit.
>>>
>>>
>>> On Wed, Jul 9, 2014 at 10:22 AM, Robert James <[hidden email]>
>>> wrote:
>>> > What is the purpose of spark-submit? Does it do anything outside of
>>> > the standard val conf = new SparkConf ... val sc = new SparkContext
>>> > ... ?
>>>
>>
>
>
> --
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: [hidden email]
> W: www.velos.io
>
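To make the dynamic-configuration point from Patrick's message above concrete: the same jar can be reconfigured per run at the command line instead of being recompiled. A hedged sketch (`com.example.MyApp` and `my-app.jar` are placeholders; the flags shown are standard spark-submit options of that era):

```shell
# One build, many configurations: none of these values is baked into the jar.
spark-submit \
  --class com.example.MyApp \
  --master yarn-client \
  --executor-memory 4g \
  --conf spark.shuffle.compress=true \
  my-app.jar
```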






Re: Purpose of spark-submit?

Sandy Ryza
Are you able to share the error you're getting?


On Wed, Jul 9, 2014 at 9:25 AM, Jerry Lam <[hidden email]> wrote:
Sandy, I experienced similar behavior to what Koert just mentioned. I don't understand why there is a difference between using spark-submit and programmatic execution. Maybe there is something else we need to add to the spark conf/spark context in order to launch spark jobs programmatically that wasn't needed before?



On Wed, Jul 9, 2014 at 12:14 PM, Koert Kuipers <[hidden email]> wrote:
sandy, that makes sense. however i had trouble doing programmatic execution on yarn in client mode as well. the application-master in yarn came up but then bombed because it was looking for jars that dont exist (it was looking in the original file paths on the driver side, which are not available on the yarn node). my guess is that spark-submit is changing some settings (perhaps preparing the distributed cache and modifying settings accordingly), which makes it harder to run things programmatically. i could be wrong however. i gave up debugging and resorted to using spark-submit for now.



On Wed, Jul 9, 2014 at 12:05 PM, Sandy Ryza <[hidden email]> wrote:
Spark still supports the ability to submit jobs programmatically without shell scripts.

Koert,
The main reason that the unification can't be a part of SparkContext is that YARN and standalone support deploy modes where the driver runs in a managed process on the cluster.  In this case, the SparkContext is created on a remote node well after the application is launched.


On Wed, Jul 9, 2014 at 8:34 AM, Andrei <[hidden email]> wrote:
Another +1. For me it's a question of embedding. With SparkConf/SparkContext I can easily create larger projects with Spark as a separate service (just like MySQL and JDBC, for example). With spark-submit I'm bound to Spark as the main framework that defines what my application should look like. In my humble opinion, using Spark as an embeddable library rather than the main framework and runtime is much easier.




On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam <[hidden email]> wrote:
+1 as well for being able to submit jobs programmatically without using shell script.

we also experienced issues submitting jobs programmatically without using spark-submit. In fact, even in the Hadoop world, I rarely used "hadoop jar" to submit jobs in the shell.



On Wed, Jul 9, 2014 at 9:47 AM, Robert James <[hidden email]> wrote:
+1 to be able to do anything via SparkConf/SparkContext.  Our app
worked fine in Spark 0.9, but, after several days of wrestling with
uber jars and spark-submit, and so far failing to get Spark 1.0
working, we'd like to go back to doing it ourselves with SparkConf.

As the previous poster said, a few scripts should be able to give us
the classpath and any other params we need, and be a lot more
transparent and debuggable.








Re: Purpose of spark-submit?

Ron Gonzalez
In reply to this post by Jerry Lam
Koert,
Yeah, I had the same problems trying to do programmatic submission of Spark jobs to my YARN cluster. I was ultimately able to resolve it by reviewing the classpath and debugging through all the different things the Spark YARN client (Client.scala) does when submitting to YARN (env setup, local resources, etc.), comparing that against what spark-submit does.
I have to admit, though, that it was far from trivial to get working out of the box, and perhaps some work could be done in that regard. In my case, it boiled down to the launch environment not having HADOOP_CONF_DIR set, which prevented the app master from registering itself with the ResourceManager.
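For anyone hitting the same wall, the fix sketched in shell (paths are examples; the point is that the environment of the JVM doing the submission must expose the cluster config):

```shell
# Environment the YARN submission path reads; without it the app master
# can't find the ResourceManager address. Paths are examples only.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # dir with yarn-site.xml, core-site.xml
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
# then launch the driver JVM with that dir on its classpath too, e.g.:
#   java -cp "$SPARK_HOME/lib/*:$HADOOP_CONF_DIR:myapp.jar" com.example.MyApp
```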

Thanks,
Ron

Sent from my iPad

On Jul 9, 2014, at 9:25 AM, Jerry Lam <[hidden email]> wrote:

Sandy, I experienced similar behavior to what Koert just mentioned. I don't understand why there is a difference between using spark-submit and programmatic execution. Maybe there is something else we need to add to the spark conf/spark context in order to launch spark jobs programmatically that wasn't needed before?









Re: Purpose of spark-submit?

Ron Gonzalez
In reply to this post by Jerry Lam
I am able to use Client.scala or LauncherExecutor.scala as my programmatic entry point for Yarn.
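For reference, a rough sketch of what a programmatic entry point like this can look like against the Spark 1.x YARN client. Hedged heavily: `ClientArguments` and `Client` live in `org.apache.spark.deploy.yarn`, are not a stable public API, and their constructors vary between releases, so treat every name below as an assumption to verify against your Spark version; the jar path and main class are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

object YarnLauncher {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // Roughly the argument list spark-submit assembles for yarn-cluster mode.
    val clientArgs = new ClientArguments(Array(
      "--jar", "hdfs:///apps/my-app.jar",   // placeholder application jar
      "--class", "com.example.MyApp",       // placeholder main class
      "--num-executors", "2"), conf)
    // HADOOP_CONF_DIR must be visible to this JVM for cluster discovery.
    new Client(clientArgs, conf).run()
  }
}
```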

Thanks,
Ron

Sent from my iPad

On Jul 9, 2014, at 7:14 AM, Jerry Lam <[hidden email]> wrote:

+1 as well for being able to submit jobs programmatically without using shell script.

we also experienced issues submitting jobs programmatically without using spark-submit. In fact, even in the Hadoop world, I rarely used "hadoop jar" to submit jobs in the shell.





Re: Purpose of spark-submit?

Andrew Or-2
I don't see why using SparkSubmit.scala as your entry point would be any different, because all it does is invoke the main class of Client.scala (e.g. for YARN) after setting up all the classpaths and configuration options (though I haven't tried this myself).


2014-07-09 9:40 GMT-07:00 Ron Gonzalez <[hidden email]>:
I am able to use Client.scala or LauncherExecutor.scala as my programmatic entry point for Yarn.

Thanks,
Ron

Sent from my iPad




Re: Purpose of spark-submit?

Koert Kuipers
when i write a general spark application i use SparkConf/SparkContext, not Client.scala for Yarn
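spelled out, the pattern in question is just the plain programmatic entry point (a sketch: the app name, master URL, and jar path are example values, and setJars is only needed when the jar isn't already on the executors):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Everything spark-submit would otherwise inject is set here directly.
    val conf = new SparkConf()
      .setAppName("my-app")                       // example name
      .setMaster("spark://master:7077")           // example standalone master URL
      .setJars(Seq("target/my-app-assembly.jar")) // placeholder jar path
    val sc = new SparkContext(conf)
    try {
      // trivial job to show the context is live
      val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
      println(s"even numbers: $evens")
    } finally {
      sc.stop()
    }
  }
}
```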


On Wed, Jul 9, 2014 at 9:39 PM, Andrew Or <[hidden email]> wrote:
I don't see why using SparkSubmit.scala as your entry point would be any different, because all that does is invoke the main class of Client.scala (e.g. for Yarn) after setting up all the class paths and configuration options. (Though I haven't tried this myself)

