Programmatically running Spark jobs

Programmatically running Spark jobs

Vicky Kak
I have been able to submit Spark jobs using the submit script, but I would like to do it from code.
I have been unable to find anything matching my need.
I am thinking of using org.apache.spark.deploy.SparkSubmit to do so; maybe I have to write a small utility that passes the required parameters to this class.
I would be interested to know how the community is doing this.
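
Something along these lines is what I have in mind. This is an untested sketch: the master URL, class name, and jar path below are placeholders, and note that SparkSubmit is an internal class rather than a public API, so it may call System.exit on failure and can change between releases.

import org.apache.spark.deploy.SparkSubmit

object ProgrammaticSubmit {
  def main(args: Array[String]): Unit = {
    // Pass the same arguments that bin/spark-submit would receive.
    SparkSubmit.main(Array(
      "--master", "spark://master-host:7077",   // placeholder master URL
      "--class", "com.example.MyJob",           // placeholder main class
      "--executor-memory", "1g",
      "/path/to/my-job-assembly.jar",           // placeholder application jar
      "someJobArgument"))
  }
}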

Thanks,
Vicky

Re: Programmatically running Spark jobs

matt.chu

Ooyala's Spark jobserver is the current de facto standard, IIUC. I just added it to our prototype stack, and will begin trying it out soon. Note that you can only do standalone or Mesos; YARN isn't quite there yet.

(The repo just moved from https://github.com/ooyala/spark-jobserver, so don't trust Google on this one (yet); development is happening in the first repo.)
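
For a flavor of the jobserver's programming model, here is a rough sketch of a job as I understand the API from its README; treat the trait and validation type names (SparkJob, SparkJobValid, SparkJobInvalid) and the config key as assumptions rather than gospel:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// A word-count job that the jobserver runs on a long-lived SparkContext.
object WordCountJob extends SparkJob {

  // Runs before runJob, so malformed requests fail fast over HTTP.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("config must define input.path")

  // The return value is serialized back to the REST caller.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.textFile(config.getString("input.path"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
}

You then upload the assembled jar over the jobserver's REST API and POST to its /jobs endpoint with the class name to trigger a run; the exact curl commands are in the README.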





RE: Programmatically running Spark jobs

AltisourceCuroli

 

Hello,

Can this be used as a library from within another application?

Thanks!

Best, Oliver

 



Re: Programmatically running Spark jobs

Vicky Kak
I don't think so.


On Thu, Sep 4, 2014 at 5:36 PM, Ruebenacker, Oliver A <[hidden email]> wrote:
> Can this be used as a library from within another application?


Re: Programmatically running Spark jobs

ericacm
In reply to this post by matt.chu
Ahh - that probably explains an issue I am seeing.  I am a brand new user and I tried running the SimpleApp class that is on the Quick Start page (http://spark.apache.org/docs/latest/quick-start.html).

When I use conf.setMaster("local"), I can run the class directly from my IDE. But when I set the master to my standalone cluster using conf.setMaster("spark://myhost:7077") and run the class from the IDE, I get this error in the local application:

14/09/01 10:56:04 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 4 times; aborting job
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/09/01 10:56:04 INFO client.AppClient$ClientActor: Executor updated: app-20140901105546-0001/3 is now EXITED (Command exited with code 52)
14/09/01 10:56:04 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140901105546-0001/3 removed: Command exited with code 52
14/09/01 10:56:04 INFO scheduler.DAGScheduler: Failed to run count at SimpleApp.scala:17
Exception in thread "main" 14/09/01 10:56:04 INFO client.AppClient$ClientActor: Executor added: app-20140901105546-0001/4 on worker-20140901105055-10.0.1.5-56156 (10.0.1.5:56156) with 8 cores
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: TID 3 on host 10.0.1.5 failed for unknown reason

and this error in the worker stderr:

14/09/01 10:55:54 ERROR Executor: Exception in task ID 1
java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
        at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2378)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
        at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
        at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)

This made no sense to me, because I gave the worker 1 GB of heap and it was only processing a 4 KB README.md file. I'm guessing it tried to deserialize a bogus object because I was not submitting the job correctly (i.e., not via spark-submit or this spark-jobserver)?
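
In case it helps anyone searching later, here is a minimal sketch of what I believe the fix looks like, assuming the missing piece is shipping the application jar to the executors (something spark-submit normally does for you); the master URL and jar path are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://myhost:7077")  // placeholder standalone master URL
      // Assumption: ship the built application jar to the executors, which
      // spark-submit would otherwise do on your behalf.
      .setJars(Seq("target/scala-2.10/simple-app_2.10-1.0.jar"))  // placeholder path
    val sc = new SparkContext(conf)
    val logData = sc.textFile("README.md").cache()
    println("Lines with a: " + logData.filter(_.contains("a")).count())
    sc.stop()
  }
}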

Thanks,

Re: Programmatically running Spark jobs

Guru Medasani
In reply to this post by matt.chu
I am able to run Spark jobs and Spark Streaming jobs successfully via YARN on a CDH cluster. 

When you say YARN isn't quite there yet, do you mean for submitting jobs programmatically, or just in general?




Re: Programmatically running Spark jobs

Vicky Kak
I don't want to use YARN or Mesos; I am just trying the standalone Spark cluster.
We need a way to do seamless submission via an API, which I don't see.
To my surprise, I hit this issue when I tried submitting from another machine; it is crazy that I have to submit the job from the worker node or play with environment variables. It is anything but seamless:
http://apache-spark-user-list.1001560.n3.nabble.com/executor-failed-cannot-find-compute-classpath-sh-td859.html


On Fri, Sep 5, 2014 at 8:33 AM, Guru Medasani <[hidden email]> wrote:
> When you say YARN isn't quite there yet, do you mean for submitting jobs programmatically, or just in general?




Re: Programmatically running Spark jobs

Vicky Kak
In reply to this post by ericacm
I get this error when I run it from the IDE:
***************************************************************************************
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

***************************************************************************************

