Re: Running Spark jar on EC2

Jeff Higgens
Thanks for the suggestions.

Unfortunately I am still unable to run my fat jar on EC2 (even using runExample, and SPARK_CLASSPATH is blank). Here is the full output:

[root@ip-172-31-21-60 ~]$ java -jar Crunch-assembly-0.0.1.jar
14/01/01 22:34:40 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
14/01/01 22:34:40 INFO spark.SparkEnv: Registering BlockManagerMaster
14/01/01 22:34:40 INFO storage.MemoryStore: MemoryStore started with capacity 1093.6 MB.
14/01/01 22:34:41 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20140101223440-a6bb
14/01/01 22:34:41 INFO network.ConnectionManager: Bound socket to port 56274 with id = ConnectionManagerId(ip-172-31-21-60,56274)
14/01/01 22:34:41 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/01/01 22:34:41 INFO storage.BlockManagerMaster: Registered BlockManager
14/01/01 22:34:41 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/01/01 22:34:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46111
14/01/01 22:34:41 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.31.21.60:46111
14/01/01 22:34:41 INFO spark.SparkEnv: Registering MapOutputTracker
14/01/01 22:34:41 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-227ad744-5d0d-4e1a-aacd-9c0c73876b31
14/01/01 22:34:41 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/01/01 22:34:41 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44012
14/01/01 22:34:41 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
14/01/01 22:34:41 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:45098
14/01/01 22:34:41 INFO storage.BlockManagerUI: Started BlockManager web UI at http://ip-172-31-21-60:45098
14/01/01 22:34:42 INFO spark.SparkContext: Added JAR /root/Crunch-assembly-0.0.1.jar at http://172.31.21.60:44012/jars/Crunch-assembly-0.0.1.jar with timestamp 1388615682294
14/01/01 22:34:42 INFO client.Client$ClientActor: Connecting to master spark://ec2-54-193-16-137.us-west-1.compute.amazonaws.com:7077
14/01/01 22:34:42 ERROR client.Client$ClientActor: Connection to master failed; stopping client
14/01/01 22:34:42 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
14/01/01 22:34:42 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster


Interestingly, running one of the bundled examples (SparkPi) works fine. The only line that looked different in its output was this one:
14/01/01 23:27:55 INFO network.ConnectionManager: Bound socket to port 41806 with id = ConnectionManagerId(ip-172-31-29-197.us-west-1.compute.internal,41806)

Whereas my (non-working) jar produced this on the same line:
14/01/01 22:34:41 INFO network.ConnectionManager: Bound socket to port 56274 with id = ConnectionManagerId(ip-172-31-21-60,56274)


On Fri, Dec 20, 2013 at 8:54 PM, Evan Sparks <[hidden email]> wrote:
I ran into a similar issue a few months back: pay careful attention to the order in which Spark looks for your jars. The root of my problem was a stale jar in SPARK_CLASSPATH on the worker nodes, which (IIRC) took precedence over jars passed in via the SparkContext constructor.
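As a rough sketch against the 0.8-era API, passing the jar through the constructor looks something like this. The master URL and jar path are the ones from your log; the app name and the SPARK_HOME path (/root/spark, which I believe is the spark-ec2 default) are placeholders:

import org.apache.spark.SparkContext

object CrunchApp {
  def main(args: Array[String]) {
    // Jars listed here are served from the driver's HTTP file server and
    // fetched by the executors, so no manual copy-dir is needed.
    val sc = new SparkContext(
      "spark://ec2-54-193-16-137.us-west-1.compute.amazonaws.com:7077", // master
      "Crunch",                                                         // app name (placeholder)
      "/root/spark",                                                    // SPARK_HOME on the cluster
      Seq("/root/Crunch-assembly-0.0.1.jar"))                           // shipped to the executors
    // ... job logic ...
    sc.stop()
  }
}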

On Dec 20, 2013, at 8:49 PM, "K. Shankari" <[hidden email]> wrote:

I don't think you need to copy the jar to the rest of the cluster; you should be able to call addJar() on the SparkContext, and Spark should automatically push the jar out to the workers for you.
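Something along these lines (a minimal sketch; the master URL and jar path are taken from the log above, the app name is a placeholder):

import org.apache.spark.SparkContext

val sc = new SparkContext(
  "spark://ec2-54-193-16-137.us-west-1.compute.amazonaws.com:7077", // master
  "Crunch")                                                         // app name (placeholder)

// Register the fat jar at runtime; Spark serves it to the executors
// over its HTTP file server.
sc.addJar("/root/Crunch-assembly-0.0.1.jar")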

I don't know how set you are on checking out and compiling the code on the cluster itself, but here's what I do instead to get my own application to run:
- compile my code on my desktop and generate a jar
- scp the jar to the master
- modify runExample to include the jar in the classpath. I think that you can also just modify SPARK_CLASSPATH
- run using something like:

$ runExample my.class.name arg1 arg2 arg3

Hope this helps!
Shankari


On Tue, Dec 10, 2013 at 12:15 PM, Jeff Higgens <[hidden email]> wrote:
I'm having trouble running my Spark program as a "fat jar" on EC2.

This is the process I'm using:
(1) spark-ec2 script to launch cluster
(2) ssh to the master, install sbt and git clone my project's source code
(3) update the source to reference the correct master URL and jar path
(4) sbt assembly
(5) copy-dir to copy the jar to the rest of the cluster

I tried both running the jar (java -jar ...) and using sbt run, but I always end up with this error:

18:58:59.556 [spark-akka.actor.default-dispatcher-4] INFO  o.a.s.d.client.Client$ClientActor - Connecting to master spark://ec2-50-16-80-0.compute-1.amazonaws.com:7077
18:58:59.838 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.d.client.Client$ClientActor - Connection to master failed; stopping client
18:58:59.839 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.s.c.SparkDeploySchedulerBackend - Disconnected from Spark cluster!
18:58:59.840 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.s.cluster.ClusterScheduler - Exiting due to error from cluster scheduler: Disconnected from Spark cluster
18:58:59.844 [delete Spark local dirs] DEBUG org.apache.spark.storage.DiskStore - Shutdown hook called


But when I use spark-shell, it has no problem connecting to the master using the exact same URL:

13/12/10 18:59:40 INFO client.Client$ClientActor: Connecting to master spark://ec2-50-16-80-0.compute-1.amazonaws.com:7077
Spark context available as sc.

I'm probably missing something obvious, so any tips are much appreciated.


Jeff Higgens
Ok, the problem was a very silly mistake.

I launched my EC2 instances with spark-0.8.1-incubating, but my fat jar was still being compiled against spark-0.7.3. Oops!
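In case anyone else hits this: the fix is just to point the build at the same Spark release the cluster is running. A minimal build.sbt sketch (project name, version, and Scala version are placeholders; Spark 0.8.x builds against Scala 2.9.3):

name := "Crunch"

version := "0.0.1"

scalaVersion := "2.9.3"

// 0.8.x is published under org.apache.spark; 0.7.x lived under
// org.spark-project, so a stale coordinate is easy to miss.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating"

// The Spark 0.8.1 docs also list this resolver for transitive dependencies.
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"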


Berkeley Malagon
Thanks for sharing the explanation. 
