debug standalone Spark jobs?


debug standalone Spark jobs?

Nan Zhu
Hi, all

I’m trying to run a standalone job on a Spark cluster on EC2.

There is obviously some bug in my code: after the job runs for several minutes, it fails with an exception:

Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Set current project to rec_system (in build file:/home/ubuntu/rec_sys/)
[info] Running general.NetflixRecommender algorithm.SparkALS -b 20 -i 20 -l 0.005 -m spark://172.31.32.76:7077 --moviepath s3n://trainingset/netflix/training_set/* -o s3n://training_set/netflix/training_set/output.txt --rank 20 -r s3n://trainingset/netflix/training_set/mv_*
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
failed to init the engine class
org.apache.spark.SparkException: Job aborted: Task 43.0:9 failed more than 4 times
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

However, this information doesn’t tell me anything. How can I print the detailed log information to the console?

I’m also not sure what causes those log4j WARNs. I get the same warnings when I run spark-shell, yet there I can still see detailed information, such as which task is currently running.

Best,

-- 
Nan Zhu


Re: debug standalone Spark jobs?

Sriram Ramachandrasekaran
Did you get a chance to look at the Spark worker logs? They should be under SPARK_HOME/logs/.
You should also look at the application logs themselves; they are under SPARK_HOME/work/APP_ID.
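The layout looks roughly like this (a sketch only: the app id below is made up, and the snippet fakes the directory tree so the commands are runnable as-is — on a real worker you would simply cat the existing files under your actual SPARK_HOME):

```shell
# Standalone-mode log layout on each worker (app id is hypothetical):
#   $SPARK_HOME/logs/                               -> the worker daemon's own log
#   $SPARK_HOME/work/<app-id>/<executor-id>/stderr  -> your tasks' output, incl. exceptions
# Simulate the layout so the commands below run anywhere:
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/logs" "$SPARK_HOME/work/app-20140105143012-0000/0"
echo "14/01/05 10:13:00 INFO Worker: Starting Spark worker" > "$SPARK_HOME/logs/spark-worker.out"
echo "java.io.IOException: the real task failure detail usually lives here" \
  > "$SPARK_HOME/work/app-20140105143012-0000/0/stderr"

cat "$SPARK_HOME"/logs/*.out        # worker daemon log
cat "$SPARK_HOME"/work/*/*/stderr   # per-application executor logs
```

The executor stderr files are typically where the actual exception behind a "Task ... failed more than 4 times" abort shows up.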



--
It's just about how deep your longing is!

Re: debug standalone Spark jobs?

Nan Zhu
Ah, yes, I think the application logs will really help.

Thank you

-- 
Nan Zhu



Re: debug standalone Spark jobs?

Archit Thakur
You can run your Spark application locally by setting SPARK_MASTER="local" and then debug the launched JVM in your IDE.
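Concretely, something along these lines (a sketch: port 5005 is an arbitrary choice, and since the job in this thread takes its master via the -m flag, you would pass -m local there too; JDWP is the JVM's standard remote-debug protocol):

```shell
# Point the app at a local master and make the JVM wait for an IDE debugger.
# suspend=y pauses the JVM until the debugger attaches on the given port.
export SPARK_MASTER=local
export SBT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
echo "master=$SPARK_MASTER opts=$SBT_OPTS"
# then launch as usual, e.g.:
#   sbt "run-main general.NetflixRecommender ... -m local ..."
# and attach a "Remote" debug configuration in your IDE to localhost:5005.
```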



Re: debug standalone Spark jobs?

Nan Zhu
Yes, but my problem only shows up on a large dataset. Anyway, thanks for the reply.

Best,

-- 
Nan Zhu


Re: debug standalone Spark jobs?

Eugen Cepoi
You can set the log level to INFO; it looks like Spark logs application errors at the INFO level. When I have errors that I can reproduce only on live data, I run a spark-shell with my job on its classpath, then debug and tweak things to find out what happens.
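For reference, a minimal log4j.properties along these lines (placed on the driver/executor classpath) would raise console logging to INFO; the pattern layout shown is just one common choice:

```properties
# conf/log4j.properties -- minimal sketch
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

This also makes the earlier "No appenders could be found" WARNs go away, since log4j then has a configured root appender.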



Re: debug standalone Spark jobs?

K. Shankari
I ran into a similar problem earlier. The issue is that Spark no longer has a hard dependency on log4j, so you need to add the dependency to your build manually. For example, with sbt I added the following to build.sbt:

libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.2"

After that, it produces at least INFO-level logging.

Thanks to TD for the pointer.

Thanks,
Shankari


