Error when executing Spark application on YARN


Error when executing Spark application on YARN

alvarobrandon
Hello:

I'm trying to launch an application on a YARN cluster with the following command:


/opt/spark/bin/spark-submit --class com.abrandon.upm.GenerateKMeansData --master yarn --deploy-mode client /opt/spark/BenchMark-1.0-SNAPSHOT.jar kMeans 500000000 4 5 0.9 8

The arguments after the jar file are just the parameters of the GenerateKMeansData application. I get the following error:

16/02/17 15:31:01 INFO Client: Application report for application_1455721308385_0005 (state: ACCEPTED)
16/02/17 15:31:02 INFO Client: Application report for application_1455721308385_0005 (state: FAILED)
16/02/17 15:31:02 INFO Client:
         client token: N/A
         diagnostics: Application application_1455721308385_0005 failed 2 times due to AM Container for appattempt_1455721308385_0005_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://stremi-17.reims.grid5000.fr:8088/proxy/application_1455721308385_0005/Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1455723059732
         final status: FAILED
         tracking URL: http://stremi-17.reims.grid5000.fr:8088/cluster/app/application_1455721308385_0005
         user: abrandon
16/02/17 15:31:02 ERROR SparkContext: Error initializing SparkContext.

I think the important part is Diagnostics: File file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip does not exist. Does anybody know what that means?

Re: Error when executing Spark application on YARN

nsalian
Hi,

Thanks for the question.

I do see this at the bottom:
16/02/17 15:31:02 ERROR SparkContext: Error initializing SparkContext.

A few questions to help narrow this down:
1) Does this happen with any other jobs?
2) Any recent changes to the Spark setup?
3) Could you open the tracking URL (http://stremi-17.reims.grid5000.fr:8088/cluster/app/application_1455721308385_0005) and see if the container logs say anything beyond the stack trace you pasted?
Neelesh S. Salian
Cloudera

Re: Error when executing Spark application on YARN

alvarobrandon
1. It happens to all the classes inside the jar.
2. I didn't make any changes:
       - I have three nodes: one master and two slaves in the conf/slaves file.
       - In spark-env.sh I just set the HADOOP_CONF_DIR parameter.
       - In spark-defaults.conf I didn't change anything.
3. The container doesn't even start.

It seems like there is some problem when sending the jar files. I have just realised that I also get the following message:
Diagnostics: java.io.IOException: Resource file:/opt/spark/BenchMark-1.0-SNAPSHOT.jar changed on src filesystem (expected 1455792343000, was 1455793100000
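
If I understand the diagnostic, YARN records the jar's modification time at submit and fails localization when the file it finds carries a different one, so the copies I pushed to the slaves probably have newer timestamps. A quick sketch to compare them, assuming SSH access and with hypothetical hostnames:

# Hypothetical hostnames; print the jar's mtime (epoch seconds) on each node.
# YARN fails localization when this differs from the value recorded at submit.
for host in master slave1 slave2; do
  ssh "$host" stat -c '%Y %n' /opt/spark/BenchMark-1.0-SNAPSHOT.jar
done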


Re: Error when executing Spark application on YARN

alvarobrandon
The previous message seems to be a problem with the timestamps of the files. Before, I was copying the jar file to each slave node, so this time I left the jar only on the master node. I reran the application, but now I get the following INFO messages:
16/02/18 11:22:58 INFO Client: Source and destination file systems are the same. Not copying file:/opt/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
16/02/18 11:22:59 INFO Client: Source and destination file systems are the same. Not copying file:/opt/spark/BenchMark-1.0-SNAPSHOT.jar
16/02/18 11:22:59 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-313ee0f0-6a30-4eb7-a3ce-b2a0deeff6f4/__spark_conf__8462363500845960489.zip

And the error:

Diagnostics: java.io.FileNotFoundException: File file:/opt/spark/BenchMark-1.0-SNAPSHOT.jar does not exist
Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1455794579804
         final status: FAILED
         tracking URL: http://stremi-14.reims.grid5000.fr:8088/cluster/app/application_1455792361051_0011
         user: abrandon
Exception in thread "main" org.apache.spark.SparkException: Application application_1455792361051_0011 finished with failed status

As far as I know, the master should send the jar file to the slave nodes. How come it cannot find the file?
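
The INFO lines suggest the client resolved the default filesystem to the local one, so it skipped the upload entirely and each NodeManager then looks for /opt/spark/BenchMark-1.0-SNAPSHOT.jar on its own disk. As a sketch of a workaround, assuming HDFS is actually running (the /user/abrandon path is just an example), I could stage the jar there myself and submit with an hdfs:// path:

# Put the jar somewhere every NodeManager can reach, then point spark-submit at it.
hdfs dfs -mkdir -p /user/abrandon
hdfs dfs -put -f /opt/spark/BenchMark-1.0-SNAPSHOT.jar /user/abrandon/
/opt/spark/bin/spark-submit --class com.abrandon.upm.GenerateKMeansData \
  --master yarn --deploy-mode client \
  hdfs:///user/abrandon/BenchMark-1.0-SNAPSHOT.jar kMeans 500000000 4 5 0.9 8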

Re: Error when executing Spark application on YARN

alvarobrandon
Found the solution. I was pointing to the wrong Hadoop conf directory. I feel so stupid :P
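
In case it helps anyone else: the relevant line is in conf/spark-env.sh, and it has to point at the directory that actually contains the cluster's core-site.xml and yarn-site.xml. A minimal sketch, with the Hadoop path being an assumption for illustration:

# conf/spark-env.sh -- replace the path with the directory that holds
# your cluster's core-site.xml and yarn-site.xml
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop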

Re: Error when executing Spark application on YARN

laoyuzaici
I have this problem too. Could you tell me where the configuration problem was? Thank you.

Re: Error when executing Spark application on YARN

laoyuzaici
OK, I went back and looked at my own Hadoop configuration. I am so stupid.

Re: Error when executing Spark application on YARN

smishra
In reply to this post by alvarobrandon
Make sure you don't have more than one core-site.xml. I had the same issue and found there were two copies of core-site.xml: one under $HADOOP_CONF_DIR and the other under $SPARK_HOME/conf. I removed the one under $SPARK_HOME/conf and the error disappeared.
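
A quick way to check for strays, assuming the usual environment variables are set:

# There should be exactly one authoritative core-site.xml, under $HADOOP_CONF_DIR;
# a second copy under $SPARK_HOME/conf can shadow it.
ls -l "$HADOOP_CONF_DIR/core-site.xml" "$SPARK_HOME/conf/core-site.xml" 2>/dev/null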