Spark / YARN classpath issues


Jon Bender
Hey all,

I'm working through the basic SparkPi example on a YARN cluster, and I'm wondering why my containers don't pick up the Spark assembly classes.

I built the latest Spark code against CDH 5.0.0.
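(For the record, the build was roughly along these lines; the exact Maven flags here are a guess based on the standard Spark-on-YARN build docs:)
# Sketch of a CDH 5.0.0 build; -Pyarn and hadoop.version per the Spark build instructions
mvn -Pyarn -Dhadoop.version=2.3.0-cdh5.0.0 -DskipTests clean package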

Then ran the following:
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      ./bin/spark-class org.apache.spark.deploy.yarn.Client \
      --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      --class org.apache.spark.examples.SparkPi \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 4g \
      --worker-memory 2g \
      --worker-cores 1

The job dies, and in the stderr from the containers I see
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ApplicationMaster
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

My yarn-site.xml contains the following classpath:
  <property>
    <name>yarn.application.classpath</name>
    <value>
    /etc/hadoop/conf/,
    /usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,
    /usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,
    /usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,
    /usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,
    /usr/lib/avro/*
    </value>
  </property>

I've confirmed that the spark-assembly JAR contains this class. Does it actually need to be listed in yarn.application.classpath, or should the Spark client take care of adding the necessary JARs during job submission?
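(For reference, verified with something along these lines:)
# List the assembly's contents and look for the ApplicationMaster class
jar tf assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
    | grep ApplicationMaster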

Any tips would be greatly appreciated!
Cheers,
Jon

Re: Spark / YARN classpath issues

Andrew Or
Hi Jon,

Your configuration looks largely correct, and I recently confirmed that the way you're launching SparkPi works for me as well.

I have run into the same problem a bunch of times. My best guess is that this is a Java version issue. If the Spark assembly jar is built with Java 7, it cannot be opened by Java 6 because the two versions use different packaging schemes. This is a known issue: https://issues.apache.org/jira/browse/SPARK-1520.
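A quick way to test for this is to try listing the assembly with the Java 6 tools YARN will actually use, something like this (the java6 path is a placeholder):
# If this fails with a ZipException ("invalid CEN header" or similar),
# the jar was packaged by Java 7 in a format Java 6 cannot read
/path/to/java6/bin/jar tf \
    assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
    > /dev/null && echo "Java 6 can read this jar"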

The workaround is to make sure that all your executor nodes are running Java 7 and, very importantly, that JAVA_HOME points to this version. You can achieve this through

export SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java7/home"

in spark-env.sh. The other safe alternative, of course, is simply to build the jar with Java 6. An additional debugging step is to review the launch environment of all the containers; this is detailed in the last paragraph of this section: http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/running-on-yarn.html#debugging-your-application. It may not be necessary, but I have personally found it immensely useful.
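For example, one way to keep the container launch scripts around for inspection is to bump the NodeManager's delete delay in yarn-site.xml (the property is standard YARN; the value is just a suggestion):
  <property>
    <!-- Keep finished containers' local dirs (including launch_container.sh,
         which shows the exact classpath and environment) for 10 minutes -->
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>600</value>
  </property>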

One last thing: launching Spark applications through org.apache.spark.deploy.yarn.Client is deprecated in Spark 1.0; you should use bin/spark-submit instead. You can find usage information in the docs linked above, or simply via the --help option.
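Roughly, the equivalent spark-submit invocation would look something like this (flag names per the 1.0 docs; treat it as a sketch rather than a drop-in command):
./bin/spark-submit \
      --master yarn-cluster \
      --class org.apache.spark.examples.SparkPi \
      --num-executors 3 \
      --driver-memory 4g \
      --executor-memory 2g \
      --executor-cores 1 \
      examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      10
Note that spark-submit should locate the assembly jar on its own, so the SPARK_JAR variable shouldn't be needed.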

Cheers,
Andrew

Re: Spark / YARN classpath issues

Jon Bender
Andrew,

Brilliant! I built on Java 7 but was still running our cluster on Java 6. Upgrading the cluster fixed it, with slight tweaks to the args (it seems the app args come first and yarn-standalone comes last):

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      ./bin/spark-class org.apache.spark.deploy.yarn.Client \
      --jar examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar \
      --class org.apache.spark.examples.SparkPi \
      --args 10 \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 4g \
      --worker-memory 2g \
      --worker-cores 1
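For anyone else who hits this, the sanity check on each node is simple (sketch):
# Run on every NodeManager host; both should point at the JVM the assembly was built with
java -version
echo $JAVA_HOME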

I'll make sure to use spark-submit from here on out.

Thanks very much!
Jon


Re: Spark / YARN classpath issues

Andrew Or
I think you should be able to drop "yarn-standalone" altogether. We recently updated SparkPi to take one argument (num slices, which you set to 10); previously it took two arguments, the master and num slices.
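So with the updated SparkPi, the args portion should reduce to just (sketch of the relevant lines):
      --class org.apache.spark.examples.SparkPi \
      --args 10 \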

Glad you got it figured out.

