Spark on Yarn classpath problems

Spark on Yarn classpath problems

Eric Kimbrel-2
I am trying to run Spark 0.8.1 on Hadoop 2.2.0-cdh5.0.0-beta-1 with YARN.

I am using the YARN Client in yarn-standalone mode, as described here: http://spark.incubator.apache.org/docs/latest/running-on-yarn.html

To simplify matters, say my application code is all contained in application.jar, and it additionally depends on code in dependency.jar.

I launch my spark application as follows:

SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./spark-class org.apache.spark.deploy.yarn.Client \
  --jar application.jar \
  --class <My main class> \
  --args <app specific arguments> \
  --num-workers <NUMBER_OF_WORKER_MACHINES> \
  --master-memory <MEMORY_FOR_MASTER> \
  --worker-memory <MEMORY_PER_WORKER> \
  --worker-cores <CORES_PER_WORKER> \
  --name <application_name> \
  --addJars dependency.jar


YARN loads the job and starts to execute it, but the job quickly dies with class-not-found exceptions for classes that are defined in dependency.jar.

As an attempted fix, I tried bundling all of the dependencies into a single jar, “application-with-dependencies.jar”. I specified this jar with the --jar option and removed the --addJars line. Unfortunately this did not fix the issue, and the class-not-found exceptions continued.
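
One way to rule out a packaging problem is to check that the combined jar really contains the class that is reported missing. The sketch below assumes bash and the JDK's `jar` tool on the PATH; the helper names and the class name com.example.Dep are hypothetical placeholders, not anything from the application described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a jar; names are illustrative only.

# class_to_entry CLASS: convert a fully qualified class name into the
# path that class has inside a jar (dots become slashes, plus ".class").
class_to_entry() {
  printf '%s.class\n' "${1//.//}"
}

# jar_has_class JAR CLASS: exit 0 if JAR contains the entry for CLASS,
# 1 if it does not, 2 if the jar could not be listed at all.
jar_has_class() {
  local jar="$1" entry listing
  entry="$(class_to_entry "$2")"
  # `jar tf` prints one entry name per line.
  listing="$(jar tf "$jar")" || return 2
  grep -qxF -- "$entry" <<<"$listing"
}

# Example usage (com.example.Dep is a placeholder class name):
#   jar_has_class application-with-dependencies.jar com.example.Dep \
#     && echo "class is packaged" || echo "class is NOT in the jar"
```

If the class is present in the jar and the exception still occurs, the problem is more likely in how the jar reaches the containers than in the build itself.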


RE: Spark on Yarn classpath problems

Liu, Raymond
Not found in which part of the code? If it is in the SparkContext thread, i.e. on the AM (application master), --addJars should work.

If it is in the tasks, then --addJars won't work; you need to use --file=local://xxx etc., though I am not sure whether that is available in 0.8.1. Bundling everything into a single jar should also work; if it does not, something might be wrong with the assembly.
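
To tell the two cases apart, it can help to see which container's log reports the missing class. This is a minimal sketch, assuming the container logs have already been copied into a local directory with one subdirectory per container; the function name and that directory layout are assumptions, not part of Spark or YARN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which log files mention a missing class.

# find_missing_classes LOG_DIR: print "file:Exception: class" for every
# distinct class-loading failure found anywhere under LOG_DIR.
find_missing_classes() {
  local log_dir="$1"
  # -r recurse, -H always print the file name, -E extended regex,
  # -o print only the matching part of the line.
  grep -rHEo '(ClassNotFoundException|NoClassDefFoundError): [A-Za-z0-9_.$/]+' "$log_dir" \
    | sort -u
}

# Example usage (./container-logs is a placeholder path):
#   find_missing_classes ./container-logs
```

On YARN the AM normally runs in the first container of an application attempt, so a hit under that container's directory points at the SparkContext side, while hits under the other containers point at the tasks.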

Best Regards,
Raymond Liu

From: Eric Kimbrel [mailto:[hidden email]]
Sent: Wednesday, January 08, 2014 11:16 AM
To: [hidden email]
Subject: Spark on Yarn classpath problems


Re: Spark on Yarn classpath problems

Eric Kimbrel-2
Interesting. I’ll inspect the assembly and take a look, but I have a couple of follow-up questions:

1.  If the class is needed both in the SparkContext thread and on the workers, would I need to add it twice, with --addJars and with --file?
2.  With the --file method, will I need to place the jar at that location on each node of the cluster, or does the YARN client read the file and distribute it onto the cluster?

Thanks for the response.



On Jan 7, 2014, at 8:44 PM, Liu, Raymond <[hidden email]> wrote:

> Not found in which part of the code? If it is in the SparkContext thread, i.e. on the AM (application master), --addJars should work.
>
> If it is in the tasks, then --addJars won't work; you need to use --file=local://xxx etc., though I am not sure whether that is available in 0.8.1. Bundling everything into a single jar should also work; if it does not, something might be wrong with the assembly.

RE: Spark on Yarn classpath problems

Liu, Raymond

1. --files should be enough.
2. --files will read the file and distribute it onto the cluster. You can also put the file on HDFS and point to it to save the upload time, though it still needs to be downloaded to the worker container (YARN does this for the container automatically).
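
The HDFS variant in point 2 could be sketched roughly as follows. The HDFS directory, the helper names and the DRY_RUN switch are assumptions made for illustration; whether a jar shipped through --files ends up on the task classpath in 0.8.1 is not confirmed in this thread. By default the script only prints the commands, so it can be read and run without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: HDFS_DIR and DEP_JAR are placeholder values.
HDFS_DIR="${HDFS_DIR:-/user/someuser/spark-deps}"
DEP_JAR="${DEP_JAR:-dependency.jar}"

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# files_arg_for DIR JAR: build the --files argument for a jar that
# lives in HDFS directory DIR (an absolute path on the default namenode).
files_arg_for() {
  echo "--files hdfs://$1/$(basename "$2")"
}

# Upload the jar once, instead of on every job submission.
run hadoop fs -mkdir -p "$HDFS_DIR"
run hadoop fs -put "$DEP_JAR" "$HDFS_DIR/"

# Argument to add to the launch command in place of a local path.
files_arg_for "$HDFS_DIR" "$DEP_JAR"
```

Setting DRY_RUN=0 makes the two `hadoop fs` commands actually run; the last line prints the argument to append to the Client invocation.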


Best Regards,
Raymond Liu

