Using an external jar in the driver, in yarn-standalone mode.

7 messages

Using an external jar in the driver, in yarn-standalone mode.

Julien Carme
Hello,

I have been struggling for ages to use an external jar in my Spark driver program in yarn-standalone mode. I just want to use, in my main program and outside the calls to Spark functions, objects that are defined in another jar.

I tried setting SPARK_CLASSPATH and ADD_JAR, and I tried passing --addJar in the spark-class arguments, but I always end up with a ClassNotFoundException when I try to use classes defined in my jar.

Any ideas?

Thanks a lot,
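For concreteness, a minimal Scala sketch of the setup being described; the Helper class and its package are illustrative placeholders, not names from the thread:

import org.apache.spark.{SparkConf, SparkContext}
import com.example.Helper  // illustrative class packaged in myotherjar.jar

object MyClass {
  def main(args: Array[String]): Unit = {
    // Plain driver-side code, outside any Spark call: this is the kind of
    // line that fails with ClassNotFoundException when myotherjar.jar is
    // not on the classpath of the JVM that runs this main().
    val helper = new Helper()

    val sc = new SparkContext(new SparkConf().setAppName("example"))
    // ... Spark jobs here ...
    sc.stop()
  }
}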

Re: Using an external jar in the driver, in yarn-standalone mode.

Sandy Ryza
Hi Julien,

Have you called SparkContext#addJars?

-Sandy
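For reference, the per-jar method on SparkContext is addJar; a minimal hedged sketch, using the jar name from the thread:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("example"))
// Ships the jar to the executors so task code can load its classes;
// note it does not put the jar on the classpath of the driver JVM itself.
sc.addJar("myotherjar.jar")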


On Tue, Mar 25, 2014 at 10:05 AM, Julien Carme <[hidden email]> wrote:
Hello,

I have been struggling for ages to use an external jar in my spark driver program, in yarn-standalone mode. I just want to use in my main program, outside the calls to spark functions, objects that are defined in another jar.

I tried to set SPARK_CLASSPATH, ADD_JAR, I tried to use --addJar in the spark-class arguments, I always end up with a "Class not found exception" when I want to use classes defined in my jar.

Any ideas?

Thanks a lot,

Reply | Threaded
Open this post in threaded view
|

Re: Using an external jar in the driver, in yarn-standalone mode.

Nathan Kronenfeld
By 'use ... in my main program' I presume you mean you have a main function in a class file that you want to use as your entry point.

SPARK_CLASSPATH, ADD_JAR, etc. add your jars on the master and the workers, but not on the client. There you're just running ordinary, everyday Java/Scala, so the jar simply has to be on the normal Java classpath.

Could that be your issue?

          -Nathan

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [hidden email]
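Concretely, Nathan's point is that the external jar has to be on the ordinary classpath of whichever JVM runs your main(); that part is plain Java, independent of Spark. An illustrative plain-JVM launch (paths are placeholders):

java -cp myjar.jar:myotherjar.jar myclass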

Re: Using an external jar in the driver, in yarn-standalone mode.

Julien Carme
Thanks for your answer.

I am using:

bin/spark-class org.apache.spark.deploy.yarn.Client --jar myjar.jar --class myclass ...

myclass in myjar.jar contains a main that initializes a SparkContext in yarn-standalone mode.

Then I use some code that depends on myotherjar.jar, but I do not execute it through the SparkContext or an RDD, so my understanding is that it is executed only on the YARN master, not on the YARN slaves.

I found no way to make my code able to find myotherjar.jar. The CLASSPATH is set by Spark (or YARN?) before my code runs on the YARN master; it is not set by me. The idea seems to be that you set SPARK_CLASSPATH and/or ADD_JAR and those jars then become automatically available on the YARN master, but that did not work for me.

I also tried sc.addJar, which did not work either; in any case it seems clear that it is meant for dependencies of the code executed on the slaves, not on the master. Tell me if I am wrong.



RE: Using an external jar in the driver, in yarn-standalone mode.

Andrew Lee (alee526)
Hi Julien,

ADD_JAR doesn't work on the command line. I checked spark-class, and I couldn't find any Bash code that brings the ADD_JAR variable into the CLASSPATH.

Were you able to print out the properties and environment variables from the web UI?

localhost:4040

That should give you an idea of what is included in the current Spark shell. bin/spark-shell invokes bin/spark-class, and I don't see ADD_JAR in bin/spark-class either.

Hi Sandy,

Does Spark automatically deploy the JAR to the DFS cache when running in cluster mode? I haven't gotten that far yet with deploying my own one-off JAR for testing; I have just set up a local cluster for practice.


Re: Using an external jar in the driver, in yarn-standalone mode.

Julien Carme
Hello Andrew,

Thanks for the tip. I accessed the Classpath Entries on the YARN monitoring UI (with YARN it is not localhost:4040 but yarn_master:8088/proxy/[application_id]/environment) and saw that my jar was actually on the CLASSPATH and available to my application.

It turned out I could not use my .jar because there was something wrong with it: it had only been partially transferred to my cluster and was therefore not usable.

Sorry for the confusion, and thanks for your help.
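Incidentally, a truncated or corrupt jar like this can be caught quickly with an integrity check, e.g.

jar tf myotherjar.jar

which errors out instead of listing the archive's entries when the file is damaged.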




Re: Using an external jar in the driver, in yarn-standalone mode.

Sandy Ryza
Andrew,
Spark automatically deploys the jar to the DFS cache if it's included with the addJars option. It then still needs to be added via SparkContext.addJar to get it to the executors.

-Sandy 
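Putting the whole flow together, a hedged sketch with the thread's illustrative names (exact flag spelling may differ between early Spark versions):

bin/spark-class org.apache.spark.deploy.yarn.Client --jar myjar.jar --class myclass --addJars myotherjar.jar

and then, inside the driver, ship the same jar to the executors:

sc.addJar("myotherjar.jar")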

