ADD_JARS doesn't properly work for spark-shell


ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia
Hi,

I'm trying to access my standalone Spark app from spark-shell. I tried starting the shell with:

MASTER=local[2] ADD_JARS=/path/to/my/jar ./spark-shell

The log shows that the jar file was loaded. Also, I can access and create a new instance of mypackage.MyClass.

The problem is that myRDD is an RDD[MyClass], and myRDD.collect() throws this exception:

java.lang.ClassNotFoundException: mypackage.MyClass
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:264)
  at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:622)
  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
  at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1642)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
  at org.apache.spark.util.Utils$.deserialize(Utils.scala:59)
  at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
  at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:573)
  at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:702)
  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:698)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:872)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
  at org.apache.spark.scheduler.Task.run(Task.scala:53)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:215)
  at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)


Does this mean that my jar was not shipped to the workers? Is this a known issue, or am I doing something wrong here?

Re: ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia
I should add that I can see in the log that the jar is being shipped to the workers:

14/01/04 15:34:52 INFO Executor: Fetching http://192.168.1.111:51031/jars/my.jar.jar with timestamp 1388881979092
14/01/04 15:34:52 INFO Utils: Fetching http://192.168.1.111:51031/jars/my.jar.jar to /var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/fetchFileTemp8322008964976744710.tmp
14/01/04 15:34:53 INFO Executor: Adding file:/var/folders/3g/jyx81ctj3698wbvphxhm4dw40000gn/T/spark-d8ac8f66-fad6-4b3f-8059-73f13b86b070/my.jar.jar to class loader



Re: ADD_JARS doesn't properly work for spark-shell

Aaron Davidson
I'm not an expert on these classpath issues, but if you're using local mode, you might also try setting SPARK_CLASSPATH to include the path to the jar file. This shouldn't really help, since "adding jars" is the right way to get the jars to your executors (which is where the exception appears to be happening), but it would certainly be interesting if it did.



Re: ADD_JARS doesn't properly work for spark-shell

Aaron Davidson
Additionally, which version of Spark are you running?



Re: ADD_JARS doesn't properly work for spark-shell

Imran Rashid
Actually, I think adding it to SPARK_CLASSPATH is exactly right. The exception is not on the executors but in the driver -- it's happening when the driver tries to read the results that the executor sends back to it.

So the executors know about mypackage.MyClass, they happily run and send their data back to the driver, and then the driver tries to read those results and blows up, because it hasn't loaded the jar.

Probably ADD_JARS should get auto-added to SPARK_CLASSPATH, but for now, I think it will work if you just list the jar in both.
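As a minimal sketch of "list it in both" (the jar path is just a placeholder), the launch line would look like:

MASTER=local[2] ADD_JARS=/path/to/my/jar SPARK_CLASSPATH=/path/to/my/jar ./spark-shell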



Re: ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia

On Sun, Jan 5, 2014 at 2:28 AM, Aaron Davidson <[hidden email]> wrote:
Additionally, which version of Spark are you running?

0.8.1.

Unfortunately, this doesn't work either:

MASTER=local[2] ADD_JARS=/path/to/my/jar SPARK_CLASSPATH=/path/to/my/jar ./spark-shell
 



Re: ADD_JARS doesn't properly work for spark-shell

Aaron Davidson
Cool. To confirm, you said you can access the class and construct new objects -- did you do this in the shell itself (i.e., on the driver), or on the executors?

Specifically, one of the following two should fail in the shell:
> new mypackage.MyClass()
> sc.parallelize(0 until 10, 2).foreach(_ => new mypackage.MyClass())
(or just import it)

You could also try running with MASTER=local-cluster[2,1,512], which launches 2 executors with 1 core and 512 MB each in a setup that mimics a real cluster more closely, in case it's a bug related only to local mode; a sketch of such a launch follows.
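For example (a sketch only; the jar path is again a placeholder):

MASTER=local-cluster[2,1,512] ADD_JARS=/path/to/my/jar ./spark-shell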



Re: ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia
Sorry, I had a typo. I can confirm that using ADD_JARS together with SPARK_CLASSPATH works as expected in spark-shell.

It'd make sense to have the two combined as one option.



Re: ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia
While myrdd.count() works, many other actions and transformations still do not work in spark-shell. For example, myrdd.first() gives this error:

java.lang.ClassCastException: mypackage.MyClass cannot be cast to scala.runtime.Nothing$

Also, myrdd.map(r => r) returns:

org.apache.spark.rdd.RDD[Nothing] = MappedRDD[2]

Basically, the type mypackage.MyClass gets turned into Nothing by any action or transformation.




Re: ADD_JARS doesn't properly work for spark-shell

Aaron Davidson
That sounds like a different issue. What is the type of myrdd (i.e., what do you see if you just type myrdd into the shell)? It's possible that it's defined as an RDD[Nothing], so every operation tries to typecast to Nothing, which always fails. Perhaps declaring it with your class's type up front would help, something like:
val myrdd: RDD[mypackage.MyClass] = sc.sequenceFile(...)
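Since the stack trace above goes through SparkContext.objectFile, here is a minimal sketch of the same idea applied to that call (the path is hypothetical): give the RDD an explicit element type so it is not inferred as RDD[Nothing].

import org.apache.spark.rdd.RDD

// Annotate the val with the element type...
val myrdd: RDD[mypackage.MyClass] = sc.objectFile("/path/to/objects")

// ...or, equivalently, pass the type parameter explicitly.
val myrdd2 = sc.objectFile[mypackage.MyClass]("/path/to/objects")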



Re: ADD_JARS doesn't properly work for spark-shell

Aureliano Buendia



On Sun, Jan 5, 2014 at 6:01 AM, Aaron Davidson <[hidden email]> wrote:
Perhaps declaring it with your class's type up front would help, something like:
val myrdd: RDD[mypackage.MyClass] = sc.sequenceFile(...)

This solved the problem, thanks!

Is it because sc.objectFile() returns RDD[Nothing], or is it a spark-shell problem?
 

