object file not loading correctly


object file not loading correctly

zhen
I am having trouble loading object files correctly.
For example:

val test = sc.parallelize(List(1, 2, 3))
test.saveAsObjectFile("seqFile")
val loadedFile = sc.objectFile("seqFile")
loadedFile: org.apache.spark.rdd.RDD[Nothing] = FlatMappedRDD[4] at objectFile at <console>:12

Then if I do the following:

loadedFile.first

The output is:
java.lang.ClassCastException: java.lang.Integer cannot be cast to scala.runtime.Nothing$
        at <init>(<console>:15)
        at <init>(<console>:20)
        at <init>(<console>:22)
        at <init>(<console>:24)
        at <init>(<console>:26)
        at .<init>(<console>:30)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $export(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:629)
        at org.apache.spark.repl.SparkIMain$Request$$anonfun$10.apply(SparkIMain.scala:897)
        at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
        at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
        at java.lang.Thread.run(Thread.java:662)

It seems Spark is not recognising the loaded objects as an RDD of integers.
I am running Spark 0.8.1 on the latest Cloudera QuickStart VM.

Re: object file not loading correctly

Imran Rashid
The problem is that at compile time, sc.objectFile() has no idea what type of objects it's loading. Note the type of loadedFile:

loadedFile: org.apache.spark.rdd.RDD[Nothing]

that "Nothing" basically means the scala compiler has no idea what the type of objects in the RDD are.

So when you call first, at runtime the JVM sees that the RDD actually contains java.lang.Integer and throws an exception, because you can't cast java.lang.Integer to "Nothing".
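You can see the same inference with any unconstrained type parameter; here is a minimal standalone sketch (plain Scala, nothing Spark-specific, and the load helper is just for illustration):

def load[T](path: String): List[T] = List.empty[T]

val xs = load("seqFile")
// xs: List[Nothing] -- nothing in the call constrains T, so scalac falls back to Nothing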

The solution is to pass a type parameter to sc.objectFile:

val loadedFile = sc.objectFile[Int](...)
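
Spelled out against your session above (same "seqFile" path; only the [Int] is new):

val test = sc.parallelize(List(1, 2, 3))
test.saveAsObjectFile("seqFile")

// [Int] gives the compiler the element type up front,
// so loadedFile is typed RDD[Int] instead of RDD[Nothing]
val loadedFile = sc.objectFile[Int]("seqFile")
loadedFile.first  // returns an Int (1 here) instead of throwing the ClassCastException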


I wonder if Spark should make the compiler prevent Nothing, using something like this:
http://blog.evilmonkeylabs.com/2012/05/31/Forcing_Compiler_Nothing_checks/

Unfortunately those compiler error messages might be just as confusing as the ClassCastException, so I don't know whether it would actually prevent any issues.
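
For reference, the trick in that post boils down to implicit evidence that cannot be resolved when a type parameter is inferred as Nothing. Roughly like this (my own sketch of the general technique, with a stand-in signature; the real sc.objectFile in 0.8.x takes a ClassManifest, not this):

trait NotNothing[T]
object NotNothing {
  // resolves for any concrete T
  implicit def isNotNothing[T]: NotNothing[T] = new NotNothing[T] {}
  // two competing instances for Nothing make implicit search ambiguous,
  // so any call where T is inferred as Nothing fails to compile
  implicit val nothing1: NotNothing[Nothing] = new NotNothing[Nothing] {}
  implicit val nothing2: NotNothing[Nothing] = new NotNothing[Nothing] {}
}

// stand-in for sc.objectFile, just to show the constraint
def objectFile[T: NotNothing](path: String): List[T] = List.empty[T]

objectFile[Int]("seqFile")  // compiles
// objectFile("seqFile")    // does not compile: ambiguous implicits for NotNothing[Nothing]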





Re: object file not loading correctly

zhen
Thank you so much, Imran. Your solution works perfectly. Also, thank you for explaining why it wasn't working.