EOFException when deserializing (simple) task


EOFException when deserializing (simple) task

Sandy Ryza
Hi all,

I'm running into an EOFException when I try to run a simple Spark job that reads a text file and collects the results.  It looks like it's occurring when the executor tries to deserialize the task.  The setup is Spark 0.9 against CDH5.

The error occurs with both Python and Scala.  Maybe interestingly, it doesn't show up when I run sc.parallelize(Array(1,2,3,4)).collect().  Which could mean it's running into trouble deserializing the HadoopRDD?

Any idea what could be going on?

thanks for any help,
Sandy

---

java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2742)
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1030)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:260)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:252)
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1835)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1794)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
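For context, the EOFException at the top of this trace just means ObjectInputStream ran off the end of the stream before the object was fully read. A minimal, Spark-free sketch of that symptom (plain Java serialization with a hypothetical payload; here the stream is truncated on purpose, whereas in the report above the reader presumably expected a different byte layout than the writer produced):

```java
import java.io.*;
import java.util.Arrays;

public class TruncatedStream {
    // Serialize a value, cut off the tail of the byte stream, and try to
    // deserialize it: ObjectInputStream hits end-of-stream mid-object and
    // throws EOFException, the same symptom as in the trace above.
    static boolean demo() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject("a task payload");
        }
        byte[] full = buf.toByteArray();
        byte[] cut = Arrays.copyOf(full, full.length - 4); // drop the tail
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(cut))) {
            in.readObject();
            return false; // unexpectedly succeeded
        } catch (EOFException e) {
            return true;  // reader ran off the end of the stream
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo() ? "EOFException reproduced" : "no exception");
    }
}
```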

Re: EOFException when deserializing (simple) task

Patrick Wendell
Do you happen to know if this still occurs when using the Hadoop
bindings for other versions of CDH or vanilla Hadoop? The error here
seems to be inside of Hadoop's own deserializer so it could be version
dependent.

Does this happen deterministically?

- Patrick
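The version dependence Patrick suggests is plausible because Writable serialization (what FileSplit.readFields uses in the trace) is not self-describing: the reader must know the exact field layout the writer used. If the bytes were written by one version of a class and read by another version that expects an extra field, the reader over-reads and hits end-of-stream. A toy illustration of that skew (hypothetical SplitV1/SplitV2 classes, not actual Hadoop code):

```java
import java.io.*;

// Hypothetical "v1" writer: emits (path, length) in Writable style.
class SplitV1 {
    void write(DataOutput out) throws IOException {
        out.writeUTF("/data/input.txt");
        out.writeLong(1024L);
    }
}

// Hypothetical "v2" reader: expects a field the writer never emitted.
class SplitV2 {
    void readFields(DataInput in) throws IOException {
        in.readUTF();  // path
        in.readLong(); // length
        in.readUTF();  // extra "v2" field -- not present in the stream
    }
}

public class VersionSkew {
    // Returns true if reading v1 bytes with the v2 layout throws EOFException.
    static boolean demo() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new SplitV1().write(new DataOutputStream(bytes));
        DataInput in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
        try {
            new SplitV2().readFields(in);
            return false; // unexpectedly succeeded
        } catch (EOFException e) {
            return true;  // reader expected more bytes than writer produced
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo() ? "EOFException reproduced" : "no exception");
    }
}
```

This would fit the observation that sc.parallelize(...).collect() works while the text-file job fails: only the latter serializes a Hadoop FileSplit, so only the latter exercises Hadoop's own readFields path.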

On Sat, Feb 1, 2014 at 7:15 PM, Sandy Ryza <[hidden email]> wrote:

> Hi all,
>
> I'm running into an EOFException when I try to run a simple spark job that
> reads a text file and collects the results.  It looks like it's occurring
> when the executor tries to deserialize the task.  The setup is Spark 0.9
> against CDH5.
>
> The error occurs with both python and scala.  Maybe interestingly, it
> doesn't show up when I run sc.parallelize(Array(1,2,3,4)).collect().  Which
> could mean it's running into trouble deserializing the HadoopRDD?
>
> Any idea what could be going on?
>
> thanks for any help,
> Sandy