Re: OptionalDataException in spark


Re: OptionalDataException in spark

Ravi Aggarwal

Hi,

 

We are encountering a java.io.OptionalDataException in one of our Spark jobs.

All the tasks of the stage pass (or at least we do not see any error), but the stage fails with the exception below while fetching the task result.

The exception is printed on the driver.

 

Any pointers in this regard would be helpful.

 

Here is the stack trace:

 

java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1555)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.mutable.HashMap$$anonfun$readObject$1.apply(HashMap.scala:174)
    at scala.collection.mutable.HashMap$$anonfun$readObject$1.apply(HashMap.scala:174)
    at scala.collection.mutable.HashTable$class.init(HashTable.scala:109)
    at scala.collection.mutable.HashMap.init(HashMap.scala:40)
    at scala.collection.mutable.HashMap.readObject(HashMap.scala:174)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1$$anonfun$apply$mcV$sp$2.apply(TaskResult.scala:67)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1$$anonfun$apply$mcV$sp$2.apply(TaskResult.scala:66)
    at scala.collection.immutable.Range.foreach(Range.scala:160)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1.apply$mcV$sp(TaskResult.scala:66)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1.apply(TaskResult.scala:55)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$readExternal$1.apply(TaskResult.scala:55)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
    at org.apache.spark.scheduler.DirectTaskResult.readExternal(TaskResult.scala:55)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:2076)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2025)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:64)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
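For context on what the trace is saying: the failure happens inside scala.collection.mutable.HashMap.readObject while the driver deserializes a DirectTaskResult, i.e. a mutable map travelling back with the task result (accumulator updates are a common source). One way such a stream gets corrupted is when the map is mutated while it is being written, so the entry count recorded in the stream no longer matches the entries that follow; the reader then runs into primitive block data where it expects an object, which is exactly when the JDK throws OptionalDataException. A minimal, deterministic sketch of that corruption in plain Java (no Spark; the RacyMap class is hypothetical, purely for illustration):

```java
import java.io.*;

// Simulates a map mutated mid-serialization: writeObject records
// size = 3 but only 2 entries follow, so readObject's third
// in.readObject() lands on primitive block data instead of an
// object and the JDK throws java.io.OptionalDataException.
class RacyMap implements Serializable {
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeInt(3);          // size captured before the "mutation"
        out.writeObject("a");
        out.writeObject("b");     // third entry "removed" concurrently
        out.writeInt(0);          // unrelated primitive data follows in the stream
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            in.readObject();      // i == 2 hits the primitive int
        }
    }
}

public class Repro {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new RacyMap());
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            ois.readObject();
        } catch (OptionalDataException e) {
            System.out.println("caught java.io.OptionalDataException");
        }
    }
}
```

This does not prove your job has such a race, but it shows the failure mode is consistent with a map (or other custom-serialized object) changing under the serializer.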

  

Thanks

 

Ravi Aggarwal
Computer Scientist
Adobe. Make It an Experience.
E11-355, San Jose, CA, 95110, US
408.536.6719 (tel) | 669.214.1491 (cell)
[hidden email]
Adobe.com

 

 

 



Re: OptionalDataException in spark

Phillip Henry
I strongly suspect you're mutating an object at the moment it is serialized.

I suggest you remove all mutation from your code.

HTH.

Phillip
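
One way to act on that advice, sketched in plain Java (the snapshot helper below is hypothetical, not a Spark API): never hand the serializer a live mutable map. Take an immutable copy at a single point in time before the value leaves the task, so later updates cannot corrupt the stream mid-write.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotDemo {
    // Live map that other threads may keep mutating.
    static final Map<String, Long> counters = new ConcurrentHashMap<>();

    // Hypothetical helper: copy the map once, then freeze the copy.
    // Only this stable, immutable view is ever serialized.
    static Map<String, Long> snapshot(Map<String, Long> live) {
        return Collections.unmodifiableMap(new HashMap<>(live));
    }

    public static void main(String[] args) {
        counters.put("rows", 42L);
        Map<String, Long> frozen = snapshot(counters);
        counters.put("rows", 99L);               // a later mutation...
        System.out.println(frozen.get("rows"));  // ...does not affect the snapshot
    }
}
```

In Scala code the same idea is simply converting the mutable collection with something like `.toMap` into an immutable one before returning it from the task.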


On Fri, Dec 13, 2019 at 7:19 PM Ravi Aggarwal <[hidden email]> wrote:

[quoted message trimmed]