Better way to debug serializable issues

Ruijing Li
Hi all,

When working with Spark jobs, I sometimes have to tackle serialization issues, and I have a difficult time fixing them. Often the serialization issues happen only in cluster mode, across the network in a Mesos container, so I can’t debug locally. And the exception thrown by Spark is not very helpful for finding the cause.

I’d love to hear some tips on how to debug in the right places. I’d also be interested to know whether future releases could point out which class or function is causing the serialization issue (right now I find it’s either Java generic classes or the very class Spark is running). Thanks!
--
Cheers,
Ruijing Li

Re: Better way to debug serializable issues

Maxim Gekk
Hi Ruijing,

Spark uses SerializationDebugger (https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html) as the default debugger for detecting serialization issues. You can get more detailed serialization exception information by setting the following options when creating a cluster:
spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
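As a rough local complement (a sketch, not part of Spark's API), you can run plain Java serialization, which is what Spark uses for task closures, on a suspect object and inspect the NotSerializableException it produces. `SerializabilityCheck`, `check`, `Connection`, `Settings`, `Job` and `Demo` are hypothetical names chosen for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializabilityCheck {
  // Hypothetical helper: serializes `obj` with plain Java serialization and
  // returns the NotSerializableException message if something in its object
  // graph cannot be serialized, or None if serialization succeeds.
  def check(obj: AnyRef): Option[String] = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      None
    } catch {
      // By default the message is just the offending class name; with
      // -Dsun.io.serialization.extendedDebugInfo=true on the JVM it also
      // contains the path of fields that led to that object.
      case e: NotSerializableException => Some(e.getMessage)
    } finally {
      out.close()
    }
  }
}

// Hypothetical classes illustrating a typical failure.
class Connection                                        // does not extend Serializable
case class Settings(url: String)                        // case classes are Serializable
class Job(val settings: Settings, val conn: Connection) extends Serializable // but its conn field is not

object Demo {
  def main(args: Array[String]): Unit = {
    println(SerializabilityCheck.check(Settings("example")))
    println(SerializabilityCheck.check(new Job(Settings("example"), new Connection)))
  }
}
```

Launching that JVM with -Dsun.io.serialization.extendedDebugInfo=true, as in the options above, should make the second message also show the field path from `Job` to `Connection`; without the flag, only the class name is reported.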


Maxim Gekk

Software Engineer

Databricks, Inc.



(In reply to Ruijing Li, Tue, Feb 18, 2020 at 1:02 PM.)

Re: Better way to debug serializable issues

Ruijing Li
Thanks, all, for the answers. Unfortunately, I wasn’t able to get the information I needed from the extra parameters, but I did solve my issue. The problem was that I used PureConfig to read a certain config from Hadoop before the Spark session was initialized, so PureConfig failed while deserializing the class before Spark had been configured properly.

--
Cheers,
Ruijing Li

(In reply to Maxim Gekk, Tue, Feb 18, 2020 at 10:24 AM.)