PicklingError - Can't pickle py4j.protocol.Py4JJavaError - it's not the same object


AbdealiJK
I am using Spark with Celery to run some Spark scripts asynchronously from the rest of my code.
When any of my Celery tasks hits an error and raises a Python exception, Celery's on_error() is called and I can handle the exception easily by logging it.

The only exception for which this fails seems to be the Py4JJavaError thrown by Spark.
When my code raises a Py4JJavaError, I get a second exception inside Celery's error handling. It says the error could not be pickled because the class is "not the same object" as py4j.protocol.Py4JJavaError.

I'm looking for clues as to what could cause this. I am able to import py4j.protocol.Py4JJavaError and debug with it.
I went into pyspark/sql/utils.py:capture_sql_exception(), which is where my Py4JJavaError is being thrown, and found:
py4j.__file__ = /usr/local/hadoop/spark2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/__init__.py
id(py4j.protocol.Py4JJavaError) = 140436967669656

I also went to where the pickling exception occurs inside the billiard codebase and found:
py4j.__file__ = /usr/local/hadoop/spark2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/__init__.py
id(py4j.protocol.Py4JJavaError) = 140436967669656

I'm confused as to why an error like this can come up when id() reports exactly the same value for the type in both places, and the file it is loaded from is also the same.
I was originally under the impression that multiple versions of py4j were conflicting with each other, but that does not seem to be the case.
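
One thing the id() comparison above does not cover is the class of the exception instance itself. When pickle serializes an instance, it looks the class up again by module and name and requires the result to be the very same object as type(instance). The sketch below reproduces that message with a hypothetical stand-in module (demo_errors and DemoError are made up for illustration and are not part of py4j, Spark, or Celery), and adds a small helper, resolves_to_same_class(), that performs the same lookup by hand. It only illustrates one way this error can arise; it is not a confirmed diagnosis of the Celery case.

```python
import pickle
import sys
import types


def resolves_to_same_class(obj):
    """Approximate pickle's check: does module.qualname still point at type(obj)?"""
    cls = type(obj)
    found = sys.modules.get(cls.__module__)
    for part in cls.__qualname__.split("."):
        found = getattr(found, part, None)
    return found is cls


# Hypothetical stand-in module, registered so pickle can import it by name.
demo = types.ModuleType("demo_errors")
sys.modules["demo_errors"] = demo
class_source = "class DemoError(Exception):\n    pass\n"

# Define the class and create an instance of it.
exec(class_source, demo.__dict__)
err = demo.DemoError("boom")
pickle.dumps(err)  # succeeds: demo_errors.DemoError is type(err)

# Rebind the module attribute to a new class with the same name,
# as a module reload or a second import of the same source would.
exec(class_source, demo.__dict__)

raised = False
try:
    pickle.dumps(err)  # err still belongs to the old class object
except pickle.PicklingError as exc:
    raised = True
    print(exc)

print(resolves_to_same_class(err))                     # instance of the old class
print(resolves_to_same_class(demo.DemoError("boom")))  # instance of the new class
```

Applied to the situation above, the analogous check would be to call the helper on the caught exception object, or to compare type(exc) with py4j.protocol.Py4JJavaError using `is`, at the point where pickling fails, rather than comparing the module attribute with itself in two places.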

Any thoughts on this would be helpful! Thanks

---

Here is the exact error I get during the exception handling:

[2018-12-02 18:11:41,403: ERROR/MainProcess] Task handler raised error: <MaybeEncodingError: Error sending result: '"(1, <ExceptionInfo: Py4JJavaError('An error occurred while calling o1000.showString.\\n', 'JavaObject id=o1001')>, None)"'. Reason: ''PicklingError("Can\'t pickle <class \'py4j.protocol.Py4JJavaError\'>: it\'s not the same object as py4j.protocol.Py4JJavaError",)''.>
Traceback (most recent call last):
  File "venv/lib/python3.6/site-packages/billiard/pool.py", line 363, in workloop
    put((READY, (job, i, result, inqW_fd)))
  File "venv/lib/python3.6/site-packages/billiard/queues.py", line 366, in put
    self.send_payload(ForkingPickler.dumps(obj))
  File "venv/lib/python3.6/site-packages/billiard/reduction.py", line 61, in dumps
    cls(buf, protocol).dump(obj)
billiard.pool.MaybeEncodingError: Error sending result: '"(1, <ExceptionInfo: Py4JJavaError('An error occurred while calling o1000.showString.\\n', 'JavaObject id=o1001')>, None)"'. Reason: ''PicklingError("Can\'t pickle <class \'py4j.protocol.Py4JJavaError\'>: it\'s not the same object as py4j.protocol.Py4JJavaError",)''.