Re: Handle BlockMissingException in pyspark


John Zhuge
BlockMissingException typically indicates that the HDFS file is corrupted. This might be an HDFS issue, so the Hadoop mailing list is a better bet: [hidden email].

Capture the full stack trace in the executor log.
If the file still exists, run `hdfs fsck -blockId blk_1233169822_159765693` to determine whether the block is corrupted.
If it is not corrupted, could there be excessive (thousands of) concurrent reads on the block?
Which Hadoop version? Which Spark version?
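On the PySpark side, the error surfaces as a Py4JJavaError raised when the action runs. If the block turns out to be only transiently unreadable (e.g. a DataNode restart) rather than truly lost, one workaround is to retry the action. Below is a minimal sketch of a hypothetical retry helper, not a Spark API; it matches on the exception's message text, and it cannot recover a block that is genuinely missing, which must be fixed on the HDFS side:

```python
import time

def run_with_block_retry(action, retries=3, delay=5):
    """Retry a zero-argument Spark action, e.g. lambda: rdd.count(),
    when the failure message mentions BlockMissingException.

    Hypothetical workaround sketch: in PySpark the exception caught here
    would be py4j.protocol.Py4JJavaError; any other failure is re-raised
    immediately, as is the last failed attempt.
    """
    for attempt in range(retries):
        try:
            return action()
        except Exception as e:  # py4j.protocol.Py4JJavaError under PySpark
            if "BlockMissingException" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay)  # give HDFS a moment before retrying
```

Usage would look like `result = run_with_block_retry(lambda: rdd.count())`. Again, if `hdfs fsck` reports the block as corrupt or missing, no amount of retrying in PySpark will help.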

On Mon, Aug 6, 2018 at 2:21 AM Divay Jindal <[hidden email]> wrote:
Hi,

I am running PySpark in a dockerized Jupyter environment, and I am constantly getting this error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 33 in stage 25.0 failed 1 times, most recent failure: Lost task 33.0 in stage 25.0 (TID 35067, localhost, executor driver)
: org.apache.hadoop.hdfs.BlockMissingException
: Could not obtain block: BP-1742911633-

Can anyone please help me with how to handle such an exception in PySpark?

Best Regards
Divay Jindal