Re: Job hangs in blocked task in final parquet write stage

Re: Job hangs in blocked task in final parquet write stage

conradlee
Dear Spark community,

I'm running Spark 2.3.2 on EMR 5.19.0.  I've got a job that's hanging in the final stage--the job usually works, but I see this hanging behavior in about one out of 50 runs.

The second-to-last stage sorts the dataframe, and the final stage writes the dataframe to HDFS.

Here you can see the executor logs, which indicate that the executor has finished processing the task.

Here you can see the thread dump from the executor that's hanging.  Here's the text of the blocked thread.

I tried to work around this problem by enabling speculation, but speculative execution never takes place.  I don't know why.
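
For reference, enabling speculation looks roughly like this -- a minimal sketch, where the property names are the standard Spark settings but the app name and threshold values are illustrative, not my actual job's:

import org.apache.spark.sql.SparkSession

// Minimal sketch: speculation settings must be in place before the
// context starts. Property names are standard Spark settings; the
// app name and values below are illustrative only.
val spark = SparkSession.builder()
  .appName("parquet-write-job")                   // hypothetical app name
  .config("spark.speculation", "true")            // re-launch suspected slow tasks
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before speculating
  .config("spark.speculation.multiplier", "1.5")  // how many times slower than the median counts as slow
  .getOrCreate()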

Can anyone here help me?

Thanks,
Conrad

Re: Job hangs in blocked task in final parquet write stage

Vadim Semenov-2
Hey Conrad,

Has it started happening recently?

We recently started having sporadic problems on EMR with drivers getting stuck; up until two weeks ago everything was fine.  We're trying to figure out with the EMR team where the issue is coming from.


Re: Job hangs in blocked task in final parquet write stage

conradlee
Hello Vadim,

Interesting.  I've only been running this job at scale for a couple of weeks, so I can't say whether this is related to recent EMR changes.

Much of the EMR-specific code for Spark has to do with writing files to S3.  In this case I'm writing files to the cluster's HDFS, though, so my sense is that this is a Spark issue, not an EMR one (but I'm not sure).

Conrad


Re: Job hangs in blocked task in final parquet write stage

Christopher Petrino
I ran into problems using EMR 5.19, so I reverted to 5.17 and it resolved my issues.


Re: Job hangs in blocked task in final parquet write stage

conradlee
Thanks, I'll try using 5.17.0.

For anyone trying to debug this problem in the future: In other jobs that hang in the same manner, the thread dump didn't have any blocked threads, so that might be a red herring.


Re: Job hangs in blocked task in final parquet write stage

Christopher Petrino
If that doesn't help, try running a coalesce.  Your data may have grown and may be defaulting to a number of partitions that is causing unnecessary overhead.
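
Something along these lines -- a minimal sketch; the DataFrame, output path, and file count are placeholders:

import org.apache.spark.sql.DataFrame

// Minimal sketch of the suggestion: make the output partition count
// explicit with coalesce instead of inheriting it from upstream.
// df, path, and numFiles are placeholders.
def writeCoalesced(df: DataFrame, path: String, numFiles: Int): Unit =
  df.coalesce(numFiles)
    .write
    .mode("overwrite")
    .parquet(path)

Unlike repartition, coalesce narrows partitions without a full shuffle, so it's cheap -- though it can also reduce parallelism in the stage it folds into.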


Re: Job hangs in blocked task in final parquet write stage

conradlee
Yeah, probably increasing the memory or increasing the number of output partitions would help.  However, increasing the memory available to each executor would add expense.  I want to keep the number of partitions low so that each parquet file turns out to be around 128 MB, which is best practice for long-term storage and for use with other systems like Presto.
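
For what it's worth, the sizing rule looks roughly like this -- a sketch in which the total output size is just a caller-supplied estimate:

import org.apache.spark.sql.DataFrame

// Sketch of the 128 MB-per-file sizing rule. totalOutputBytes is an
// estimate supplied by the caller; the helper names are hypothetical.
def targetPartitions(totalOutputBytes: Long,
                     targetFileBytes: Long = 128L * 1024 * 1024): Int =
  math.max(1, math.ceil(totalOutputBytes.toDouble / targetFileBytes).toInt)

def writeSized(df: DataFrame, path: String, totalOutputBytes: Long): Unit =
  df.coalesce(targetPartitions(totalOutputBytes))
    .write
    .parquet(path)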

This feels like a bug, given the flaky nature of the failure--also, usually when memory gets too low the executor is killed or errors out and I get one of the typical Spark OOM error codes.  When I run the same job with the same resources, sometimes it succeeds and sometimes it fails.

On Mon, Dec 3, 2018 at 5:19 PM Christopher Petrino <[hidden email]> wrote:
Depending on the size of your data set and how many resources you have (num-executors, executor instances, number of nodes), I'm inclined to suspect the issue is related to the reduction of partitions from thousands to 96.  I could be misguided, but given the details I have, I would consider testing how the final stage behaves with a different number of partitions.

On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee <[hidden email]> wrote:
Thanks for the thoughts.  While the beginning of the job deals with lots of files in the first stage, they're first coalesced down into just a few thousand partitions.  The part of the job that's failing is the reduce-side of a dataframe.sort() that writes output to HDFS.  This last stage has only 96 tasks and the partitions are well balanced.  I'm not using a `partitionBy` option on the dataframe writer.
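
In outline, the job looks something like this -- a reconstruction from the description above; the paths, sort key, and the shuffle-partition setting are all assumptions, not the job's actual values:

import org.apache.spark.sql.SparkSession

// Reconstruction of the job shape described in this thread.
val spark = SparkSession.builder()
  .appName("sort-then-write")                    // hypothetical name
  .config("spark.sql.shuffle.partitions", "96")  // would explain the 96 final tasks -- an assumption
  .getOrCreate()
import spark.implicits._

spark.read.parquet("hdfs:///data/input")         // hypothetical input path
  .coalesce(2000)                                // "a few thousand partitions" -- illustrative count
  .sort($"sortKey")                              // second-to-last stage: the sort; hypothetical key
  .write
  .parquet("hdfs:///data/output")                // final stage: the parquet write that hangs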

On Fri, Nov 30, 2018 at 8:14 PM Christopher Petrino <[hidden email]> wrote:
The reason I ask is that I've had some unreliability caused by over-stressing HDFS.  Do you know the number of partitions when these actions are being performed?  E.g. if you have 1,000,000 files being read, you may have 1,000,000 partitions, which may stress HDFS.  Alternatively, if you have one large file of, say, 100 GB, you may have one partition, which would not fit in memory and may cause writes to disk.  I imagine it may be flaky because you are doing some action like a groupBy somewhere, and depending on how the data was read, certain groups will end up in certain partitions; I'm not sure whether reads on files are deterministic--I suspect they are not.
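
You can check this right after the read -- a quick sketch; the path is a placeholder, and getNumPartitions is the standard RDD API:

import org.apache.spark.sql.SparkSession

// Quick check of partitioning right after a read.
val spark = SparkSession.builder().appName("partition-check").getOrCreate()
val df = spark.read.parquet("hdfs:///data/input")  // hypothetical path
println(s"partitions after read: ${df.rdd.getNumPartitions}")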

On Fri, Nov 30, 2018 at 2:08 PM Conrad Lee <[hidden email]> wrote:
I'm loading the data using the dataframe reader from parquet files stored on local HDFS.  The stage that fails is not the one that does this; it's the stage that reads the sorted dataframe from the last shuffle and performs the final write to parquet on local HDFS.

On Fri, Nov 30, 2018 at 4:02 PM Christopher Petrino <[hidden email]> wrote:
How are you loading the data?

On Fri, Nov 30, 2018 at 2:26 AM Conrad Lee <[hidden email]> wrote:
Thanks for the suggestions.  Here's an update that responds to some of the ideas inline:

> I ran into problems using EMR 5.19, so I reverted to 5.17 and it resolved my issues.

I tried EMR 5.17.0 and the problem still sometimes occurs.

> Try running a coalesce.  Your data may have grown and may be defaulting to a number of partitions that is causing unnecessary overhead.
Well, I don't think it's that, because this problem occurs intermittently.  That is, if the job hangs I can kill it and re-run it and it works fine (on the same hardware and with the same memory settings).  I'm not getting any OOM errors.

On a related note: the job is spilling to disk. I see messages like this:

18/11/29 21:40:06 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 912.0 MB to disk (3 times so far)

This occurs in both successful and unsuccessful runs, though.  I've checked the disks of an executor that's running a hanging job and they have plenty of space, so it doesn't seem to be an out-of-disk-space issue.  This also doesn't seem to be where it hangs--the logs move on and describe the parquet commit.


Re: Job hangs in blocked task in final parquet write stage

conradlee
So, based on many more runs of this job, I've come to the conclusion that a workaround for this hang is to either:
  • decrease the amount of data written in each partition, or
  • increase the amount of memory available to each executor.
I still don't know what the root cause of the issue is.
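
Concretely, the two knobs look something like this -- a sketch with illustrative values, not the exact settings I settled on:

import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of the two workarounds, with illustrative values:
// (1) more output partitions -> less data written per task;
// (2) more executor memory -- must be set before the context starts
//     (equivalently via --conf spark.executor.memory at submit time).
val spark = SparkSession.builder()
  .config("spark.executor.memory", "8g")  // workaround 2: illustrative size
  .getOrCreate()

def writeSmallerPartitions(df: DataFrame, path: String): Unit =
  df.repartition(192)                     // workaround 1: e.g. double the original 96
    .write
    .parquet(path)

Note the trade-off: doubling the partition count halves the data written per task, but it also halves the resulting parquet file size, cutting against the 128 MB target discussed earlier.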
