Impact of .localCheckpoint() and executor dying

brettplarson
Hello,
I am wondering what the impact of using .localCheckpoint() and having the
executor die would be?

My understanding is that .localCheckpoint() breaks the lineage of the RDD
and that this requires the entire RDD to be rebuilt instead of just
recomputing the lost partitions.

Does each executor store a copy of the entire RDD?

It's unclear to me what the benefit of using .checkpoint() over
.localCheckpoint() is. (I am aware that .checkpoint() is HDFS-backed, but the
implications of this are unclear to me.)

Please let me know,
Thank you!




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Impact of .localCheckpoint() and executor dying

Jacek Laskowski
Hi,
 
> My understanding is that .localCheckpoint() breaks the lineage of the RDD

True.

> and this requires the entire RDD to be rebuilt instead of just recomputing the lost partitions.

In a sense, it's as if you saved the partitions on the executors and read them back as the source data for this checkpointed RDD.

> Does each executor store a copy of the entire RDD?

No. An executor holds only the partitions computed by the tasks that this executor executed.

> Checkpoint over .localCheckpoint.

checkpoint is similar to localCheckpoint, but slower and reliable, since the data is written to a stable file system such as HDFS rather than kept on ephemeral executors. In either case, the lineage is cut the same way.

On Wed, Jan 6, 2021 at 6:15 PM brettplarson <[hidden email]> wrote:

Re: Impact of .localCheckpoint() and executor dying

brettplarson
Jacek,
Thanks for your response. I am still trying to understand the impact of an executor dying after a localCheckpoint is taken.

Would the entire Spark application fail in this case due to the broken lineage? Or would the jobs associated with that executor need to be recomputed from scratch?

Thank you!


On Wed, Jan 6, 2021 at 1:09 PM Jacek Laskowski <[hidden email]> wrote:

--
Brett Larson 
[hidden email] / 847321200

Re: Impact of .localCheckpoint() and executor dying

Jacek Laskowski
Hi,

> impact of an executor dying after a localCheckpoint is taken.

My memory is a bit vague on this, but I wouldn't be surprised if this localCheckpoint-ed RDD were "broken" and any action on it simply threw an exception (missing partitions or similar). There's no way back.

I wish someone with more expertise in this area would chime in...
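
The failure mode in question can be illustrated with a toy model in plain Python. This is not Spark code: `ToyCluster`, `compute_partition` and `CheckpointBlockMissing` are hypothetical names, and round-robin placement is a simplifying assumption. It follows the description earlier in the thread: a local checkpoint leaves each partition only on the executor that computed it, a reliable checkpoint copies it to shared storage standing in for HDFS, and in both cases the lineage is never consulted again. For reference, the Spark API documentation for `localCheckpoint()` warns that if an executor fails, the locally checkpointed data may no longer be accessible, causing an irrecoverable job failure. In the sketch, as in a driver program, that failure surfaces as an exception from the action; whether the whole application then stops depends on whether the driver catches it.

```python
# Toy model, NOT Spark code: all names here are hypothetical.


class CheckpointBlockMissing(Exception):
    """Raised when a checkpointed partition is no longer available."""


def compute_partition(index):
    # Stands in for the original lineage of one partition. After a
    # checkpoint this is never called again: the lineage is cut.
    return [index * 10 + j for j in range(3)]


class ToyCluster:
    def __init__(self, executors, num_partitions, reliable):
        self.reliable = reliable
        # Round-robin placement: partition i is computed by one executor.
        self.placement = {i: executors[i % len(executors)] for i in range(num_partitions)}
        # executor -> {partition: data}: what a local checkpoint keeps.
        self.executor_blocks = {e: {} for e in executors}
        # partition -> data: stands in for HDFS, which survives executor loss.
        self.shared_storage = {}

    def checkpoint(self):
        for index, executor in self.placement.items():
            data = compute_partition(index)
            if self.reliable:
                self.shared_storage[index] = data
            else:
                # Each executor keeps only the partitions it computed.
                self.executor_blocks[executor][index] = data

    def kill(self, executor):
        # A dead executor takes its blocks with it.
        self.executor_blocks[executor].clear()

    def read(self, index):
        if self.reliable:
            return self.shared_storage[index]
        blocks = self.executor_blocks[self.placement[index]]
        if index not in blocks:
            # No lineage to fall back on, so the action fails.
            raise CheckpointBlockMissing(f"partition {index} is gone")
        return blocks[index]

    def collect(self):
        return [x for index in sorted(self.placement) for x in self.read(index)]


for reliable in (False, True):
    cluster = ToyCluster(["exec-1", "exec-2"], num_partitions=4, reliable=reliable)
    cluster.checkpoint()
    cluster.kill("exec-2")  # it held partitions 1 and 3
    label = "reliable" if reliable else "local"
    try:
        print(label, "->", len(cluster.collect()), "values")
    except CheckpointBlockMissing as err:
        print(label, "-> action failed:", err)
```

The reliable variant survives because nothing it needs lived on the dead executor. The local variant fails even though `compute_partition` could in principle rebuild the data, because the cut lineage is never consulted.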

On Wed, Jan 6, 2021 at 8:30 PM Brett Larson <[hidden email]> wrote: