Accumulator guarantees

Accumulator guarantees

Sergey Zhemzhitsky
Hi there,

Spark's docs state that
- accumulator updates performed inside actions are applied only once per task
- accumulator updates performed inside transformations may be applied more
  than once if tasks or stages are re-executed

... I'm wondering whether the second point also holds for transformations
in the last stage of a job, or whether accumulators updated in
transformations of the last stage are guaranteed to be updated only once
too?
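
To make the concern concrete, here is a toy model in plain Python (not Spark code) of how a re-executed task could inflate an accumulator updated inside a transformation. The names `DriverAccumulator` and `run_map_task` are hypothetical, and the model assumes the driver merges the delta of every successful task attempt for a stage that is not the last one.

```python
from dataclasses import dataclass


@dataclass
class DriverAccumulator:
    """Hypothetical stand-in for the driver-side value of a counter accumulator."""
    value: int = 0

    def merge(self, delta: int) -> None:
        # Assumption: the driver adds each reported task-local delta to its total.
        self.value += delta


def run_map_task(partition):
    """One task attempt: returns the mapped output and the task-local delta."""
    delta = 0
    output = []
    for x in partition:
        delta += 1            # models acc.add(1) inside a map() closure
        output.append(x * 2)
    return output, delta


partitions = [[1, 2, 3], [4, 5]]
acc = DriverAccumulator()

# First execution of the stage: one successful attempt per partition.
for p in partitions:
    _, delta = run_map_task(p)
    acc.merge(delta)
after_first_run = acc.value

# Partition 0 is recomputed (for example after its output was lost).
# Under the assumption above, its delta is merged a second time.
_, delta = run_map_task(partitions[0])
acc.merge(delta)
after_rerun = acc.value
```

In this model the counter reads 5 after the first pass over five records and 8 after partition 0 is recomputed, which is the kind of over-counting the docs warn about for transformations.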


Re: Accumulator guarantees

Sergey Zhemzhitsky
As far as I understand, updates to custom accumulators on the driver
side happen during task completion [1].
The documentation states [2] that the very last stage in a job
consists of multiple ResultTasks, which execute the task and send its
output back to the driver application.
Also, the sources show [3] that accumulators are updated for
ResultTasks just once.
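
The bookkeeping described above can be sketched as a toy model in plain Python. This is not the DAGScheduler code: `ResultStageTracker` and `on_task_success` are hypothetical names, and the per-partition `finished` flag is an assumption about how the driver decides whether to apply a ResultTask's accumulator update.

```python
class ResultStageTracker:
    """Toy model of driver-side handling of ResultTask completions."""

    def __init__(self, num_partitions: int) -> None:
        # Assumption: one flag per output partition of the job's last stage.
        self.finished = [False] * num_partitions
        self.accumulator_value = 0

    def on_task_success(self, partition_id: int, accumulator_delta: int) -> bool:
        """Apply the task's accumulator delta only for the first success.

        Returns True if the update was applied, False if it was ignored.
        """
        if self.finished[partition_id]:
            # A later successful attempt for the same partition is ignored.
            return False
        self.accumulator_value += accumulator_delta
        self.finished[partition_id] = True
        return True


tracker = ResultStageTracker(num_partitions=2)
results = [
    tracker.on_task_success(0, 3),  # first attempt for partition 0
    tracker.on_task_success(1, 2),  # first attempt for partition 1
    tracker.on_task_success(0, 3),  # duplicate attempt for partition 0
]
```

Here the duplicate completion for partition 0 does not change the total, so `tracker.accumulator_value` stays at 5. Whether Spark behaves this way for every accumulator touched by a last-stage task is exactly the question raised in this thread.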

So it seems that accumulators are safe to use and will be updated
only once regardless of where they are used (transformations as well
as actions), provided these transformations and actions belong to
the last stage. Is that correct?
Could any of the Spark committers or contributors please confirm this?


[1] https://github.com/apache/spark/blob/3990daaf3b6ca2c5a9f7790030096262efb12cb2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1194
[2] https://github.com/apache/spark/blob/a6fc300e91273230e7134ac6db95ccb4436c6f8f/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L36
[3] https://github.com/apache/spark/blob/3990daaf3b6ca2c5a9f7790030096262efb12cb2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1204

On Thu, May 10, 2018 at 10:24 PM, Sergey Zhemzhitsky <[hidden email]> wrote:

> Hi there,
>
> Spark's docs state that
> - accumulator updates performed inside actions are applied only once per task
> - accumulator updates performed inside transformations may be applied more
>   than once if tasks or stages are re-executed
>
> ... I'm wondering whether the second point also holds for transformations
> in the last stage of a job, or whether accumulators updated in
> transformations of the last stage are guaranteed to be updated only once
> too?
