RE: how to detect a disconnect


Shao, Saisai

Hi Koert,


It seems there is currently no API you can use to detect a lost block manager. The loss is mainly caused by a full GC, or by something else that blocks communication between the client driver and the executors’ block managers. When the executors recover from the stall, they re-register themselves with the client driver, so users have no need to take special steps to recover. You can also set “” to a large value to avoid this warning; the default is “60000”.
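
Since there is no API for this, one possible workaround is to watch the driver log for the warning quoted in the original message. The following is only a minimal sketch in Python, assuming the driver log can be read as text lines and that the warning keeps the exact format shown in this thread (other Spark versions may log it differently). The names `parse_block_manager_lost` and `BlockManagerLost` are hypothetical helpers for illustration, not part of Spark.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the WARN line quoted in this thread. This format is an
# assumption taken from that single log line; it is not a stable interface.
_REMOVED = re.compile(
    r"WARN storage\.BlockManagerMasterActor: Removing BlockManager "
    r"BlockManagerId\((?P<executor>[^,]+), (?P<host>[^,]+), (?P<port>\d+), \d+\) "
    r"with no recent heart beats: (?P<elapsed>\d+)ms exceeds (?P<timeout>\d+)ms"
)


class BlockManagerLost(NamedTuple):
    """Fields extracted from one 'Removing BlockManager' warning (hypothetical type)."""
    executor_id: str
    host: str
    port: int
    elapsed_ms: int   # time since the last heartbeat, as logged
    timeout_ms: int   # timeout that was exceeded, as logged


def parse_block_manager_lost(line: str) -> Optional[BlockManagerLost]:
    """Return the parsed event if the line is a block manager removal warning, else None."""
    m = _REMOVED.search(line)
    if m is None:
        return None
    return BlockManagerLost(
        executor_id=m["executor"],
        host=m["host"],
        port=int(m["port"]),
        elapsed_ms=int(m["elapsed"]),
        timeout_ms=int(m["timeout"]),
    )
```

Feeding each driver log line through `parse_block_manager_lost` gives a non-`None` result only for the removal warning, so a client could use it for monitoring or alerting. Because the executors re-register on their own, this is meant for observation rather than as a trigger for recovery.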






From: Koert Kuipers [mailto:[hidden email]]
Sent: Sunday, December 22, 2013 2:05 AM
To: [hidden email]
Subject: how to detect a disconnect


with long running apps i see this at times:

13/12/21 12:57:59 INFO scheduler.Stage: Stage 1 is now unavailable on executor 10 (0/66, false)
13/12/21 12:58:19 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, node10, 33734, 0) with no recent heart beats: 50227ms exceeds 45000ms

typically this would be because of a spark service restart. is there a way to detect this programmatically so that the client can take the correct steps to recover?