Understanding life cycle of RpcEndpoint: CoarseGrainedExecutorBackend

I am trying to understand the lifecycle of an RpcEndpoint.

Here is my understanding: after negotiating containers from the ClusterManager, the master starts a CoarseGrainedExecutorBackend on each worker. The backend connects back to the DriverEndpoint of the CoarseGrainedSchedulerBackend, and the DriverEndpoint then sends requests/messages to the CoarseGrainedExecutorBackend.
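The registration flow described above can be modelled in a few lines. This is a minimal in-process Scala sketch under stated assumptions: `DriverModel`, `BackendModel`, and the three message classes are hypothetical stand-ins, not Spark's classes, and the "RPC" is a plain synchronous method call rather than a network round trip. It only shows the direction of communication: the executor side registers first, and the driver pushes work afterwards.

```scala
import scala.collection.mutable

// Hypothetical, simplified messages; NOT Spark's CoarseGrainedClusterMessages.
sealed trait Msg
final case class RegisterExecutor(executorId: String, host: String) extends Msg
case object RegisteredExecutor extends Msg
final case class LaunchTask(taskId: Long) extends Msg

// Hypothetical stand-in for the driver-side endpoint (DriverEndpoint in Spark).
final class DriverModel {
  val executors = mutable.LinkedHashMap.empty[String, BackendModel]

  // Invoked when an executor backend connects back and registers itself.
  def register(msg: RegisterExecutor, from: BackendModel): Unit = {
    executors(msg.executorId) = from
    from.receive(RegisteredExecutor) // driver acknowledges the registration
  }

  // Once registered, the driver pushes work to that executor.
  def launch(executorId: String, taskId: Long): Unit =
    executors(executorId).receive(LaunchTask(taskId))
}

// Hypothetical stand-in for the executor side (CoarseGrainedExecutorBackend in Spark).
final class BackendModel(val id: String, host: String, driver: DriverModel) {
  val inbox = mutable.ListBuffer.empty[Msg]

  // The executor side starts the conversation by registering with the driver.
  def start(): Unit = driver.register(RegisterExecutor(id, host), this)

  // Every message the driver sends lands here.
  def receive(msg: Msg): Unit = inbox += msg
}
```

Usage: create a `DriverModel`, start a `BackendModel` against it, then call `driver.launch(...)`; the backend's inbox holds the acknowledgement followed by the task message.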

Q1: My inference is that the lifecycle of CoarseGrainedExecutorBackend is: onConnected() -> onStart() -> receive() -> onStop(). The receive() method keeps taking requests/messages and executing them, meaning that receive() is called multiple times throughout the endpoint's lifecycle. Is my understanding right?
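For comparison, the scaladoc of Spark's `RpcEndpoint` describes the lifecycle as `constructor -> onStart -> receive* -> onStop`; `onConnected` is a separate callback invoked when a remote address connects, not a lifecycle stage. The Scala sketch below models that ordering with a hypothetical `MiniEndpoint` trait and `MiniDispatcher` class (not Spark's API): `onStart` runs once, the same `receive` handler runs once per queued message, and `onStop` runs once at the end. Spark's real dispatcher is asynchronous and long-lived; this model simply drains a queue on the calling thread.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RPC endpoint; NOT Spark's RpcEndpoint trait.
trait MiniEndpoint {
  def onStart(): Unit = ()
  def receive: PartialFunction[Any, Unit]
  def onStop(): Unit = ()
}

// Hypothetical dispatcher: onStart once, receive once per message, onStop once.
final class MiniDispatcher(endpoint: MiniEndpoint) {
  private val queue = mutable.Queue.empty[Any]

  def post(msg: Any): Unit = queue.enqueue(msg)

  def runToCompletion(): Unit = {
    endpoint.onStart()
    while (queue.nonEmpty) {
      val msg = queue.dequeue()
      // The same receive handler is invoked for every message in the inbox.
      if (endpoint.receive.isDefinedAt(msg)) endpoint.receive(msg)
    }
    endpoint.onStop()
  }
}

// An endpoint that records each lifecycle callback it sees, in order.
final class RecordingEndpoint extends MiniEndpoint {
  val log = mutable.ListBuffer.empty[String]
  override def onStart(): Unit = log += "onStart"
  override def receive: PartialFunction[Any, Unit] = {
    case m: String => log += s"receive:$m"
  }
  override def onStop(): Unit = log += "onStop"
}
```

Posting three messages and running the dispatcher yields one `onStart`, three `receive` entries, and one `onStop` in the recording endpoint's log.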

Q2: According to the source code, the receive() method executes "messages/requests". What exactly are these messages/requests? Do they refer to the set of tasks, from one stage of a Spark RDD, that are assigned to this particular RpcEndpoint and operate on the RDD's individual partitions?
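One way to picture the "messages/requests" is as a closed set of case classes that receive() pattern-matches on. The names below mirror messages that CoarseGrainedExecutorBackend appears to handle in the Spark 2.x sources (RegisteredExecutor, LaunchTask, KillTask, StopExecutor, Shutdown), but every field is simplified and hypothetical. The sketch also assumes, as those sources appear to do, that the driver sends one LaunchTask per task, so a stage over four partitions arrives as four separate messages rather than one batch; this is worth checking against the Spark version in use.

```scala
// Hypothetical, simplified message types named after those handled in
// CoarseGrainedExecutorBackend.receive; the real classes carry different fields.
sealed trait ExecutorMessage
case object RegisteredExecutor extends ExecutorMessage
final case class LaunchTask(stageId: Int, partition: Int) extends ExecutorMessage
final case class KillTask(taskId: Long, reason: String) extends ExecutorMessage
case object StopExecutor extends ExecutorMessage
case object Shutdown extends ExecutorMessage

object Messages {
  // Dispatch on the message type the way a receive handler would.
  def describe(msg: ExecutorMessage): String = msg match {
    case RegisteredExecutor   => "driver accepted registration"
    case LaunchTask(s, p)     => s"run one task: stage $s, partition $p"
    case KillTask(id, reason) => s"kill task $id ($reason)"
    case StopExecutor         => "driver asked executor to stop"
    case Shutdown             => "shut down the backend"
  }
}
```

Under this assumption, a stage is not a single message: `(0 until 4).map(p => LaunchTask(1, p))` produces four independent messages, each handled by its own receive() call.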

Q3: If receive() is indeed called multiple times over the course of a Spark job, with each request carrying the task(s) of a stage, does this mean a new Executor is instantiated every time receive() is called (as the code at line 129 suggests)? In other words, is the executor from the previous stage shut down and a new one created every time a stage runs and its set of tasks is sent to a particular RpcEndpoint (CoarseGrainedExecutorBackend)?
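The per-stage reading in Q3 can be checked against a small model. In the Spark 2.x sources, the `new Executor(...)` call in `CoarseGrainedExecutorBackend.receive` appears to sit under the `RegisteredExecutor` case, which the driver sends once after the backend registers, while `LaunchTask` reuses the existing executor. The hypothetical `BackendModel` and `FakeExecutor` below (not Spark classes) encode that assumption and count how many executors get created while tasks from several stages arrive.

```scala
import scala.collection.mutable

// Hypothetical model, NOT Spark code. Assumption: the executor is created
// when the driver acknowledges registration, not when tasks arrive.
sealed trait BackendMsg
case object RegisteredExecutor extends BackendMsg
final case class LaunchTask(stageId: Int, partition: Int) extends BackendMsg

// Stand-in for Spark's Executor; records the (stage, partition) tasks it ran.
final class FakeExecutor(val id: String) {
  val ran = mutable.ListBuffer.empty[(Int, Int)]
  def launchTask(stageId: Int, partition: Int): Unit = ran += ((stageId, partition))
}

final class BackendModel(id: String) {
  var instantiations = 0
  private var executor: Option[FakeExecutor] = None

  def receive(msg: BackendMsg): Unit = msg match {
    case RegisteredExecutor =>
      // Created once, when the driver confirms registration.
      executor = Some(new FakeExecutor(id))
      instantiations += 1
    case LaunchTask(stage, part) =>
      // Every later task, from any stage, reuses the same executor.
      executor match {
        case Some(e) => e.launchTask(stage, part)
        case None    => throw new IllegalStateException("task received before registration")
      }
  }

  def tasksRun: List[(Int, Int)] = executor.map(_.ran.toList).getOrElse(Nil)
}
```

Sending one `RegisteredExecutor` followed by tasks from three stages leaves `instantiations` at 1. If the assumption holds for a given Spark version, the instantiation line runs once per backend process rather than once per stage.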

I have also posted this question on Stack Overflow: https://stackoverflow.com/questions/59388700/understanding-the-lifecycle-of-and-rpcendpoint-coarsegrainedexecutorbackend

It would be a great help if someone could elaborate on these questions and shed some light on them.