Where is the DAG stored before catalyst gets it?

jgp
Hi,

I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers.

Correct?

tia

jg
---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Where is the DAG stored before catalyst gets it?

Jacek Laskowski
Hi Jean Georges,

> I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers.

Sorry to be so direct, but the sentence does not make much sense to me. Again, my apologies for saying so right at the start. Since I know Jean Georges, I allowed myself to be more open.

First, "the master" seems to suggest that you are using a Spark Standalone cluster. Correct? Other cluster managers use different names for the master/manager node.

"when catalyst is finished": that one is really tough to understand. Do you mean once all the optimizations have been applied and the query is ready for execution? The final output of the "query execution pipeline" is an RDD with the right code for execution. At this phase, the query is more an RDD than a Dataset.
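To make the phases concrete, here is a minimal sketch, assuming a local SparkSession and a made-up query (the object name `QueryPhases` and the helper `finalRddPartitions` are hypothetical, just for illustration). It uses `Dataset.queryExecution` to look at the plan after analysis, after optimization, and after physical planning, and then at the RDD the pipeline ends with. All of these objects are reachable from the application's own (driver) code, which is where I would look for "the DAG" rather than in the cluster manager.

```scala
import org.apache.spark.sql.SparkSession

object QueryPhases {
  // Hypothetical helper: builds a small query and returns the number of
  // partitions of the RDD that the query execution pipeline ends with.
  def finalRddPartitions(spark: SparkSession): Int = {
    import spark.implicits._

    // A made-up query: 4 input partitions, one filter, no shuffle.
    val df = spark.range(0, 1000, 1, 4).filter($"id" % 2 === 0)

    val qe = df.queryExecution
    println(qe.analyzed)      // logical plan after analysis
    println(qe.optimizedPlan) // logical plan after optimization rules
    println(qe.executedPlan)  // physical plan selected for execution

    // The end of the pipeline: an RDD of internal rows.
    val rdd = qe.toRdd
    println(rdd.toDebugString) // RDD lineage
    rdd.getNumPartitions
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local mode, for illustration only
      .appName("query-phases")
      .getOrCreate()
    try println(s"partitions = ${finalRddPartitions(spark)}")
    finally spark.stop()
  }
}
```

The same plans are printed by `df.explain(true)` if you only want to read them.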

"it sends the tasks to the workers.": since we're talking about an RDD, it is scheduled as a set of tasks (one per partition of the RDD in each stage). And yes, the tasks are sent out over the wire to the executors. It's been like this since Spark 1.0 (and even earlier).
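The "one task per partition" part can be checked with a small sketch like the one below, assuming local mode and a made-up RDD with 4 partitions (the object name `TasksPerPartition` and the helper `observe` are hypothetical). Each task reports the partition it was given through `TaskContext`, so a single-stage job over 4 partitions should report 4 distinct partition ids.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

object TasksPerPartition {
  // Hypothetical helper: returns (partitionId, taskAttemptId, rowCount)
  // as reported from inside each task of a single-stage job.
  def observe(spark: SparkSession): Array[(Int, Long, Int)] = {
    val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 4)
    rdd.mapPartitions { rows =>
      // This closure runs inside a task on an executor.
      val ctx = TaskContext.get()
      Iterator((ctx.partitionId(), ctx.taskAttemptId(), rows.size))
    }.collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local mode, for illustration only
      .appName("tasks-per-partition")
      .getOrCreate()
    try {
      observe(spark).foreach { case (p, t, n) =>
        println(s"partition=$p taskAttemptId=$t rows=$n")
      }
    } finally spark.stop()
  }
}
```

In local mode the "executors" are threads in the same JVM, so nothing goes over the wire here; on a real cluster the same closure is serialized and shipped to the executors.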

Hope I helped a bit.

On Fri, Oct 5, 2018 at 12:36 AM Jean Georges Perrin <[hidden email]> wrote:
> Hi,
>
> I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers.
>
> Correct?
>
> tia
>
> jg