Connecting an Application to the Cluster


Connecting an Application to the Cluster

David Thomas
From the docs:
Connecting an Application to the Cluster

To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor.


Could someone enlighten me on what happens if I run the app from, say, Eclipse on my local machine, but use the URL of the master node, which is in the cloud? What role does my local JVM play then?
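
In other words, something like this (a sketch only; the host, port, and jar path are placeholders, not real values):

    import org.apache.spark.SparkContext

    // The app runs locally but points at a remote standalone master.
    val sc = new SparkContext(
      "spark://master-host:7077",     // URL of the remote Spark master
      "MyApp",                        // application name
      "/path/to/spark",               // Spark home on the cluster nodes
      Seq("target/my-app.jar"))       // jar(s) to ship to the workers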

Re: Connecting an Application to the Cluster

Nhan Vu Lam Chi
Your local app is called the "driver program"; it creates jobs and submits them to the cluster for execution.
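
For example (a rough sketch against the Scala API; the input path is a placeholder):

    // Transformations only record lineage in the driver; nothing runs yet.
    val data = sc.textFile("hdfs://namenode/path/to/input")
    val lengths = data.map(_.length)
    // An action makes the driver build a job and submit it to the cluster.
    val total = lengths.reduce(_ + _)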


Re: Connecting an Application to the Cluster

David Thomas
Where is the SparkContext object created then? On my local machine or on the master node in the cluster?


Re: Connecting an Application to the Cluster

purav aggarwal
Your local machine simply submits your job (in the form of a jar) to the cluster.
The master node is where the SparkContext object is created; there a DAG of your job is formed, and tasks (stages) are assigned to different workers, which are aware of nothing but the computation of the tasks assigned to them.


Re: Connecting an Application to the Cluster

David Thomas
So if I perform a Spark action, say collect, will I be able to see the result on my local console? Or would it be available only on the cluster master?


Re: Connecting an Application to the Cluster

purav aggarwal
The data would get aggregated on the master node.
Since the JVM for the application is invoked from your local machine (the Spark driver), I think you might be able to print it on your console.


Re: Connecting an Application to the Cluster

Christopher Nguyen
In reply to this post by David Thomas
David, actually, it's the driver that "creates" and holds a reference to the SparkContext. The master in this context is only a resource manager providing information about the cluster: it knows where the workers are, how many there are, and so on.

The SparkContext object can get serialized/deserialized and instantiated/made available elsewhere (e.g., on the worker nodes), but that is being overly precise and doesn't bear directly on the question you're asking.

So yes, if you do collect(), you will be able to see the results on your local console.
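
For instance (a rough sketch, assuming the Scala API):

    // collect() pulls the results back to the driver, i.e. your local JVM.
    val squares = sc.parallelize(1 to 100).map(x => x * x)
    val result = squares.collect()   // materialized locally in the driver
    result.foreach(println)          // prints on your local console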
--
Christopher T. Nguyen
Co-founder & CEO, Adatao


Re: Connecting an Application to the Cluster

Michael (Bach) Bui
In reply to this post by David Thomas
Spark has the concepts of a driver and a master.

The driver is the Spark program that you run on your local machine. The SparkContext resides in the driver, together with the DAG scheduler.
The master is responsible for managing cluster resources, e.g. giving the driver the workers it needs. The master can be a Mesos master (for a Mesos cluster), a Spark master (for a Spark standalone cluster), or the ResourceManager (for a Hadoop YARN cluster).
Given the resources assigned by the master, the driver uses the DAG to assign tasks to the workers.

So yes, the results of Spark actions will be sent back to the driver, i.e. to your local console.
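
To illustrate (a sketch only; the host names and ports are placeholders):

    // The same application code can target different cluster managers;
    // only the master URL changes.
    val standalone = new SparkContext("spark://master-host:7077", "MyApp")  // Spark standalone
    val onMesos    = new SparkContext("mesos://master-host:5050", "MyApp")  // Mesos
    // On a Hadoop cluster, Spark runs through YARN's ResourceManager
    // rather than a plain master URL.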


Re: Connecting an Application to the Cluster

purav aggarwal
Sorry for the incorrect information. Where can I pick up these architectural/design concepts for Spark?
I seem to have misunderstood the responsibilities of the master and the driver.


Re: Connecting an Application to the Cluster

David Thomas
In reply to this post by Michael (Bach) Bui
Thanks everyone, it all makes sense now.


Re: Connecting an Application to the Cluster

Michael (Bach) Bui
In reply to this post by purav aggarwal
It used to be that you had to read the Spark code to figure this information out.
However, the Spark team has recently published it here: http://spark.incubator.apache.org/docs/latest/cluster-overview.html