Having spark-ec2 join new slaves to existing cluster


Having spark-ec2 join new slaves to existing cluster

Nick Chammas
I would like to be able to use spark-ec2 to launch new slaves and add them to an existing, running cluster. Similarly, I would also like to remove slaves from an existing cluster.

Use cases include:
  1. Oh snap, I sized my cluster incorrectly. Let me add/remove some slaves.
  2. During scheduled batch processing, I want to add some new slaves, perhaps on spot instances. When that processing is done, I want to kill them. (Cruel, I know.)
I gather this is not possible at the moment. spark-ec2 appears to be able to launch new slaves for an existing cluster only if the master is stopped. I also do not see any ability to remove slaves from a cluster.

Is that correct? Are there plans to add such functionality to spark-ec2 in the future?

Nick


Re: Having spark-ec2 join new slaves to existing cluster

Matei Zaharia
Administrator
This can’t be done through the script right now, but you can do it manually as long as the cluster is stopped. If the cluster is stopped, just go into the AWS Console, right-click a slave and choose “launch more of these” to add more. Or select multiple slaves and delete them. When you next run spark-ec2 start to start your cluster, it will set it up on all the machines it finds in the mycluster-slaves security group.

This is pretty hacky so it would definitely be good to add this feature; feel free to open a JIRA about it.

Matei
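The console workaround above can also be scripted. Below is a hypothetical sketch using boto, the library spark-ec2 itself is built on; the cluster name, AMI ID, and instance type are placeholders rather than values from this thread, and the actual API call is left commented out since it requires AWS credentials:

```python
# Hypothetical sketch of scripting the console workaround above with boto.
# The cluster name, AMI ID, and instance type below are placeholders.

def slave_group(cluster_name):
    # spark-ec2 identifies a cluster's slaves by this security group name.
    return cluster_name + "-slaves"

def run_instances_params(cluster_name, ami_id, instance_type, count):
    # Build launch parameters so the new instances land in the cluster's
    # slave security group, where spark-ec2 start will find them.
    return {
        "image_id": ami_id,
        "min_count": count,
        "max_count": count,
        "instance_type": instance_type,
        "security_groups": [slave_group(cluster_name)],
    }

params = run_instances_params("mycluster", "ami-xxxxxxxx", "m1.large", 2)
print(params["security_groups"])  # -> ['mycluster-slaves']

# With AWS credentials configured, one would then call something like:
#   import boto.ec2
#   conn = boto.ec2.connect_to_region("us-east-1")
#   conn.run_instances(**params)
```

The key point is the security group: spark-ec2 treats membership in the <cluster>-slaves group as cluster membership, so anything launched into that group gets picked up on the next start.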

On Apr 4, 2014, at 12:16 PM, Nicholas Chammas <[hidden email]> wrote:



Re: Having spark-ec2 join new slaves to existing cluster

Nick Chammas
Sweet, thanks for the instructions. This will do for resizing a dev cluster that you can bring down at will.

I will open a JIRA issue about adding the functionality I described to spark-ec2.


On Fri, Apr 4, 2014 at 3:43 PM, Matei Zaharia <[hidden email]> wrote:


Re: Having spark-ec2 join new slaves to existing cluster

Rafal Kwasny
Hi,
This will work nicely unless you're using spot instances; in that case "start" does not work, since spot slaves are lost on shutdown.
I feel the spark-ec2 script needs a major refactor to cope with new features and with more users running it in dynamic environments.
Are there any current plans to migrate it to a CDH5-based install, now that CDH5 has just been released?

/Raf

Nicholas Chammas wrote:



Re: Having spark-ec2 join new slaves to existing cluster

Arpit Tak
Hi all,

If the cluster is running and I want to add slaves to it, which is the better way of doing it:
  1. As Matei said, select a slave and "launch more of these".
  2. Create an AMI of a slave and launch more instances from it.

The plus point of the first is that it's faster, but I then have to rsync everything, including the ganglia services, passwordless login, etc. (although a simple script will take care of this). That is what I generally do.

With an AMI, everything is taken care of; I just have to add the new slaves to the conf files.

My thinking is that if I need to add more than 15 slaves, it's better to go with an AMI instead of rsync.

What are your suggestions?

Regards,
Arpit
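The rsync route described above involves pushing Spark and its config to the new machine and then registering the machine in conf/slaves. Here is a hypothetical local sketch of the registration step; hostnames are placeholders, a temp directory stands in for /root/spark, and the rsync itself is only mentioned in a comment since it needs a live host and key:

```python
# Hypothetical sketch of the bookkeeping after rsyncing Spark to a new
# slave (e.g. rsync -az /root/spark/ root@NEW_SLAVE:/root/spark/): the
# slave's hostname must be added to conf/slaves so the start scripts
# reach it. A temp dir stands in for /root/spark so this runs anywhere.
import os
import tempfile

spark_home = tempfile.mkdtemp()
os.makedirs(os.path.join(spark_home, "conf"))
slaves_file = os.path.join(spark_home, "conf", "slaves")

# Existing slaves list on the master (placeholder hostnames).
with open(slaves_file, "w") as f:
    f.write("slave1.internal\n")

def add_slave(path, hostname):
    # Append the hostname to conf/slaves, skipping duplicates.
    with open(path) as f:
        slaves = f.read().splitlines()
    if hostname not in slaves:
        with open(path, "a") as f:
            f.write(hostname + "\n")

add_slave(slaves_file, "slave2.internal")
add_slave(slaves_file, "slave2.internal")  # duplicate call is a no-op

with open(slaves_file) as f:
    print(f.read().splitlines())  # -> ['slave1.internal', 'slave2.internal']
```

The AMI approach skips the rsync entirely but still needs this conf-file update, which is why it mainly pays off for larger batches of slaves.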


On Sun, Apr 6, 2014 at 10:26 PM, Rafal Kwasny <[hidden email]> wrote:



Re: Having spark-ec2 join new slaves to existing cluster

sirisha_devineni
In reply to this post by Nick Chammas
Nick,

Did you open a JIRA ticket for this feature to be implemented in spark-ec2? If so, can you please point me to the ticket?

I would also like to autoscale Spark nodes (add/remove slave nodes), so I am curious to know how you achieved this.

Sirisha
Nick Chammas wrote

Re: Having spark-ec2 join new slaves to existing cluster

Nick Chammas

On Tue, Jun 3, 2014 at 6:52 AM, sirisha_devineni <[hidden email]> wrote:
Did you open a JIRA ticket for this feature to be implemented in spark-ec2?
If so can you please point me to the ticket?

Just created it: https://issues.apache.org/jira/browse/SPARK-2008

Nick