Spark GCE Script

Spark GCE Script

Akhil
Hi Sparkers,

We have created a quick spark_gce script that can launch a Spark cluster on Google Cloud. I'm sharing it because it might help anyone who deploys on Google Cloud rather than AWS.

Here's the link to the script


Feel free to use it, and let me know if you have any feedback.

In short, here's what it does:

Like the spark_ec2 script, it reads command-line arguments such as the cluster name (see the GitHub page for details). It then starts the machines in Google Cloud, sets up the network, attaches an empty 500GB disk to every machine, generates SSH keys on the master and copies them to all the slaves, installs Java, and downloads and configures Spark, Shark, and Hadoop. It also starts the Shark server automatically. The version is currently 0.9.1, but I'm happy to add support for more versions if anyone is interested.
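For readers who want to see the shape of such a launcher, here is a minimal dry-run sketch in Python. It is not the spark_gce code: it assumes today's `gcloud` CLI (the original script may use different tooling), and the argument names, default zone, machine type, node naming scheme and firewall rule name are hypothetical. It only builds command strings for the first steps described in the post (machines, 500GB disks, network, SSH keys); installing Java and configuring Spark/Shark/Hadoop is left out.

```python
import argparse
import shlex

DISK_SIZE_GB = 500  # from the post: an empty 500GB disk is added to every machine


def parse_args(argv=None):
    # Hypothetical arguments, loosely modelled on spark_ec2; the real
    # spark_gce options are documented on its GitHub page.
    p = argparse.ArgumentParser(description="spark_gce-style launch planner (sketch)")
    p.add_argument("cluster_name")
    p.add_argument("--slaves", type=int, default=2)
    p.add_argument("--zone", default="us-central1-a")
    p.add_argument("--machine-type", default="n1-standard-4")
    return p.parse_args(argv)


def plan_commands(args):
    """Return, without running them, the gcloud commands for a launch (dry run)."""
    zone = f"--zone={args.zone}"
    master = f"{args.cluster_name}-master"
    slaves = [f"{args.cluster_name}-slave-{i}" for i in range(1, args.slaves + 1)]
    cmds = []

    # 1. Start the machines and give each one its own empty data disk.
    for node in [master] + slaves:
        disk = f"{node}-data"
        cmds.append(f"gcloud compute instances create {node} {zone} "
                    f"--machine-type={args.machine_type}")
        cmds.append(f"gcloud compute disks create {disk} --size={DISK_SIZE_GB}GB {zone}")
        cmds.append(f"gcloud compute instances attach-disk {node} --disk={disk} {zone}")

    # 2. Network: open the standalone master port (7077) and web UI (8080).
    #    In practice you would restrict this with --source-ranges.
    cmds.append(f"gcloud compute firewall-rules create {args.cluster_name}-spark "
                "--allow=tcp:7077,tcp:8080")

    # 3. Generate an SSH key pair on the master, then append the master's
    #    public key to authorized_keys on every slave.
    keygen = 'ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa'
    cmds.append(f"gcloud compute ssh {master} {zone} --command={shlex.quote(keygen)}")
    append = "cat >> ~/.ssh/authorized_keys"
    for slave in slaves:
        cmds.append(f"gcloud compute ssh {master} {zone} --command='cat ~/.ssh/id_rsa.pub' | "
                    f"gcloud compute ssh {slave} {zone} --command={shlex.quote(append)}")

    # 4. Installing Java and downloading/configuring Spark, Shark and Hadoop
    #    would follow here; it is omitted from this sketch.
    return cmds
```

Calling `plan_commands(parse_args(["demo", "--slaves", "2"]))` returns the commands for one master and two slaves. Printing the plan instead of executing it makes it easy to review what a launch would do before any instances are billed.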


Cheers.


Thanks
Best Regards

Re: Spark GCE Script

Matei Zaharia
Very cool! Have you thought about sending this as a pull request? We’d be happy to maintain it inside Spark, though it might be interesting to find a single Python package that can manage clusters across both EC2 and GCE.

Matei
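Matei's idea of a single Python package for both EC2 and GCE could be organized around a small provider interface. The sketch below is hypothetical: `CloudProvider`, `DryRunProvider`, `PROVIDERS` and `launch_cluster` are illustrative names, and the dry-run backends only record calls instead of using a real EC2 or GCE API. The point is that the cluster-level steps are written once, and each cloud supplies only the primitive operations.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical primitives that each cloud backend would implement."""

    @abstractmethod
    def create_instance(self, name: str) -> None: ...

    @abstractmethod
    def attach_data_disk(self, instance: str, size_gb: int) -> None: ...

    @abstractmethod
    def run(self, instance: str, command: str) -> None: ...


class DryRunProvider(CloudProvider):
    """Stand-in backend: records each call instead of contacting EC2 or GCE."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.calls: list[str] = []

    def create_instance(self, name: str) -> None:
        self.calls.append(f"{self.label}: create {name}")

    def attach_data_disk(self, instance: str, size_gb: int) -> None:
        self.calls.append(f"{self.label}: attach {size_gb}GB disk to {instance}")

    def run(self, instance: str, command: str) -> None:
        self.calls.append(f"{self.label}: run on {instance}: {command}")


# Real EC2/GCE backends would be registered here in place of the dry-run ones.
PROVIDERS = {
    "ec2": lambda: DryRunProvider("ec2"),
    "gce": lambda: DryRunProvider("gce"),
}


def launch_cluster(provider: CloudProvider, cluster: str, slaves: int,
                   disk_gb: int = 500) -> list[str]:
    """Cloud-independent launch logic, written once for every provider."""
    names = [f"{cluster}-master"] + [f"{cluster}-slave-{i}" for i in range(1, slaves + 1)]
    for name in names:
        provider.create_instance(name)
        provider.attach_data_disk(name, disk_gb)
    # Placeholder for shared post-launch setup (Java, Spark, ...) on the master.
    provider.run(names[0], "java -version")
    return names
```

Switching from `PROVIDERS["gce"]()` to `PROVIDERS["ec2"]()` changes only the backend object; `launch_cluster` itself stays the same, which is what would let one package manage clusters on both clouds.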

(In reply to Akhil Das, May 5, 2014, 7:18 AM)

Re: Spark GCE Script

Nick Chammas
I second this motion. :)

A unified "cloud deployment" tool would be absolutely great.


(In reply to Matei Zaharia, Mon, May 5, 2014 at 1:34 PM)

Re: Spark GCE Script

François Le Lay
Has anyone considered using jclouds tooling to support multiple cloud providers? Maybe using Pallet?

François

(In reply to Nicholas Chammas, May 5, 2014, 3:22 PM)

Re: Spark GCE Script

Akhil
Hi Matei,

I'll clean up the code a bit and send a pull request :)

Thanks
Best Regards


(In reply to François Le Lay, Tue, May 6, 2014 at 1:00 AM)

Re: Spark GCE Script

Aureliano Buendia
Please send a pull request; this should be maintained by the community, in case you don't feel like continuing to maintain it yourself.

Also, it's nice to see that the GCE version is shorter than the AWS version.


(In reply to Akhil Das, Tue, May 6, 2014 at 10:11 AM)

Re: Spark GCE Script

Akhil
Hi

I have sent a pull request, https://github.com/apache/spark/pull/681. Please review it and merge it :)


Thanks
Best Regards


(In reply to Aureliano Buendia, Thu, May 8, 2014 at 2:58 AM)

Re: Spark GCE Script

Aureliano Buendia



(In reply to Akhil Das, Fri, May 16, 2014 at 11:19 AM)

Matei,

Would you please verify this pull request for Jenkins? It has been a couple of weeks.
 

