Spark-ec2 asks for password

Aureliano Buendia
Hi,

Since 0.9.0, spark-ec2 has become unstable. During launch it throws many errors like:

ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: Connection refused
Error 255 while executing remote command, retrying after 30 seconds

... and recently it has even started prompting for a password:

Warning: Permanently added '' (RSA) to the list of known hosts.
Password:

Note that the hostname in Permanently added '' is missing in the log, which is probably why it asks for a password.

Is this a known bug?

Re: Spark-ec2 asks for password

Frank Austin Nothaft
Aureliano,

I've been noticing this error recently as well:

ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22: Connection refused
Error 255 while executing remote command, retrying after 30 seconds


However, this isn't an issue with the spark-ec2 scripts. After the scripts fail, if you wait a bit longer (e.g., another 2 minutes), the EC2 hosts will finish launching and port 22 will open up. Until an EC2 host has launched and opened port 22, SSH cannot succeed and the spark-ec2 scripts will fail. I've noticed that EC2 machine launch latency seems to be highest in Oregon; I haven't run into this problem on either the California or Virginia EC2 farms. To work around this issue, I've manually modified my copy of the EC2 scripts to wait for 6 failures (i.e., 3 minutes at 30 seconds per retry), which seems to work OK. Might be worth a try on your end. I can't comment on the password request; I haven't seen that on my end.
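
The wait described above can also be expressed as a direct check on port 22 rather than a fixed number of failed SSH attempts. The following is only a minimal sketch, not code from the spark-ec2 scripts: the function name wait_for_port and its parameters are hypothetical, and the defaults (6 attempts, 30 seconds apart) simply mirror the numbers mentioned in this message.

```python
import socket
import time


def wait_for_port(host, port=22, attempts=6, delay=30.0, timeout=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails. This helper is hypothetical and
    is not part of spark-ec2.
    """
    for attempt in range(attempts):
        try:
            # create_connection raises OSError (e.g. connection refused,
            # timeout) while the instance is still booting.
            sock = socket.create_connection((host, port), timeout=timeout)
            sock.close()
            return True
        except OSError:
            # Sleep between attempts, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

A caller would run this once per instance before the first SSH command and only proceed when it returns True. An open port does not guarantee that sshd already accepts the key, so an SSH-level retry may still be needed afterwards.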

Regards,

Frank Austin Nothaft
202-340-0466


On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia <[hidden email]> wrote:

Re: Spark-ec2 asks for password

Patrick Wendell
Unfortunately - I think a lot of this is due to generally increased latency on ec2 itself. I've noticed that it's way more common than it used to be for instances to come online past the "wait" timeout in the ec2 script.


On Fri, Apr 18, 2014 at 9:11 PM, FRANK AUSTIN NOTHAFT <[hidden email]> wrote:

Re: Spark-ec2 asks for password

Aureliano Buendia
In reply to this post by Frank Austin Nothaft
Frank,

Thanks for the prompt reply. Unfortunately, I've been experiencing this for the past few weeks on the N. Virginia farm. Note that the latency might also depend on the instance type.

I'll try to amend the ec2 script as you suggested, but that will mean waiting even longer for the cluster to come up. The current waiting time is already not short (above 15 minutes for 50 instances).

I have tried this with and without spot pricing, and there was no difference. It seems that Amazon is not keeping up with the demand for clusters.

I wish Spark would officially support Google Compute Engine as well, especially with the recent price drop, and given that GCE is known to start up much faster [1].






Re: Spark-ec2 asks for password

Mayur Rustagi
Hi,
We have a deployment tool for GCE that we use internally for Spark. Let me know if you want access to it. It's not really clean enough to open-source, though :).
Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257


On Sat, Apr 19, 2014 at 10:24 AM, Aureliano Buendia <[hidden email]> wrote:

Re: Spark-ec2 asks for password

Pierre B
In reply to this post by Patrick Wendell
We’ve been experiencing this as well, and our simple solution is to actually keep trying the ssh connection instead of just waiting:

Something like this:


import subprocess
import time

def wait_for_ssh_connection(opts, host):
  # `u` and `s` are helpers that are not shown in this snippet
  # (presumably logging and ssh-command utilities).
  u.message("Waiting for ssh connection to host {}".format(host))
  while True:
    try:
      # check_call raises CalledProcessError on a non-zero exit status,
      # so reaching the break means the remote "ls" succeeded.
      subprocess.check_call(s.ssh_command(opts) + ['-t', '-t', '%s@%s' % (opts.user, host), "ls"])
      break
    except subprocess.CalledProcessError:
      print("Ssh connection to host {} failed, retrying in 10 seconds...".format(host))
      time.sleep(10)
  print("Ssh connection to host {} successfully established!".format(host))
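
One caveat with a loop like the one above is that it never gives up if an instance fails to come up at all. A bounded variant might look like the sketch below. It is an illustration under assumptions, not part of spark-ec2: retry_command, max_wait and interval are hypothetical names, and it takes a plain argument list instead of the opts object used above.

```python
import subprocess
import time


def retry_command(cmd, max_wait=600.0, interval=10.0):
    """Run cmd repeatedly until it exits 0 or max_wait seconds have passed.

    Returns True on success and False on timeout. Hypothetical helper,
    not part of spark-ec2.
    """
    deadline = time.monotonic() + max_wait
    while True:
        # subprocess.call returns the exit status instead of raising.
        if subprocess.call(cmd) == 0:
            return True
        # Give up if another sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, retry_command(["ssh", "-o", "BatchMode=yes", "user@host", "true"]) would keep trying for up to ten minutes, where user@host is a placeholder. BatchMode=yes is a standard OpenSSH option that disables password prompts, so a host that does not accept the key makes ssh fail (and be retried) instead of stopping at a "Password:" prompt.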


HTH

Pierre Borckmans

RealImpact Analytics Brussels Office

FR +32 485 91 87 31 Skype pierre.borckmans





On 19 Apr 2014, at 06:51, Patrick Wendell <[hidden email]> wrote:
