Amazon EC2 - No SSH After Reboot, Connection Refused

I've replicated this two or three times, so I'm guessing there's something wrong with what I'm doing.

Here are my steps:

  1. Launch new instance via EC2 Management console using: Ubuntu Server 13.10 - ami-ace67f9c (64-bit)
  2. Launch with defaults (using my existing key pair)
  3. The instance starts. I can SSH to it using Putty or the Mac terminal. Success!
  4. I reboot the instance
  5. 10 minutes later, when the instance should be back up and running, my terminal connection shows:

    stead:~ stead$ ssh -v -i Dropbox/SteadCloud3.pem ubuntu@54.201.200.208
    OpenSSH_5.6p1, OpenSSL 0.9.8y 5 Feb 2013
    debug1: Reading configuration data /etc/ssh_config
    debug1: Applying options for *
    debug1: Connecting to 54.201.200.208 [54.201.200.208] port 22.
    debug1: connect to address 54.201.200.208 port 22: Connection refused
    ssh: connect to host 54.201.200.208 port 22: Connection refused
    stead:~ stead$
    

Fine, I understand that the public IP address can change, so I check the EC2 management console and verify that it is the same. Weird. Just for fun, I try connecting with the public DNS hostname: ec2-54-201-200-208.us-west-2.compute.amazonaws.com. No dice, same result.
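In case it helps narrow things down, a bare TCP probe (assuming netcat is installed on the client) can confirm whether port 22 is actively refusing the connection rather than silently timing out:

    # Quick check of TCP reachability on port 22 (netcat assumed available)
    nc -vz 54.201.200.208 22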

Even using the Connect via Java SSH client built into the EC2 console, I get Connection Refused.

I checked the security groups. This instance is in group launch-wizard-4. Looking at the inbound configuration for this group, port 22 is allowed in from 0.0.0.0/0, so that should be anywhere. I know I'm hitting my instance and that this is the right security group because of a ping test: with ICMP disallowed I can't ping the instance, and as soon as I enable ICMP for this security group, my pings go through.
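For completeness, the inbound rules can also be double-checked from the command line; this is just a sketch assuming the AWS CLI is installed and configured, the region is us-west-2, and the group name matches what the console shows:

    # List the inbound rules of the launch-wizard-4 security group
    aws ec2 describe-security-groups \
        --region us-west-2 \
        --filters Name=group-name,Values=launch-wizard-4 \
        --query 'SecurityGroups[0].IpPermissions'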

I've found a few other posts around the internet with similar error messages, but most seem to be easily resolved by tweaking the firewall settings. I've tried a few of these, with no luck.

I'm guessing there's a simple EC2 step I'm missing. Thanks for any help you can give, and I'm happy to provide more information or test further!

Update - Here are my system logs from the Amazon EC2 console: http://pastebin.com/4M5pwGRt


Solution 1:

Had a similar behavior today on my EC2 instance, and tracked the thing down to this: when I do "sudo reboot now" the machine hangs and I have to restart it manually from the AWS management console; when I do "sudo reboot" it reboots just fine. Apparently "now" is not a valid option for reboot, as pointed out here: https://askubuntu.com/questions/397502/reboot-a-server-from-command-line
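As a minimal sketch (exact behavior may depend on the init system in that Ubuntu release):

    # Hangs here, per the above: "now" is not a valid argument to reboot
    # sudo reboot now

    # Either of these reboots cleanly; "now" is the time argument to shutdown, not reboot
    sudo reboot
    sudo shutdown -r now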

thoughts?

Solution 2:

From the AWS Developer Forum post on this topic:

Try stopping the broken instance, detaching the EBS volume, and attaching it as a secondary volume to another instance. Once you've mounted the broken volume somewhere on the other instance, check the /etc/ssh/sshd_config file (near the bottom). I had a few RHEL instances where Yum scrogged the sshd_config, inserting duplicate lines at the bottom that caused sshd to fail on startup because of syntax errors.

Once you've fixed it, just unmount the volume, detach, reattach to your other instance and fire it back up again.

Let's break this down, with links to the AWS documentation:

  1. Stop the broken instance and detach the EBS (root) volume by going into the EC2 Management Console, clicking on "Elastic Block Store" > "Volumes", then right-clicking on the volume associated with the instance you stopped.
  2. Start a new instance in the same region and with the same OS as the broken instance, then attach the original EBS root volume as a secondary volume to your new instance. The commands in step 4 below assume you mount the volume at "/data" (see the sketch after this list).
  3. Once you've mounted the broken volume somewhere on the other instance,
  4. check the "/etc/ssh/sshd_config" file on that volume for the duplicate entries by issuing these commands (the paths below assume the broken volume is mounted at "/data"; adjust if you mounted it elsewhere):
    • cd /data/etc/ssh
    • sudo nano sshd_config
    • ctrl-v a bunch of times to get to the bottom of the file
    • ctrl-k all the lines at the bottom mentioning "PermitRootLogin without-password" and "UseDNS no"
    • ctrl-x and Y to save and exit the edited file
  5. @Telegard points out (in his comment) that we've only fixed the symptom. We can fix the cause by commenting out the 3 related lines in the "/etc/rc.local" file on that same volume. So:
    • cd /data/etc
    • sudo nano rc.local
    • look for the "PermitRootLogin..." lines and comment them out (or delete them)
    • ctrl-x and Y to save and exit the edited file
  6. Once you've fixed it, just unmount the volume,
  7. detach it by going into the EC2 Management Console, clicking on "Elastic Block Store" > "Volumes", then right-clicking on the volume you attached to the recovery instance,
  8. reattach it to the original (broken) instance as its root volume, and
  9. fire it back up again.
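For reference, here is roughly what steps 3, 4, and 6 look like in the recovery instance's shell. This is only a sketch under two assumptions: the attached volume shows up as /dev/xvdf (check lsblk after attaching it; the device name varies) and you mount it at /data:

    # Identify the attached volume's device name (assumed /dev/xvdf here)
    lsblk

    # Mount the broken root volume at /data
    sudo mkdir -p /data
    sudo mount /dev/xvdf1 /data    # or /dev/xvdf if the volume has no partition table

    # Inspect the tail of the broken instance's sshd_config for the duplicate lines
    sudo tail -n 20 /data/etc/ssh/sshd_config

    # Remove the duplicate "PermitRootLogin without-password" / "UseDNS no" entries,
    # then clean up the lines in rc.local that keep re-adding them
    sudo nano /data/etc/ssh/sshd_config
    sudo nano /data/etc/rc.local

    # Unmount before detaching the volume in the console
    sudo umount /data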