What to do with a suddenly unreachable EC2 instance that produces no system log?
I have an EC2 "micro instance" running Canonical's Ubuntu 10.04 LTS. It has been running for 6-9 months now and is rebooted infrequently (once every few weeks at most).
I just did what I thought was a routine aptitude update followed by aptitude full-upgrade. Noticing that some new -ec2 Linux kernel images had been installed, I rebooted the system. It appeared to reboot and returned to "running" status in the console, but it didn't come back with its usual ssh and http services. I've tried stopping and starting it and re-associating its Elastic IP... no joy.
The strange thing is, "Get System Log" (AWS console) returns a completely blank log. Empty. Nothing. Not one character. (At least, it has been empty since the first stop-start; before the stop it contained only a final line about restarting.)
I've tried a few stop-start cycles, with no improvement.
Any advice on what to try next to bring my instance back to life?
Solution 1:
I ran into the very same problem recently. I'm quite new to EC2 in general, but with some help from Eric's blog I managed to troubleshoot and resolve the issue, although I'm still not sure what it REALLY was. I think the cause may be a missing kernel AKI matching the newly updated kernel image for this particular AMI (by the way, I'm running the same AMI).
- I stopped my instance and attached its volume to a new instance running the same AMI. I had to play a bit with e2label and fstab.
- Mounted the old filesystem (including /dev and /proc) and chrooted into it.
- Installed the kernel version one before the latest, as I couldn't find an AKI corresponding to the latest one. I had to change the AKI manually using the EC2 API tools.
- Removed the new EBS volume (after first fixing the partition labels) and booted back into the old volume.
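The mount-and-chroot part of the steps above can be sketched in Python. This is a minimal sketch that only prints the commands for review instead of running them; the device name /dev/xvdf and the mount point /mnt/oldroot are hypothetical (the real device depends on where you attach the volume), and it does not cover the e2label/fstab fixes or the AKI change.

```python
import shlex
from typing import List

# Hypothetical values: the device name under which the old root volume
# appears on the rescue instance, and an empty directory to mount it on.
OLD_ROOT_DEVICE = "/dev/xvdf"
MOUNT_POINT = "/mnt/oldroot"


def chroot_rescue_commands(device: str, mount_point: str) -> List[List[str]]:
    """Return (without running) the argv lists that mount an attached
    root volume, bind-mount /dev and /proc into it, and chroot into it."""
    cmds = [
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],
    ]
    # Bind-mount the rescue host's /dev and /proc so package maintainer
    # scripts (e.g. kernel postinst hooks) can work inside the chroot.
    for fs in ("dev", "proc"):
        cmds.append(["mount", "--bind", f"/{fs}", f"{mount_point}/{fs}"])
    cmds.append(["chroot", mount_point])
    return cmds


if __name__ == "__main__":
    # Print the commands so they can be checked and run by hand as root.
    for argv in chroot_rescue_commands(OLD_ROOT_DEVICE, MOUNT_POINT):
        print(shlex.join(argv))
```

Printing rather than executing is deliberate: a wrong device name here could mount or modify the rescue instance's own root filesystem.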
I'm now running 2.6.32-318-ec2.
Can someone correct me if I'm wrong in pointing to the missing AKI as the source of the problem? Anyway, it worked, and from now on I'll be sure to test all upgrades on a test host before applying them to the production system.
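Picking "the version one before the latest" from a list of installed or available kernel packages can be sketched as below. This assumes Ubuntu's linux-image-X.Y.Z-N-ec2 naming, where the ABI number N orders releases; it only selects a version and says nothing about whether a matching AKI exists, which still has to be checked separately.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed naming pattern, e.g. "linux-image-2.6.32-318-ec2".
KERNEL_PKG = re.compile(r"^linux-image-(\d+)\.(\d+)\.(\d+)-(\d+)-ec2$")


def kernel_before_latest(packages: List[str]) -> Optional[str]:
    """Return the second-newest -ec2 kernel package name, or None if
    fewer than two distinct -ec2 kernels are listed."""
    versions: Dict[Tuple[int, ...], str] = {}
    for name in packages:
        m = KERNEL_PKG.match(name)
        if m:
            # Sort key (major, minor, patch, ABI), compared numerically
            # so that 99 < 100.
            versions[tuple(int(g) for g in m.groups())] = name
    if len(versions) < 2:
        return None
    ordered = sorted(versions)
    return versions[ordered[-2]]


if __name__ == "__main__":
    print(kernel_before_latest([
        "linux-image-2.6.32-318-ec2",
        "linux-image-2.6.32-319-ec2",
        "linux-image-ec2",  # doesn't match the pattern, so it is ignored
    ]))  # prints linux-image-2.6.32-318-ec2
```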
Solution 2:
My solution/recovery was:
- Launched a fresh instance from the Ubuntu 10.04 AMI ami-c00e3cb4 (promptly updated, upgraded and rebooted it into linux-image-2.6.32-319-ec2 with no problem).
- Reinstalled all the packages of importance.
- Mounted a snapshot of the old, non-booting instance (taken after it stopped booting) as a volume.
- rsynced over the handful of important files from /etc, /var and /home
and it's back as it was before (with the advantage of being a little less crufty).
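The rsync step above can be sketched as a small helper that builds one rsync command per path. This is a minimal sketch under assumptions: the snapshot volume is mounted at a hypothetical /mnt/oldroot, the example paths are hypothetical, and it defaults to --dry-run so you can preview changes first. Copying all of /etc wholesale would overwrite the fresh instance's own fstab and network settings, which is why it takes specific paths.

```python
import shlex
from typing import List, Sequence

# Hypothetical mount point of the volume created from the old snapshot.
OLD_ROOT = "/mnt/oldroot"


def rsync_restore_commands(old_root: str, paths: Sequence[str],
                           dry_run: bool = True) -> List[List[str]]:
    """Build rsync argv lists copying each path from the old volume onto
    the same path on the fresh instance."""
    cmds = []
    for path in paths:
        path = path.rstrip("/")
        # -a keeps permissions, ownership and timestamps; -H, -A and -X
        # also keep hard links, ACLs and extended attributes.
        argv = ["rsync", "-aHAX"]
        if dry_run:
            argv.append("--dry-run")  # only report what would change
        # A trailing slash on the source copies the directory's contents
        # into the destination instead of nesting a second directory.
        argv += [f"{old_root}{path}/", f"{path}/"]
        cmds.append(argv)
    return cmds


if __name__ == "__main__":
    # Hypothetical selection of paths worth carrying over.
    for argv in rsync_restore_commands(OLD_ROOT, ["/etc/apache2", "/var/www", "/home"]):
        print(shlex.join(argv))
```

Once the dry-run output looks right, rerun with dry_run=False.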
I didn't bother trying to boot a fresh instance with the problem image because... well, surely all the "state" lives in the disk image (which I can only guess suffered some boot-related corruption), so I wouldn't expect a different result.
Just "one of those things", I guess?
In future I think I'll be snapshotting more regularly, and before any kernel updates.
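One way to make "snapshot before any kernel updates" harder to forget is to check a simulated upgrade for kernel packages first. This sketch assumes you use apt-get's simulation mode (apt-get -s dist-upgrade), whose planned installs appear as lines starting with "Inst"; aptitude's own output differs. Treating only linux-image* packages as "kernel" is an assumption, and taking the snapshot itself (e.g. from the AWS console) is left out.

```python
from typing import Iterable, List

# Package-name prefixes treated as kernel updates (an assumption; add e.g.
# "linux-headers" if those should count too).
KERNEL_PREFIXES = ("linux-image",)


def kernel_packages_in_plan(sim_output: Iterable[str]) -> List[str]:
    """Return the kernel-related package names from the "Inst" lines of
    `apt-get -s dist-upgrade` (simulation) output."""
    found = []
    for line in sim_output:
        parts = line.split()
        # Simulated installs/upgrades are reported as "Inst <package> ...".
        if len(parts) >= 2 and parts[0] == "Inst" and parts[1].startswith(KERNEL_PREFIXES):
            found.append(parts[1])
    return found


if __name__ == "__main__":
    # Made-up sample lines in the shape apt-get prints when simulating;
    # in practice, feed it the real output of `apt-get -s dist-upgrade`.
    sample = [
        "Inst libc6 [OLD-VERSION] (NEW-VERSION Ubuntu:10.04/lucid-updates)",
        "Inst linux-image-2.6.32-319-ec2 (NEW-VERSION Ubuntu:10.04/lucid-updates)",
        "Conf linux-image-2.6.32-319-ec2 (NEW-VERSION Ubuntu:10.04/lucid-updates)",
    ]
    print(kernel_packages_in_plan(sample))  # prints ['linux-image-2.6.32-319-ec2']
```

If the list is non-empty, that is the moment to snapshot the root volume before running the real upgrade.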