Setting up Amazon CloudWatch to get an alert when your server is down

I have an instance running on Amazon EC2 that I turned into a webserver.

I have been looking at CloudWatch, but I do not know if it is the correct tool for the job. Basically, I want to be informed when the server is down, for whatever reason.

Maybe the server got hacked, or it shut down for some other reason; either way, I want to get a notification.

I have enabled CloudWatch and tried to set up an alert, but I only see things like network in/out, CPU usage, and similar metrics. I do not know if these will do the trick.


Solution 1:

One recommendation is to monitor a metric that should always have a numeric value, such as CPU usage, and trigger an alarm when the metric state is 'insufficient data'. You can use Amazon's SNS to notify you of this.
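As a rough sketch of that idea, the snippet below builds the arguments for CloudWatch's PutMetricAlarm call and notifies an SNS topic when the CPU metric stops reporting. It assumes the third-party boto3 library is installed and credentials are configured; the instance ID, topic ARN, alarm name, period, and region are placeholders of mine, not values from the question.

```python
def build_insufficient_data_alarm(instance_id, sns_topic_arn):
    """Return keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"{instance_id}-cpu-no-data",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation period (assumed)
        "EvaluationPeriods": 2,
        # CPU usage is never below 0, so the threshold itself never fires;
        # only missing data changes the alarm state.
        "Threshold": 0.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "missing",
        # Notify the SNS topic when the alarm enters INSUFFICIENT_DATA.
        "InsufficientDataActions": [sns_topic_arn],
    }


def create_alarm(params, region="us-east-1"):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

Calling `create_alarm(build_insufficient_data_alarm("i-0123456789abcdef0", topic_arn))` would register the alarm; subscribe your email address or phone number to the SNS topic to receive the notification.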

Alternatively, you can set up custom metrics which return a binary state for specific services (httpd, mysql, etc.) and generate an alert any time any of these reads 0. This approach offers the possibility of much finer detail - combine it with 'insufficient data' to cover all cases.
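One way such a binary metric could be produced is sketched below: a script run from cron on the instance checks whether a service accepts TCP connections and publishes 1 or 0 via PutMetricData. The namespace `Custom/Services`, the metric name `ServiceUp`, and the port-based check are my own assumptions; boto3 (third-party) is only needed for the final publish step.

```python
import socket


def service_is_up(host, port, timeout=2.0):
    """Return 1 if a TCP connection to host:port succeeds, otherwise 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        return 0


def build_metric_data(service_name, state, namespace="Custom/Services"):
    """Return keyword arguments for CloudWatch's PutMetricData call."""
    return {
        "Namespace": namespace,  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "ServiceUp",  # hypothetical metric name
                "Dimensions": [{"Name": "Service", "Value": service_name}],
                "Value": float(state),
                "Unit": "Count",
            }
        ],
    }


def publish(payload, region="us-east-1"):
    """Push the data point to CloudWatch (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily

    boto3.client("cloudwatch", region_name=region).put_metric_data(**payload)


if __name__ == "__main__":
    # Example: report whether something is listening on port 80 locally.
    publish(build_metric_data("httpd", service_is_up("127.0.0.1", 80)))
```

An alarm on `ServiceUp` with a threshold below 1 then covers the "service died" case, while its insufficient-data state covers the "whole machine is gone" case.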

You may be more successful using something that actually monitors your site (e.g. Pingdom, UptimeRobot, etc).

Solution 2:

You can use OpsGenie (http://www.opsgenie.com) to send rich alerts for CloudWatch. Currently, CloudWatch has a limited set of alerting mechanisms, including email and SMS via SNS.

You can configure CloudWatch to call the OpsGenie web services API, which gets the right people notified rapidly via push notifications to iPhone/Android apps, SMS, voice calls, etc., according to the preferences of the recipients.

Please take a look at the following blog post for detailed information:

http://www.opsgenie.com/blog/2012/09/04/aws-cloudwatch-alarms-on-your-mobile-with-opsgenie.html

Solution 3:

You can implement an EC2 status check alarm. It's done from the EC2 dashboard: go to Instances, select your instance, choose the Status Checks tab (next to the instance description), and click Create Status Check Alarm. The default "Status Check Failed (any)" should be good. I always set the interval to greater than one so I don't get bothered by transient issues.

It's also possible to set EC2 to automatically recover your instance if it goes down for some reason.
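If you prefer to script this rather than click through the console, something like the following might work. It is only a sketch: it assumes boto3 (third-party) with configured credentials, and the alarm name, period, and region are my own placeholders. As far as I know, the automatic recover action applies to the system status check, so the sketch switches to the `StatusCheckFailed_System` metric when `recover=True`.

```python
def build_status_check_alarm(instance_id, sns_topic_arn,
                             region="us-east-1", recover=False):
    """Return keyword arguments for a status check alarm (PutMetricAlarm)."""
    # "StatusCheckFailed" is the combined ("any") check; the recover action
    # is attached to the system-level check instead.
    metric = "StatusCheckFailed_System" if recover else "StatusCheckFailed"
    actions = [sns_topic_arn]
    if recover:
        actions.append(f"arn:aws:automate:{region}:ec2:recover")
    return {
        "AlarmName": f"{instance_id}-{metric}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        # More than one period, so a single transient failure is ignored.
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_alarm(params, region="us-east-1"):
    """Register the alarm with CloudWatch (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```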

I also recommend a secondary monitoring system. Dumb is good for this one. I set up the Linux utility mon, pointed at my webserver from another host. If it fails to get a 200 response code twice in a row, I get an email.
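This is not how mon itself is configured, but the same "two failures in a row" rule can be sketched in a few lines of standard-library Python if you would rather roll your own check. The function names, the timeout, and the example URL are all illustrative.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def should_alert(statuses, failures_needed=2):
    """True when the most recent `failures_needed` checks were all non-200."""
    recent = statuses[-failures_needed:]
    return len(recent) == failures_needed and all(s != 200 for s in recent)


if __name__ == "__main__":
    # Run from cron on a different host; keep the history between runs
    # (for example in a small file) if you schedule single checks.
    history = [fetch_status("http://example.com/") for _ in range(2)]
    if should_alert(history):
        print("server looks down; send the email here")
```

Requiring consecutive failures is what keeps a single dropped request from paging you.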

Solution 4:

You can create an alarm in CloudWatch and set it to notify you when it goes into the "Insufficient Data" state. Most of the metrics available by default come from the VM host, which doesn't have any real idea about what's happening inside your machine.

As a start, I'd recommend installing the Amazon tools in your instance and setting up a script to report something (anything: CPU usage, whatever), then alarming if that metric stops sending data (so the metric goes into the Insufficient Data state).

This is only a bare minimum, but should be a good place to start.

See the monitoring scripts section of the CloudWatch Developer Guide: http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts.html

Solution 5:

You can use Route 53 and its health checks. With these, you can send SNS alerts and also redirect your users to a secondary website or an error screen. I think this is a better solution for your problem than CloudWatch.
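A health check can be created in the Route 53 console, or scripted roughly as below. This sketch assumes boto3 (third-party) with configured credentials; the domain, path, request interval, and failure threshold are example values I picked, not recommendations from Route 53.

```python
import uuid


def build_health_check(domain, path="/", port=80):
    """Return keyword arguments for Route 53's CreateHealthCheck call."""
    return {
        # Any unique string; lets Route 53 detect accidental retries.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "HTTP",
            "FullyQualifiedDomainName": domain,
            "Port": port,
            "ResourcePath": path,
            "RequestInterval": 30,   # seconds between checks (assumed)
            "FailureThreshold": 3,   # consecutive failures before unhealthy
        },
    }


def create_health_check(params):
    """Create the health check (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily

    return boto3.client("route53").create_health_check(**params)
```

The alerting and failover parts (an alarm on the health check that notifies SNS, and failover DNS records pointing at the secondary site) are configured separately and are not shown here.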