System downtime notification services [closed]

We send notifications to our clients when we schedule system maintenance, or when the system is down or running slowly. We use several methods of communication (email, a website status message, Twitter, blog entries, phones). The problem with this approach is that these services are either hosted by us or require our internet connection to be useful.

We had a significant failure last week. I don't have all the details, but in a nutshell, a T1 went down and the failover failed. Email, phones, internet, and FTP were all down.

I'm a programmer and I suggested that I build a utility that automates most of these notification tasks from a simple web interface. This does no good if we host it internally and we are down. We need to move the notification services off-site somewhere.

My fear in doing this is that if the system notifications start coming from another domain, people are going to be scratching their heads; some may even disregard the alerts.

Any suggestions?


Solution 1:

Can you perhaps host your notification software on another host at another location/on another Internet connection, but have it resolve to a subdomain of your current domain? For example, if you're currently monitoring at example.com, you could move your service to monitor.example.com and start sending notifications from there.
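As a rough illustration of the subdomain idea, here is a minimal Python sketch that builds an alert whose sender address lives on the subdomain, so recipients still see your own domain. The address status@monitor.example.com, the recipient, and the build_alert helper are made up for this example; actually delivering the message (and any DNS or mail setup for the subdomain) is not covered here.

```python
from email.message import EmailMessage

# Hypothetical sender on the off-site subdomain; adjust to your own domain.
ALERT_SENDER = "status@monitor.example.com"


def build_alert(recipient: str, subject: str, body: str) -> EmailMessage:
    """Build (but do not send) a notification email from the subdomain address."""
    msg = EmailMessage()
    msg["From"] = ALERT_SENDER
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Example: a maintenance notice addressed to a made-up client address.
alert = build_alert(
    "client@example.org",
    "Scheduled maintenance",
    "Our systems will be unavailable during the maintenance window.",
)
```

The resulting message could then be handed to whatever mail relay the off-site host uses, for example via smtplib.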

As for the utility, you might look into Nagios - it's a reasonably complete suite of monitoring tools that can watch web services, FTP, email, the works. You can host it separately and just configure all your main services to send status to the Nagios host, then configure Nagios to send alerts if it doesn't hear from the main site for a certain period of time or if the main site starts acting unexpectedly.
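To make the "alert if it doesn't hear from the main site" idea concrete, here is a small Python sketch of a heartbeat check. This is not Nagios configuration or its API, just the underlying logic under assumed names: the HeartbeatMonitor class, the service names, and the 300-second threshold are all invented for illustration.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks when each service last reported in and flags the silent ones."""

    def __init__(self, max_silence_seconds: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, service: str, now: Optional[float] = None) -> None:
        """Note that a status report arrived from the given service."""
        self._last_seen[service] = time.time() if now is None else now

    def stale_services(self, now: Optional[float] = None) -> List[str]:
        """Return the services that have been silent longer than the threshold."""
        current = time.time() if now is None else now
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if current - seen > self.max_silence_seconds
        )


# Example with fixed timestamps so the outcome is deterministic.
monitor = HeartbeatMonitor(max_silence_seconds=300)
monitor.record("email", now=1000.0)
monitor.record("ftp", now=1250.0)
stale = monitor.stale_services(now=1400.0)  # only "email" has been silent > 300 s
```

The off-site host would run the staleness check on a timer and send an alert for every service it returns. Because the check runs off-site, a total outage at the main site shows up as every service going quiet.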