Why do big Exchange installations require downtime for maintenance?

I have used several top-tier third-party Exchange providers over the years, and all of them have had regularly scheduled downtime for routine maintenance (roughly once a month). I am wondering what it is about Exchange that makes it impossible to avoid downtime of this nature. Is it truly impossible to maintain 24x7 uptime, or just prohibitively expensive, and why?


Solution 1:

Windows and Exchange updates almost always require reboots to finish installing, so that's part of it. With good planning and a proper setup (load balancers and clustering) you can still maintain 24x7 uptime, so that email is always available from at least one server.

On the client side, though, there's always a brief outage (5-30 sec) while Outlook figures out that the CAS server it's connected to is offline and Autodiscover switches it to another one. Usually you get the "The Exchange Administrator has made a change that requires you to restart Outlook" message when this happens.

It's not a long outage, but it still counts as an outage, which is why you need to schedule maintenance time for it. There's also the chance of something going wrong during the maintenance period, so to CYA you need to schedule it anyway.

EDIT: I found out that if you put a load balancer in front of your CAS servers, you won't get the "The Exchange Administrator has made a change..." message in Outlook. You'll still have a brief outage while the load balancer switches you to an active CAS server.
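Most of that brief outage is the time the load balancer needs to notice that a CAS server is dead. The sketch below estimates that window under a simple, hypothetical health-check model: probes start every `probe_interval` seconds, a probe against a dead server fails after `probe_timeout` seconds, and the server is marked down after `failures_to_mark_down` consecutive failures. These parameter names and values are illustrative assumptions, not Exchange or vendor defaults, and the model ignores the time Outlook itself takes to reconnect.

```python
def detection_window(probe_interval: float, probe_timeout: float,
                     failures_to_mark_down: int) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds from a server going offline
    to the load balancer marking it down.

    Hypothetical model: probes start every probe_interval seconds, a probe
    against a dead server fails after probe_timeout seconds, and the server
    is marked down after failures_to_mark_down consecutive failed probes.
    """
    if probe_timeout >= probe_interval:
        raise ValueError("model assumes probe_timeout < probe_interval")
    n = failures_to_mark_down
    # Best case: the server dies just before a probe starts, so the first
    # failing probe begins immediately.
    best = (n - 1) * probe_interval + probe_timeout
    # Worst case: the server dies just after a successful probe, so the
    # first failing probe starts almost a full interval later.
    worst = n * probe_interval + probe_timeout
    return best, worst


if __name__ == "__main__":
    # Illustrative settings only: probe every 5 s, 2 s timeout, 3 strikes.
    best, worst = detection_window(probe_interval=5, probe_timeout=2,
                                   failures_to_mark_down=3)
    print(f"clients may stall for roughly {best:.0f}-{worst:.0f} s")
```

With those made-up settings the window comes out to roughly 12-17 seconds, which sits inside the 5-30 second range described above; tighter probes shrink it at the cost of more false positives.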

Solution 2:

Long story short... you're not using very good providers.

There's no reason you need to schedule regular downtime for an Exchange environment (although scheduling regular maintenance windows is always wise), especially with Exchange 2010. As long as your redundancy is planned and implemented properly, everything just flows: redundant network, redundant storage, redundant servers.

You're probably not going to get this from a $3/month provider. I don't resell $3/month Exchange mailboxes. Most of my reasons for recommending hosted Exchange come down to the importance of email and uptime. You have to pay more for a provider that doesn't go down all the time, but the ROI justifies it.

Our provider sends out emails for every scheduled maintenance. We get at least one of them a month. 99% of them basically read, "We're doing maintenance on our servers between 2AM and 2:15AM. You may notice 15-30 seconds of connectivity issues while mailboxes/storage fails over."

If you're looking for 100% uptime without 30 seconds of failover, you're just not going to find that anywhere. Not with Exchange, not with Google Apps, not with Domino. 100% uptime does not exist. Maintenance windows will always be needed and failovers still require time (even if that time is brief).
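A quick availability calculation shows why even a 30-second failover is not "100%". This sketch assumes a 30-day month (2,592,000 seconds) as the measurement period; real SLAs may define the period differently.

```python
# Assumption: availability is measured over a 30-day month.
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s


def monthly_availability(downtime_seconds: float,
                         period_seconds: int = SECONDS_PER_30_DAY_MONTH) -> float:
    """Percentage of the period during which the service was available."""
    return 100.0 * (1 - downtime_seconds / period_seconds)


def allowed_downtime_seconds(target_percent: float,
                             period_seconds: int = SECONDS_PER_30_DAY_MONTH) -> float:
    """Downtime budget for the period at a given availability target."""
    return period_seconds * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # 30 s of failover during the 2:00-2:15 AM window
    print(f"30 s failover:      {monthly_availability(30):.4f}%")
    # Worst case: the whole 15-minute window is a full outage
    print(f"15 min full outage: {monthly_availability(15 * 60):.4f}%")
    # How much downtime a 99.9% target actually permits per month
    print(f"99.9% budget:       {allowed_downtime_seconds(99.9) / 60:.1f} min")
```

Thirty seconds a month works out to about 99.9988%, and even the full 15-minute window still leaves the provider at about 99.965%, well inside a 99.9% budget of roughly 43 minutes a month.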

So find a new provider that may cost more but will provide you the kind of uptime you need.

Solution 3:

Keeping N+1 redundancy for literally every part of the network, so that maintenance never causes downtime, would cost more; enough more that the service would no longer be price competitive. Most businesses are very tolerant of minimal scheduled downtime. This isn't exclusive to Exchange: almost every hosting vendor I've dealt with, of any type, does roughly the same thing.

In the case of Exchange, it's going to come down at least once a month for Patch Tuesday.
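Patch Tuesday falls on the second Tuesday of each month, so a provider can predict its monthly patching window well in advance. This minimal sketch computes those dates; how long after Patch Tuesday a provider actually schedules its window is a separate policy decision not covered here.

```python
import calendar
import datetime


def patch_tuesday(year: int, month: int) -> datetime.date:
    """Return the second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday == 0, Tuesday == 1 (calendar.TUESDAY)
    days_to_first_tuesday = (calendar.TUESDAY - first.weekday()) % 7
    # The second Tuesday is one week after the first.
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)


if __name__ == "__main__":
    for month in range(1, 13):
        print(patch_tuesday(2024, month).isoformat())
```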

Solution 4:

The only routine maintenance Exchange needs is patching the servers, which in a properly built environment should cause no downtime at all: even with only two servers you can keep one of them active while you patch and reboot the other. Taking backups doesn't cause any downtime, and testing restores should be done in a DR environment, not on live servers. I really don't understand what this "routine maintenance" is or why it should be needed at all.
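The two-server argument can be illustrated with a toy model of rolling maintenance: move the active databases off one server, patch and reboot it, bring it back, then repeat for the other. The server and database names and the bookkeeping below are hypothetical; this is not Exchange's actual failover mechanism, only the ordering logic, and it assumes each server already holds a healthy passive copy of every database.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    online: bool = True
    patched: bool = False
    active_dbs: set = field(default_factory=set)


def serving(servers):
    """Databases that have an active copy on a server that is online."""
    result = set()
    for s in servers:
        if s.online:
            result |= s.active_dbs
    return result


def rolling_patch(servers):
    """Patch servers one at a time and record what was served at each step."""
    timeline = []
    for target in servers:
        partner = next(s for s in servers if s is not target and s.online)
        # Switch active copies to the partner before taking the target down
        # (assumes the partner already has a healthy passive copy).
        partner.active_dbs |= target.active_dbs
        target.active_dbs.clear()
        target.online = False  # patch and reboot
        timeline.append((f"{target.name} rebooting", serving(servers)))
        target.patched = True
        target.online = True
        timeline.append((f"{target.name} back online", serving(servers)))
    return timeline


if __name__ == "__main__":
    ex1 = Server("EX1", active_dbs={"DB1"})
    ex2 = Server("EX2", active_dbs={"DB2"})
    for step, dbs in rolling_patch([ex1, ex2]):
        print(f"{step:<20} serving: {sorted(dbs)}")
```

Every step reports both databases as served, which is the point: patching needs no outage, only a brief switchover per database. The toy leaves all databases active on EX1 at the end; a real environment would rebalance them afterward.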

Of course, I'm not saying that ensuring 99% uptime is easy; trouble happens, even in the best and most expensive environments. What I am saying is that there is no reason at all to take an Exchange system down for "routine maintenance", unless you perform routine offline defragmentation of your databases... something nobody in their right mind should be doing anymore (but still...).