Which database servers are not interrupted by server reboots? (Clusters?)
Solution 1:
No interruption at all during scheduled maintenance, including a restart of the OS? Oracle RAC. It's the only real option I can think of, and certainly the only parallel cluster database I would trust for this. Even RAC must sometimes go down for database patches, but most can be applied while running.
If you can handle at least 10-15 seconds of downtime, there are a number of other options, including clustering at the application level (Veritas Cluster, Microsoft Cluster, Oracle Clusterware) or replication at the database level. A virtual infrastructure on its own won't help much: the OS still has to go down.
It is also possible to combine replicated databases with a multihomed client for uninterrupted production, although I can't remember the name of any such client at the moment.
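To make the multihomed-client idea a bit more concrete, here is a minimal sketch in Python. It is not any particular product's API: `run_with_failover`, `AllReplicasDown`, `fake_query` and the host names are all made up for illustration, and a real client would also have to deal with replication lag and in-flight transactions.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasDown(Exception):
    """Raised when every replica in the list refused the operation."""


def run_with_failover(replicas: Sequence[str], operation: Callable[[str], T]) -> T:
    """Run `operation` against each replica in order and return the first success."""
    failures = []
    for host in replicas:
        try:
            return operation(host)
        except ConnectionError as exc:
            # This replica is unreachable (for example, mid-reboot); try the next one.
            failures.append(f"{host}: {exc}")
    raise AllReplicasDown("; ".join(failures))


def fake_query(host: str) -> str:
    """Hypothetical stand-in for a real database call; pretends db1 is rebooting."""
    if host == "db1.example":
        raise ConnectionError("host is rebooting")
    return f"rows from {host}"


print(run_with_failover(["db1.example", "db2.example"], fake_query))
```

With `db1.example` simulated as down, the call falls through to `db2.example` and the caller never sees the failure. Only when every replica is unreachable does the client give up.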
I might add that you'll probably want to go with some sort of *NIX to keep reboots to a minimum. As far as I remember, there has only been one update worth rebooting for on RHEL and OEL in the last couple of years.
Oracle RAC is a parallel cluster. The database is stored on shared storage and accessed by all nodes simultaneously. Done right, it should improve overall performance in most cases and make little or no difference to query response times. This is complex technology, however, and doing it right is far from trivial.
There are a few other parallel technologies that promise five nines (99.999% uptime, which works out to roughly 5 minutes of downtime per year), but they are either too old (VAX) or too new (NDB).
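For anyone who wants to check the five-nines arithmetic, a quick sketch in Python follows. The function name is my own, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_minutes_per_year(nines):.2f} minutes/year")
```

Five nines comes out at a little over five minutes per year, which is less time than a single ordinary OS reboot usually takes. That is why a plain failover pair with a restart window does not get you there.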
Solution 2:
The difference between a reliable system and one with zero downtime is the difference between putting an aluminum balloon into low earth orbit and putting a person on the moon and getting them back again safely.
I would look at the old-school ways of doing this, which in my opinion are the ones that you should be looking at if you need it to work the first time and not blow the budget.
The old standbys are OpenVMS clusters and Tandem (now HP) NonStop. Both of these are designed for several computers running exactly the same DB and the same code. Both are designed to provide 100% uptime, even through OS and software upgrades and patches. Both have a proven, decades-long track record of working properly.
Now -- there are modern things that will provide this, on paper. In practice, you'll run into issues like "oops, we made a mistake in our license server and your VMs now won't boot." In a decade, I'm sure these technologies will be tested and proven to be reliable, but until then, if you need it to work, be very conservative in which stories you believe.
And, lastly, the most important thing in making a system this reliable is to design it well, build it well, and take care of it well, because in practice the least reliable thing in the equation is the person behind the keyboard.
Solution 3:
MySQL Cluster http://www.mysql.com/products/database/cluster/
- Shared Nothing architecture (central storage is not required).
- Rolling upgrades - update without stopping the cluster.
- You can specify how many copies of your data should exist in the cluster.
- Historically has been an in-memory database, meaning your total database could not exceed the amount of RAM in your cluster (minus overhead for replication).
- Now supports on-disk databases too.
- Doesn't have all the features of some of the other MySQL storage engines.
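As a rough illustration of the in-memory limit and the replication overhead mentioned above, here is a back-of-the-envelope sketch in Python. The function name and the figures are hypothetical, and it assumes the only overhead is the number of data copies; real clusters also spend memory on indexes and internal buffers, so treat the result as an upper bound.

```python
def usable_in_memory_capacity_gb(data_nodes: int, ram_per_node_gb: float, copies: int) -> float:
    """Upper bound on in-memory database size when every row is stored `copies` times."""
    if copies < 1 or data_nodes < copies:
        raise ValueError("need at least one copy and at least as many nodes as copies")
    total_ram_gb = data_nodes * ram_per_node_gb
    # Each copy of the data consumes its own share of the cluster's RAM.
    return total_ram_gb / copies


# Hypothetical cluster: 4 data nodes with 32 GB each, keeping 2 copies of the data.
print(usable_in_memory_capacity_gb(data_nodes=4, ram_per_node_gb=32, copies=2))
```

In this example, 128 GB of total RAM holds at most 64 GB of distinct data, before any other overhead is subtracted.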
Solution 4:
There are a few ways to go about this. Clusters at the OS level can work, with a brief outage when you move from one node to another. You did not specify your OS platform. Most of the *NIX platforms have a robust clustering solution.
As far as the DB platform goes, Oracle has their shared-everything approach with RAC, where you can bring down a single node and everything will get moved to the other node(s) in the cluster. It allows you to do maintenance on a node while the other nodes keep running and servicing clients. They all utilize the same set of disks. The effect on performance depends upon the hardware sizing; most places size their hardware to N+1 capacity to make sure performance is not affected while doing this type of activity.
Informix has something similar now in their latest release. DB2 is supposed to get this soon.
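The N+1 sizing mentioned above can be sketched as a small calculation in Python. The function names and load figures are made up for illustration, and it assumes load spreads evenly across identical nodes, which a real cluster will only approximate.

```python
import math


def nodes_needed_n_plus_1(peak_load: float, per_node_capacity: float) -> int:
    """Nodes required to carry the peak load with one node out for maintenance."""
    n = math.ceil(peak_load / per_node_capacity)  # N: enough nodes for the peak
    return n + 1                                  # +1: spare for the node that is down


def survives_one_node_down(nodes: int, per_node_capacity: float, peak_load: float) -> bool:
    """True if the remaining nodes can still carry the peak load."""
    return (nodes - 1) * per_node_capacity >= peak_load


# Hypothetical figures: peak of 250 units of work, each node handles 100.
print(nodes_needed_n_plus_1(250, 100))
print(survives_one_node_down(3, 100, 250))
```

With these numbers, three nodes are enough for the peak, but taking one down for a reboot leaves only 200 units of capacity. A fourth node is what lets you do maintenance without clients noticing.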