SQL Server 2008 failover strategy - Log shipping or replication?

Solution 1:

For all those suggesting mirroring - it's not going to be possible to mirror 400 databases. You'll run out of worker threads way before you hit 400, whether on 32- or 64-bit. There are at least 2 worker threads per database on the principal and at least 3 per database on the mirror.

It's really got nothing to do with the skills of the DBA. It depends entirely on what the HA requirements of the business or application are - these are the driving factors, not what the DBA can cope with. If a more complex solution is required, then getting a DBA that an deal with it is also required. Limiting your HA solution because of the skills of the DBA is nonsensical.

Database mirroring is also not fast-detection and fast-failover - it entirely depends on what the failure is, what the mirroring partner timeout is, what the SEND and REDO queues are. It could be quite a while for a failover to complete one the mirror decides to do become the principal. I've helped many customers implement mirroring (both when at Microsoft when I owned database mirroring along with the rest of the Storage Engine), and since leaving in 2007.

Forget mirroring for that number of databases.

There are some fundamental questions you need to answer before we can give you a recommendation:

  • Do you want to have a redundant copy of the entire databases or just portions?
  • Do you want to be able to use the redundant copies? For writes or just for reads?
  • What's the transaction log generation rate of the databases?
  • What's the network bandwidth between the two sites?

Log shipping is probably the easiest for you, both in terms of setting it up, monitoring it, and being able to bring the redundant copies only quickly and within your data-loss limits. With log shipping you can choose to bring the redundant copies online without finishing restoring all the transaction log backups queued for restore, by accepting data-loss.

You can also effect simple failovers using a mid-tier routing technology, or even something as simple as Windows NLB with a 0/100 config, or switching to 100/0. I've also seen customers using DNS switching to effect failovers when the server name cannot change.

With replication you have to deal with unpredictable latency between the transactions being harvested from the publication db transaction log, hitting the distribution database and then being pushed/pulled to the subscription database by the distribution agent. Replication can also get itself twisted up and be a pain to figure out and reset - log shipping is dead simple.

Given that you've been using your own log shipping, I'm guessing that the tran log generation rate and network bandwidth aren't an issue - I'd stick with log shipping and shy away from replication. Don't even think about database mirroring as it won't work for your volume.

Hope this helps - this is really two day's worth of discussion when I teach HA to the DBAs inside Microsoft - boiled down to 5 mins answering this. Feel free to follow-up with more specific questions in comments.

Solution 2:

We use Mirroring for up-to-the-second on-site recovery.

For the over the WAN off-site backups (e.g. high latency) we use the built in log shipping.

Note that for EITHER option you have to set this up PER DATABASE which, for 400 DBs, will be a giant pain in the rear.

Do you use a storage array (SAN or NAS)? The easiest method seems to be block level replication via your storage array snaps (with the appropriate SQL Server quiesce plugin) -- that'll capture EVERY database automagically.