What are the performance implications for using SQL Server replication?

To add some more about transactional replication:

  • it uses a SQL Agent log reader job to harvest committed transactions from the transaction log of the publication database. This means that the log cannot be cleared until the log records have been read. If the log reader agent periodicity is changed then your log may grow unexpectedly. The log reader agent could also cause contention on the transaction log on high-volume OLTP systems, depending on your IO subsystem of course.
  • replication does not guarantee zero data-loss as there is latency involved in reading the log records and passing them through the distributor to the subscriber(s). For zero data-loss, look at synchronous database mirroring or synchronous SAN replication
  • peer-to-peer replication is a good way of scaling out a query workload, and can also add some redundancy to your data
  • you have to do some careful schema design with peer-to-peer to avoid collisions caused by similar changes at different nodes. Don't use partitioned identities for that. Use a composite surrogate key (e.g. node-identifier + bigint)
  • with peer-to-peer, it can be hard to add extra redundancy to the various nodes in the topology. The publisher can be mirrored, the subscriber can be mirrored (reasonably easily in 2008, not so easily in 2005), but the distributor cannot. It must be clustered to add redundancy.

Just a few thoughts. You might also checkout the whitepaper on mirroring + repl I wrote last year at http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx

Edit: ok - it's lunchtime and I have some more to add:

  • peer-to-peer replication: in 2005 if you want to change the topology at all (add or remove nodes) you have to quiesce the whole topology. In 2008 you don't need to.
  • peer-to-peer replication doesn't have conflict detection until 2008, but even then it's conflict resolution is pretty brain-dead - node with the highest ID (called the peer originator ID) wins - maybe not what you want.
  • peer-to-peer replication: all nodes see all changes from the other nodes. This means that in a 3-way topology with, say, Seattle, London, Tokyo - if Seattle goes down, London and Tokyo keep going. If Tokyo then goes down and Seattle comes online, it will get all the London updates from London AND all the Tokyo updates that London knows about, from London. Pretty neat.
  • there is no form of failure detection or automatic failover with replication. Maybe look at mirroring instead. I guess you could use some form of NLB.

When choosing any kind of HA solution (good timing as I'm teaching an HA class for internal Microsoft DBAs today), you need to start with requirements analysis before evaluating technologies. It's a bit hard to give recommendations without knowing all your requirements.

I blogged about questions to ask yourself when coming up with an HA strategy: see http://www.sqlskills.com/BLOGS/PAUL/post/HA-Where-do-you-start-when-choosing-a-high-availability-solution.aspx

Edit again:

  • An example of it's use: various servers in data tier with mid-tier load balancing. Peer-to-peer allows all nodes to keep (eventually) in-sync.
  • Nasty problem though: if a user is routed to node 1 and does a transaction, how long before the data is replicated to the other nodes, as repl latency can vary? If the user connects to the service again, which node to route her to? Same node as before or has enough time passed to be able to safely route to any node and guarantee that the previous transaction she did has been replicated to all nodes?

ok - no more edits! :-)


Replication is quite a diverse technology and can be used to cater for a number of different scenarios, the choice of which will determine the specific type of replication that is implemented.

For example, merge replication can be used to support distributed processing by spreading the workload of an application across several servers, i.e. distributed processing architectures.

Merge replication often requires an application that is relatively aware of its environment. Techniques such as conflict resolution also have to be taken into consideration in order to ensure data consistency across the entire integrated environment.

Transactional Replication can be used in a similar fashion to log shipping however you can limit the specific objects that are replicated to the subscriber. This can be useful if only a subset of tables is required for reporting purposes.

For a full listing of the available architectures see the following Microsoft Replication reference.

http://msdn.microsoft.com/en-us/library/ms151827.aspx

The flavour of replication you use will determine the type of issues you may encounter and need to consider. For example, Merge Replication requires that Schema changes be made to your database.

There are also security considerations to be taken into account, for example if you data is to be replicated over the public internet or whether or will need to encrypt communications etc.

Replication is a big topic but I hope that this information sets you off in the right direction.