Windows DFSR - Changed replicated directory permissions and now have a 350,000 backlog for more than a week

Solution 1:

Very strange problem, especially after reviewing the edit.

I would inspect the DFSR debug logs, which are located in %systemroot%\debug. By default there should be 9 previous log files that have been GZ-archived, plus the one that is currently being written to.

Open the current log in a text editor and search for the text "warning" or "error". For more detailed information on the debug logs, see this blog series: http://blogs.technet.com/b/askds/archive/2009/03/23/understanding-dfsr-debug-logging-part-1-logging-levels-log-format-guid-s.aspx
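If you want to search the archived logs as well as the current one, a small script can do it without unpacking anything by hand. This is a minimal sketch that assumes the logs are named Dfsr*.log (current) and Dfsr*.log.gz (archived) in %systemroot%\debug; the function name `scan_dfsr_logs` is hypothetical, so check the actual file names on your server before relying on it.

```python
import gzip
import os
from pathlib import Path

# Assumption: DFSR debug logs are named Dfsr*.log (current) and
# Dfsr*.log.gz (GZ-archived) inside %systemroot%\debug.
KEYWORDS = ("warning", "error")


def scan_dfsr_logs(log_dir=None, keywords=KEYWORDS):
    """Yield (file name, line number, line) for every line containing a keyword."""
    if log_dir is None:
        log_dir = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "debug"
    log_dir = Path(log_dir)
    needles = tuple(k.lower() for k in keywords)
    for path in sorted(log_dir.glob("Dfsr*.log*")):
        # Archived logs are gzip-compressed; the current log is plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(n in line.lower() for n in needles):
                    yield path.name, lineno, line.rstrip("\n")


if __name__ == "__main__":
    for name, lineno, line in scan_dfsr_logs():
        print(f"{name}:{lineno}: {line}")
```

The match is case-insensitive, so "Warning" and "ERROR" are both caught. Run it on the DFSR server itself, or pass a copied log folder as `log_dir`.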

Solution 2:

You can tweak the replication schedule to allow DFS-R to replicate at full speed during off hours (or even during business hours, if appropriate).

You can also try increasing the staging quota on the backlogged server, which should improve performance in this situation.

You don't mention whether or not it's capped, but I assume it is since you have replication across a WAN.
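For sizing the staging quota, a commonly cited Microsoft guideline (Windows Server 2008 and later) is to make it at least as large as the 32 largest files in the replicated folder. The sketch below computes that total for a folder tree. The function names are hypothetical and the default of 32 is that guideline, not a hard limit; treat the result as a lower bound.

```python
import heapq
import os


def largest_files_total(root, count=32):
    """Return the combined size in bytes of the `count` largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
    return sum(heapq.nlargest(count, sizes))


def suggested_staging_quota_mb(root, count=32):
    """Round the largest-files total up to whole megabytes (MiB)."""
    total = largest_files_total(root, count)
    return -(-total // (1024 * 1024))
```

For example, `suggested_staging_quota_mb(r"D:\Shares\Data")` (a hypothetical path) gives a starting value to compare against the quota currently configured on the replicated folder.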

Solution 3:

My experience is that this is Just How It Works.

I stumbled across this after updating security on a fairly small collection of 4 DFS replication groups (550 GB of data, 58k files and 3.4k folders in total). The data actually transmitted on the wire is low, so it appears not to be moving entire files for security-only changes. Disk activity, however, feels like the entire hierarchy is being recopied: sustained disk transfer rates of 60-100 MB/sec, and disk queues of 30, peaking as high as 500, on an SSD-tiered storage space.
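To put those numbers in perspective, here is a back-of-envelope estimate of how long a single full pass over 550 GB takes at the observed 60-100 MB/sec. It assumes decimal units (1 GB = 1000 MB) and a sustained rate; `full_pass_hours` is a hypothetical helper, not a measurement from the servers in question.

```python
def full_pass_hours(data_gb, rate_mb_per_s):
    """Hours needed to read or write data_gb (decimal GB) once at rate_mb_per_s (decimal MB/s)."""
    seconds = (data_gb * 1000) / rate_mb_per_s
    return seconds / 3600


for rate in (60, 100):
    print(f"{rate} MB/sec: {full_pass_hours(550, rate):.1f} h")
# 60 MB/sec: 2.5 h
# 100 MB/sec: 1.5 h
```

So if the disk activity really is equivalent to re-reading the whole hierarchy, each such pass costs roughly 1.5 to 2.5 hours of saturated disk time per member, before any staging overhead is counted.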

My sense is that DFS has a lot of churn in its staging and destaging process, which results in extreme disk I/O. An initial replication between two boxes connected over gigabit LAN takes several times longer than simply copying the same data between them, which would seem to indicate that every byte replicated requires multiple bytes of disk reads and writes.
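The inference in the previous paragraph can be made explicit with a rough model. Assuming both runs are disk-bound on the same hardware and data set (so elapsed time scales with total disk I/O), and taking a plain copy as one read on the source plus one write on the destination, the ratio of the two elapsed times estimates how many bytes of disk I/O DFSR spends per byte replicated. The helper name and the timings below are hypothetical.

```python
def io_amplification(replication_hours, copy_hours, copy_io_per_byte=2.0):
    """Estimate disk bytes read+written per byte replicated.

    Assumes both runs are disk-bound on the same hardware, so elapsed time
    is proportional to total disk I/O. A plain copy is modelled as one read
    on the source plus one write on the destination (copy_io_per_byte=2).
    """
    if copy_hours <= 0:
        raise ValueError("copy_hours must be positive")
    return (replication_hours / copy_hours) * copy_io_per_byte


# Hypothetical timings, not measurements from this thread:
print(io_amplification(replication_hours=8, copy_hours=2))  # 8.0
```

If the runs are not disk-bound (for example, CPU-bound on staging compression), the time ratio overstates the I/O amplification, so treat the result as an upper-bound sanity check rather than a measurement.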

Security updates don't seem to get any special replication logic, barring the use of Windows Server 2012 claims-based security (which isn't widely used, AFAICT), so they cause the same staging/destaging churn you would get for data changes.
