Are we using DFS "wrong"?
We are a company with branches across the country, with a minimum of one T1 and a maximum of two T1s to each branch. We have a DFS server at each branch and at our main office. In the past week, one particularly troublesome share that holds some of our user files has never had a backlog of 0 files. I have been adjusting the replication schedule to try to get it cleared, and the lowest I have managed for that share is a backlog of 1,500 files.
So my questions are:
- Is it wrong that we have DFS set up over a WAN?
- Do we just have too little bandwidth for the number of files and amount of changes we have?
- Is there some magical configuration that has not been done?
I've supported your exact setup with 70 WAN sites on Windows Server 2003 R2, mostly on T-1s, and it worked great. DFSR was our WAN file server backup method. Can you use MRTG to monitor the bandwidth on your T-1 router to verify whether you actually have a bandwidth problem?
We used MRTG to graph bandwidth usage, and site-based GPOs to control the bandwidth DFSR was allowed to use. We set the GPOs to roughly 700 Kbps during the day and let replication max out the T-1 at night. When a backlog grew and never emptied, the server and MRTG monitoring told us the only option was to get that site more bandwidth. DFSR traffic is already compressed and block-level, so I don't know that a third-party offsite replication product will do any better (if you can indeed show that you're bandwidth limited).
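If it helps, here is a minimal sketch of how you could log the backlog alongside the MRTG graphs so you can see whether it actually drains overnight. The replication group, folder, and member names are placeholders, and the `dfsrdiag` flags and output text are written from memory, so verify them on your own servers first.

```python
# Hedged sketch: poll "dfsrdiag backlog" every few minutes and append the count
# to a CSV so it can be lined up against the MRTG bandwidth graphs.
# Group/folder/member names are placeholders; dfsrdiag flags and output format
# are from memory -- check them with "dfsrdiag backlog /?" before relying on this.
import csv
import re
import subprocess
import time
from datetime import datetime

CMD = [
    "dfsrdiag", "backlog",
    "/rgname:UserFiles",      # placeholder replication group name
    "/rfname:UserShare",      # placeholder replicated folder name
    "/smem:BRANCH-FS01",      # sending member (branch server), placeholder
    "/rmem:HQ-FS01",          # receiving member (main office), placeholder
]

def backlog_count():
    """Run dfsrdiag and pull the backlog file count out of its text output."""
    out = subprocess.run(CMD, capture_output=True, text=True).stdout
    match = re.search(r"Backlog File Count:\s*(\d+)", out)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    with open("backlog_log.csv", "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            writer.writerow([datetime.now().isoformat(), backlog_count()])
            f.flush()
            time.sleep(300)  # sample every 5 minutes
```

A flat or shrinking curve overnight with a daytime climb points at schedule/throttle tuning; a curve that only ever climbs points at raw bandwidth.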
DFSR in 2008 or 2008 R2 may have further optimizations, so research that upgrade option as well.
You seem to have answered your own question here. Given the volume of modifications your users are making to this share, no amount of DFS configuration tweaking is going to resolve this. Your backlog is bandwidth-dependent and will never clear to 0 if your users make modifications faster than the replication engine can ship them. You may want to reconsider the architecture of using DFS in this setup; a document management/collaborative groupware system sounds like it might be a better fit than trying to run everything at the filesystem level.
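To put rough numbers on that argument, here is a back-of-envelope sketch comparing daily churn against what a throttled T1 can move. The daily-change volume, day/night rates, and compression ratio below are made-up examples; plug in figures from your own MRTG and backlog monitoring.

```python
# Back-of-envelope sketch: if users change more data per day than the schedule
# lets DFSR push, the backlog can only grow. All numbers below are examples.

T1_KBPS = 1544                   # raw T1 line rate, ~1.544 Mbps

def daily_capacity_gb(day_kbps=700, night_kbps=T1_KBPS, day_hours=12):
    """GB/day the link can move with a throttled daytime rate and a full-rate night."""
    night_hours = 24 - day_hours
    bits = day_kbps * 1e3 * day_hours * 3600 + night_kbps * 1e3 * night_hours * 3600
    return bits / 8 / 1e9

daily_change_gb = 25.0           # example: users modify ~25 GB of file data per day
compression_ratio = 0.5          # example: compression/RDC halves what hits the wire

capacity = daily_capacity_gb()
to_send = daily_change_gb * compression_ratio
print(f"Link can move ~{capacity:.1f} GB/day, changes need ~{to_send:.1f} GB/day")
print("Backlog will grow" if to_send > capacity else "Backlog should drain")
```

With those example numbers the link moves about 12 GB/day against roughly 12.5 GB of compressed changes, so the backlog never reaches zero no matter how the schedule is arranged.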
With 2008 you could switch to using BranchCache - it doesn't just blindly replicate everything that has changed, but keeps files that have been opened in a local cache, which is updated when the central copy changes. As far as I remember, under 2003 DFS works as a changed-file replicator: change 1 byte in a 100 MB file and it recopies the whole 100 MB. Under 2008 it only copies the changed data.
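As a toy illustration of that block-level idea (a simplification, not the actual RDC algorithm DFSR uses, which works with variable-size chunks and recursive signatures), the sketch below hashes fixed-size blocks on each side and only "sends" the blocks whose hashes differ:

```python
# Toy illustration: after a 1-byte edit to a 100 MB file, only the changed
# block needs to cross the wire, not the whole file. This is NOT the real RDC
# algorithm, just a demonstration of why on-wire cost tracks the change size.
import hashlib

BLOCK = 64 * 1024  # 64 KB blocks, an arbitrary choice for the example

def block_hashes(data: bytes):
    return [hashlib.sha1(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_send(old: bytes, new: bytes):
    """Return the (index, bytes) pairs the sender must transmit."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    changed = []
    for i, h in enumerate(new_h):
        if i >= len(old_h) or old_h[i] != h:
            changed.append((i, new[i * BLOCK:(i + 1) * BLOCK]))
    return changed

if __name__ == "__main__":
    original = bytes(100 * 1024 * 1024)       # pretend 100 MB file of zeros
    edited = bytearray(original)
    edited[12345] = 0xFF                      # a single-byte change
    delta = blocks_to_send(original, bytes(edited))
    total_blocks = (len(original) + BLOCK - 1) // BLOCK
    print(f"Blocks to resend: {len(delta)} of {total_blocks}")
    print(f"Bytes on the wire: ~{sum(len(b) for _, b in delta)} instead of {len(original)}")
```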
I'm surprised that you aren't seeing more problems. I used DFS like this years back but started hitting problems once we were replicating around 70 GB of files. After investigating I found a Microsoft document indicating that it wasn't supported over the magical 50 GB mark. The problems included deletion of unreplicated files... and that was on a LAN rather than a WAN.
Linux has some distributed file systems and file replication tools, but they will all suffer the same problems if you lack the bandwidth.
Another alternative would be to use CIFS accelerators. We use Packeteer (now part of Blue Coat), and it's feasible for people to open and edit 20 MB CAD drawings over plain DSL; opening and saving times are reasonable.