Solution 1:

Well, we finally appear to have resolved this issue in our environment. For the benefits of others, here's what we discovered and how we fixed the problem:

To try and gain further insight into what was occurring before/during/after the delays we used Wireshark on a client machine to capture/analyse network traffic whilst that client attempted to access a DFS share.

These captures showed something strange: whenever the delay occurred, in between the DFS request being sent from the client to a DC, and the referral to a DFS root server coming back from the DC to the client, the DC was sending out several broadcast name lookups to the network.

Firstly, the DC would broadcast a NetBIOS lookup for DOMAIN (where DOMAIN is our pre-Windows 2000 Active Directory domain name). A few seconds later, it would broadcast a LLMNR lookup for DOMAIN. This would be followed by yet another broadcast NetBios lookup for DOMAIN. After these three lookups had been broadcast (and I assume timed out) the DC would finally respond to the client with a (correct) referral to a DFS root server.

These broadcast name lookups for DOMAIN were only being sent when the long delay opening a DFS share occurred, and we could clearly see from the Wireshark capture that the DC wasn't returning a referral to a DFS root server until all three lookups been sent (and ~7 seconds passed). So, these broadcast name lookups were pretty obviously the cause of our delays.

Now that we knew what the problem was, we started trying to figure out why these broadcast name lookups were occurring. After a bit more Googling and some trial-and-error, we found our answer: we hadn't set the DfsDnsConfig registry key on our domain controllers to 1, as is required when using DFS in a DNS-only environment.

When we originally setup DFS in our enviroment we did read the various articles about how to configure DFS for a DNS-only environment (e.g. Microsoft KB244380 and others) and were aware of this registry key, but had misintepreted the instructions on when/how to use it.

KB244380 says:

The DFSDnsConfig registry key must be added to each server that will participate in the DFS namespace for all computers to understand fully qualified names.

We thought this meant that the registry key has to be set on the DFS namespace servers only, not realising that it was also required on the domain controllers. After we set DfsDnsConfig to 1 on our domain controllers (and restarted the "DFS Namespace" service), the problem vanished.

Obviously we're happy with this outcome, but I would add that I'm still not 100% convinced that this is our only problem - I wonder if adding DfsDnsConfig=1 to our DCs has only worked around the problem, rather than solving it. I can't figure out why the DCs would be trying to lookup DOMAIN (the domain name itself, rather than a server in the domain) during the DFS referral process, even in a non-DNS-only environment, and I also know I haven't set DfsDnsConfig=1 on domain controllers in other (admittedly much smaller / simpler) DNS-only environments and haven't had the same issue. Still, we've solved our problem so we are happy.

I hope this is helpful to the others who are experiencing a similar issue - and thanks again to those that offered suggestions along the way.

Solution 2:

This could be caused by the DNS server netmask ordering. We came across this recently in Server 2003. This depends on your current subnetting.

Example.

Site 1: IP subnet 10.0.0.0/24 Site 2: IP subnet 10.0.1.0/24

Client in site 2 makes a DNS query for your domain based namespace and will be given the DFS server in site 1 by default as the DNS server is not aware of the site IP boundaries. You need to tell your DNS servers what subnet mask to use to identify which IP addresses to respond with.

See http://support.microsoft.com/kb/842197