I need your help. I've struggled with this for months and nothing I've found online has helped. The problem is that domain computers sometimes use a domain controller in the wrong site. I have two sites connected via VPN: Site-A with two domain controllers and Site-B with one. Here is my current configuration:

[Image: network configuration]

Computers in Site-A usually connect to either SRV-1 or SRV-2 (as they should), but computers in Site-B only rarely connect to SRV-3. The sites are linked by a very slow ADSL connection, so connecting to the wrong site makes the client nearly unusable.

All DCs are also DFS servers. The biggest downside is that when clients connect to the wrong DC, they also connect to the wrong DFS server and list only the servers in the wrong site as available DFS servers.

There is a WINS server on SRV-1, and all machines use 192.168.0.70 as their WINS server. WINS records seem okay:

[Image: WINS records on SRV-1]

I've also gone through the DNS records on all servers, and they seem correct. The servers are in the correct sites in AD Sites and Services and have been assigned the correct subnets. All servers have two-way connections to each other in NTDS Settings.

Some observations I've made:

SRV-1 in Site-A (192.168.0.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator>nltest /DSGETSITE
Site-A
The command completed successfully

C:\Users\Administrator>nltest /DSGETDC:DOMAIN
           DC: \\SRV-1
      Address: \\192.168.0.70
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: PDC GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator>nltest /dsgetsitecov
Site-A
The command completed successfully

SRV-2 in Site-A (192.168.0.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETSITE
Site-A
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETDC:DOMAIN
           DC: \\SRV-2
      Address: \\192.168.0.71
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /dsgetsitecov
Site-A
The command completed successfully

SRV-3 in Site-B (192.168.2.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETSITE
Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETDC:DOMAIN
           DC: \\SRV-3
      Address: \\192.168.2.70
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-B
Our Site Name: Site-B
        Flags: GC DS LDAP KDC WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /dsgetsitecov
Site-B
The command completed successfully

Client PC in Site-B (192.168.2.0/24):

C:\WINDOWS\system32>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-2'.
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\WINDOWS\system32>nltest /DSGETSITE
Site-A
The command completed successfully

C:\WINDOWS\system32>nltest /DSGETDC:DOMAIN
           DC: \\SRV-2
      Address: \\192.168.0.71
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

Note that on the Client PC, DSGETSITE and DSGETDC both return Site-A values even though the client is in Site-B.
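To make this kind of check repeatable across clients, here is a minimal Python sketch that parses the `Key: value` lines of `nltest /DSGETDC` output and flags a site mismatch. The sample text is trimmed from the Client PC output above; the function names and the idea of passing in the expected site by hand are my own assumptions, not part of nltest.

```python
# Sample trimmed from the Client PC output above (raw string keeps the backslashes).
SAMPLE = r"""
           DC: \\SRV-2
      Address: \\192.168.0.71
     Dom Name: DOMAIN
 Dc Site Name: Site-A
Our Site Name: Site-A
"""


def parse_dsgetdc(text):
    """Collect the 'Key: value' lines of nltest /DSGETDC output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def site_mismatch(fields, expected_site):
    """True if either reported site differs from the site we expect for this client."""
    return (
        fields.get("Our Site Name") != expected_site
        or fields.get("Dc Site Name") != expected_site
    )


fields = parse_dsgetdc(SAMPLE)
print(fields["DC"], "mismatch:", site_mismatch(fields, "Site-B"))
```

For a client in 192.168.2.0/24 the expected site is Site-B, so this output would be flagged.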

The odd thing is that the DC a client picks changes from day to day. I've tried restarting the clients; it doesn't help. I've tried restarting the servers one by one; no difference. None of the servers are multi-homed.

The servers run Windows Server 2008 R2 and the clients run Windows 7 Pro / Windows 10 Pro.

Any help will be much appreciated!


Solution 1:

Okay, I figured it out. In the end it was a network issue; no changes needed to be made to the domain controllers. I had already configured policy routes for the VPN, but I had forgotten to specify how to prioritize packets. I added an additional policy route for in-LAN traffic and assigned it a DSCP value of cs4. For the tunneling routes I used cs5. I'm not familiar with DSCP, but my understanding was that the smaller the number, the more important the route is (I picked 4 and 5 arbitrarily). Below is a screenshot of the final configuration on my ZyXEL ZyWall routers (I hope you appreciate Paint art):

[Image: policy route configuration on the ZyWall routers]
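For reference, the class selector names are labels for fixed code points defined in RFC 2474 (CSn is the value n × 8). The short Python sketch below only shows that numbering and where the value sits in the IP header byte. How a particular router prioritizes one class over another depends on its own queueing configuration, so nothing here describes the ZyWall's behaviour.

```python
def class_selector_dscp(n):
    """DSCP code point for class selector CSn: the three class bits followed by 000."""
    if not 0 <= n <= 7:
        raise ValueError("class selector must be between 0 and 7")
    return n << 3


def tos_byte(dscp):
    """DSCP occupies the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return dscp << 2


for name in ("cs4", "cs5"):
    dscp = class_selector_dscp(int(name[2:]))
    print(name, dscp, format(dscp, "06b"), hex(tos_byte(dscp)))
# prints:
# cs4 32 100000 0x80
# cs5 40 101000 0xa0
```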

I sort of understand why this solved my problem: now the main priority is to send packets to the local network, and only after that over the VPN. I still find it a bit confusing. Is it possible that if the server and the client are in different networks, the server doesn't see the IP of the client but the IP of one of the routers, and thus cannot determine which site the client belongs to? I'd be curious to hear a fuller explanation.
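The hypothesis above can be sketched in Python. A DC works out a client's site by matching the source address it sees against the subnets assigned to each site, preferring the most specific match. The two subnets are the ones from the question; the client and router addresses are hypothetical, and whether any address translation actually happened on this VPN is not something I have confirmed.

```python
import ipaddress

# Subnet-to-site assignments as described in the question.
SITE_SUBNETS = {
    "192.168.0.0/24": "Site-A",
    "192.168.2.0/24": "Site-B",
}


def site_for(address):
    """Return the site whose subnet contains the address (longest prefix wins)."""
    ip = ipaddress.ip_address(address)
    matches = []
    for cidr, site in SITE_SUBNETS.items():
        net = ipaddress.ip_network(cidr)
        if ip in net:
            matches.append((net.prefixlen, site))
    if not matches:
        return None  # no subnet object covers this address
    return max(matches)[1]


# Hypothetical addresses: a Site-B client, and a Site-A router that would
# appear as the source if the tunnel rewrote the client's address.
print(site_for("192.168.2.50"))  # Site-B
print(site_for("192.168.0.1"))   # Site-A
```

If the DC only ever saw the second address, it would tell the client it belongs to Site-A, which matches the symptom in the question.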

Thanks to everyone who helped me, I appreciate it :)

Solution 2:

Ping does not provide any useful information here. Ping relies on a plain name lookup and does not represent how the DC Locator process works.

You may want to use w32tm /query /status /verbose /computer:SRV-3 to confirm the time service on SRV-3 is functioning correctly.

It's probably simplest to do a packet capture, but you may also be able to manually isolate where the process is failing by simulating what occurs on the client PC in Site B.

  1. nslookup
    set type=srv
    _ldap._tcp.dc._msdcs.domain

This should return the list of ALL of your domain controllers (that have an A record registered in DNS and aren't filtered by DNS Mnemonics).

  2. Build a list of functional DCs by performing an LDAP bind to each DC.

  3. The first DC to respond returns the client site, the site the DC is in, and DSClosestFlag (0 or 1).

  4. If the DC is in the client site, or DSClosestFlag = 1, or the client has no site, use that DC. If not, perform:

    nslookup
    set type=srv
    _ldap._tcp.sitename._sites.domain

  5. Build a list of functional DCs by performing an LDAP bind to each DC.

  6. If there are no results from that, use any functional DC (unless "Try next closest site" is enabled; by default it is not).

  7. If there is only one result, use that DC. If there are multiple results, select a DC based on the lowest SRV priority number, then the highest weight number.
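The selection logic in the steps above can be sketched in Python, without any network calls. This is a simplified model under stated assumptions, not the actual Netlogon implementation: the record type, function names, and the dict shape of the first reply are hypothetical, the priority/weight values are assumed, and ties are broken deterministically by highest weight as described above (real SRV clients treat weight as a proportional probability per RFC 2782).

```python
from dataclasses import dataclass


@dataclass
class SrvRecord:
    target: str
    priority: int
    weight: int


def pick_by_srv(records):
    """Lowest priority number first, then highest weight (simplified, deterministic)."""
    if not records:
        return None
    return min(records, key=lambda r: (r.priority, -r.weight)).target


def locate_dc(first_reply, site_records, all_records):
    """first_reply holds what the first responding DC reported:
    'dc', 'dc_site', 'client_site' (or None) and 'closest' (DSClosestFlag)."""
    client_site = first_reply["client_site"]
    if (
        client_site is None
        or first_reply["closest"]
        or first_reply["dc_site"] == client_site
    ):
        return first_reply["dc"]
    # Otherwise look at the site-specific SRV records for the client's site.
    in_site = site_records.get(client_site, [])
    if in_site:
        return pick_by_srv(in_site)
    # No DC found for the site: fall back to any functional DC.
    return pick_by_srv(all_records)


all_dcs = [SrvRecord("SRV-1", 0, 100), SrvRecord("SRV-2", 0, 100), SrvRecord("SRV-3", 0, 100)]
by_site = {"Site-A": all_dcs[:2], "Site-B": all_dcs[2:]}

# SRV-2 answers first and correctly reports the client as being in Site-B.
good = {"dc": "SRV-2", "dc_site": "Site-A", "client_site": "Site-B", "closest": False}
# SRV-2 answers first but reports the client as Site-A (the symptom in the question).
bad = {"dc": "SRV-2", "dc_site": "Site-A", "client_site": "Site-A", "closest": False}

print(locate_dc(good, by_site, all_dcs))  # SRV-3
print(locate_dc(bad, by_site, all_dcs))   # SRV-2
```

The second case shows why the site reported by the first DC matters: once the client is told it is in Site-A, it never queries the Site-B records at all.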