Advice on Active Directory design for multihomed servers

Solution 1:

Let me begin by saying that I concur with many of the others -- either convince the client otherwise or run.

However, given your listed requirements (there are many unlisted), I have thought through (and partially tested) at least the groundwork for making this happen.

There are several specific aspects that need to be considered.

  1. Active Directory Domain Services Replication
  2. DC Locator Process of Clients/Member Servers
  3. Name resolution and traffic for non-AD DS services

One and two have a lot in common -- in general we are at the whim of Microsoft on these and have to work within the bounds of Microsoft's AD DS processes.

With number three we have a little bit of room to work. We can choose the labels used for accessing services (files, database instances, etc.).

Here is what I propose:

Build your Domain Controllers (DC)

  • Likely at least two.
  • Each DC will have two NIC's, one in each IP network/AD DS site -- calling them clt and srv for now.
  • Only configure one NIC in each DC right now in the srv network.

Configure AD Sites and Services properly

  • srv site and subnet
  • clt site and subnet
  • uncheck "Bridge all site links" from Sites -> Inter-site Transports -> Right-click "IP"
  • delete the DEFAULTIPSITELINK if it exists (or if you renamed it) so there are no site links configured. Note that this is the unknown for me -- at varying intervals the KCC will likely log errors in the Directory Service event log saying the two sites (srv and clt) are not connected. However, replication will still continue between the two DC's, as they can contact each other using the IP's in the same site.
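
To make the site/subnet pairing concrete, here is a minimal Python sketch of how an address would be matched to a site by its subnet. The subnets 10.0.1.0/24 and 10.0.2.0/24 and the name site_for are assumptions for illustration only, so substitute your real ranges. It only models the lookup and does not talk to AD DS.

```python
import ipaddress

# Hypothetical subnet-to-site assignments; replace with the real ranges.
SITE_SUBNETS = {
    "srv": ipaddress.ip_network("10.0.1.0/24"),
    "clt": ipaddress.ip_network("10.0.2.0/24"),
}


def site_for(address):
    """Return the site whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for site, subnet in SITE_SUBNETS.items():
        if ip in subnet:
            return site
    return None


if __name__ == "__main__":
    print(site_for("10.0.1.10"))  # an address on the assumed srv subnet
    print(site_for("10.0.2.25"))  # an address on the assumed clt subnet
```

An address that falls in neither subnet returns None, which corresponds to the "site is unknown" case discussed further down.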

Configure additional zone in AD DS Integrated DNS

  • If your AD DS domain is acme.local, create a second Primary AD Integrated Zone with dynamic updates enabled called clt.acme.local.

Configure the second NIC's on your DC's

  • These NIC's will be the NIC's in the clt network/site.
  • Set their IP's
  • Here is the magic part -- Adapter Properties -> IPv4 Properties -> Advanced -> DNS Tab -> Set the DNS suffix for this connection to clt.acme.local -> check Register this connection... -> Check Use this connection's DNS suffix... -> OK all the way through.
  • ipconfig /registerdns
  • This will register the clt NIC IP in the clt.acme.local zone -- providing a method for us to control which IP/network is used later.

Configure member server NIC's

  • Member server NIC's in clt site must have their DNS suffix and checkboxes set accordingly as well like above.
  • These settings work with both static and DHCP addressing; it doesn't matter which.

Configure DNS [stub] resolver behavior in the sites

  • DC's -> Configure DC srv NIC to use itself and other DC srv NIC IP. Leave DC clt NIC DNS empty (static IP is needed though). (DC DNS server will still listen on all IP's by default).
  • Member servers -> Configure member server srv NIC to use the DC srv site IP's. Leave member server clt NIC DNS empty (static IP can be used).
  • Clients/Workstations -> Configure DNS (either through DHCP or static) to use the DC's clt NIC IP's.

Configure mappings/resources appropriately

  • When servers talk to each other be sure to use .acme.local -> will resolve to srv network IP.
  • When clients talk to servers be sure to use .clt.acme.local -> will resolve to clt network IP.
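
As a rough illustration of that naming convention, the following Python sketch picks the name a caller should use for a member server. The host name files01 and the function name_to_use are hypothetical; only acme.local and clt.acme.local come from the example above.

```python
AD_DOMAIN = "acme.local"        # AD DS domain from the example above
CLT_ZONE = "clt." + AD_DOMAIN   # additional zone holding the clt NIC IP's


def name_to_use(host, caller):
    """Pick the FQDN a caller should use to reach a member server."""
    if caller == "server":
        # Server-to-server traffic: resolves to the srv network IP.
        return f"{host}.{AD_DOMAIN}"
    if caller == "client":
        # Client-to-server traffic: resolves to the clt network IP.
        return f"{host}.{CLT_ZONE}"
    raise ValueError(f"unknown caller type: {caller!r}")


if __name__ == "__main__":
    print(name_to_use("files01", "server"))
    print(name_to_use("files01", "client"))
```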

What am I talking about?

  • AD DS replication will still occur as DC's can resolve each other, and connect to each other. The acme.local and _msdcs.acme.local zones will only contain the DC srv NIC IP's, so AD DS replication will only happen on the srv network.
  • DC locator process for member servers and workstations will function -- although there exists the possibility of delays at various parts of various AD DS processes when the site is unknown. If multiple DC IP's are returned, each will be tried in turn, and the ones that fail will be skipped until one works. The effects on DFS-N have not been completely evaluated either -- but it will still function.
  • Non AD DS services will function fine if you use the aforementioned .acme.local and .clt.acme.local labels as described.
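
The "tried, fail, and move on" behaviour, and why it costs time, can be sketched in Python as below. This is a simplification under stated assumptions: the real DC locator is more involved than a plain TCP check, and the names tcp_probe and first_reachable are made up for this example.

```python
import socket


def tcp_probe(address, port=389, timeout=2.0):
    """Return True if a TCP connection to the LDAP port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(candidates, probe=tcp_probe):
    """Try each candidate address in order; return the first that answers."""
    for address in candidates:
        if probe(address):
            return address
        # Each unreachable address costs up to one timeout before moving on.
    return None
```

Every unreachable IP ahead of a working one adds up to one timeout of delay, which is where the slow logons and lookups would come from.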

I have not completely tested this as it is rather ludicrous. However, the point of this (wow, lengthy) answer is to begin evaluating whether or not it is possible -- not whether it should be done.

@Comments

@Massimo 1/2 Do not confuse multiple AD DS sites in the acme.local zone, and thus SRV records populated by DC's in those sites in the acme.local zone with needing SRV records in the clt.acme.local zone. The client's primary DNS suffix (and Windows domain to which they are joined) will still be acme.local. The client/workstations only have a single NIC, with primary DNS suffix likely derived from DHCP, set to acme.local.

The clt.acme.local zone does not need SRV records as it will not be used in the DC locator process. It is only used by clients/workstations to connect to the member servers' non-AD DS services using the member server IP's in the clt network. AD DS related processes (DC locator) will not use the clt.acme.local zone, but rather the AD DS sites (and subnets) in the acme.local zone.

@Massimo 3

There will be SRV records for both clt and srv AD DS sites -- just that they will exist in the acme.local zone -- see note above. The clt.acme.local zone does not need DC related SRV records.

Clients will be able to locate a DC fine. Client DNS servers point to the clt IP's of the DC's.

When DC locator process on the client kicks off

  • If the client knows its site the DNS question will be _ldap._tcp.[site]._sites.dc._msdcs.acme.local SRV. This will return back the site specific DC's that have SRV records registered.
  • If the client does not know its site the DNS question will be _ldap._tcp.dc._msdcs.acme.local SRV. This will return all DC's. The client will attempt to bind to each DC's LDAP service until it finds one that responds. When the client finds one, it performs a site lookup to determine the client's site, and caches the site in the registry so future DC locator instances happen quicker.
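
The two query names above follow a simple pattern, shown here as a small Python sketch. The function name dc_locator_query is hypothetical; the record names themselves are the ones quoted in the two bullets.

```python
def dc_locator_query(domain, site=None):
    """Build the SRV name a client asks for when locating a DC."""
    if site:
        # Site known: ask only for DC's registered in that site.
        return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"
    # Site unknown: ask for every DC in the domain.
    return f"_ldap._tcp.dc._msdcs.{domain}"


if __name__ == "__main__":
    print(dc_locator_query("acme.local", site="clt"))
    print(dc_locator_query("acme.local"))
```

Both names live under _msdcs.acme.local, which is why the clt.acme.local zone never enters the picture for DC location.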

@Massimo 4

Ugh, nice catch. The way I see it there are two ways around this problem.

  1. The lesser impact (compared to 2 below) is to create an entry in the hosts file on the clients/workstations for dc1.acme.local and dc2.acme.local pointing to the clt NIC IP's of the DC's.

or

  2. Manually create the necessary SRV records in the netlogon.dns file on each of the DC's. This likely will have some consequences on the server network. Member servers may at times communicate with the DC's on the clt network if this is configured.
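
For the hosts file option, a small Python sketch can generate the lines to add on each client/workstation. The IP addresses below are placeholders I made up for the DC's clt NIC's, and hosts_lines is a hypothetical helper name; only dc1.acme.local and dc2.acme.local come from the text above.

```python
def hosts_lines(dc_clt_ips):
    """Format hosts file lines (IP, tab, name) for each DC."""
    return [f"{ip}\t{name}" for name, ip in dc_clt_ips.items()]


if __name__ == "__main__":
    # Placeholder clt NIC addresses; replace with the real ones.
    dcs = {
        "dc1.acme.local": "10.0.2.11",
        "dc2.acme.local": "10.0.2.12",
    }
    for line in hosts_lines(dcs):
        print(line)
```

The output would be appended to the hosts file on each client, which is also why this option scales poorly as the number of workstations grows.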

All in all none of it is pretty, but that isn't necessarily the end goal. Maybe the client is just testing your tech chops. Plop it on their conference table and tell them "Here, this will work, but I am charging you 4x my normal rate to configure and support it. You can reduce it to 1.5x my normal rate -- .5x PITA charge, by doing [correct solution]."

As noted earlier, my recommendation is to convince otherwise or run. But it sure is a fun little exercise in ridiculous. :)

Solution 2:

In the end I went with the two sites solution:

  • Two DCs for the "servers" network, two DCs for the "clients" network.
  • Two AD sites, one for the "servers" network and one for the "clients" network.
  • DCs in the "servers" network will only have a NIC sitting on that one (clients are not going to talk to them at all), so this is easy.
  • DCs in the "clients" network will have two NICs, but will only register their client-side ones in DNS.
  • Servers will talk to their DCs, clients will talk to theirs.

Of course, this means enabling replication traffic between the two networks. The DCs in the "clients" network will still have a NIC sitting on the "servers" network, but as it will not get registered in DNS, the DCs in the "servers" network will contact them using their client-side IP addresses. That NIC will therefore be completely useless, and some firewall ports will need to be opened. The only other option would be mangling the DCs' hosts files, but let's hope that can be avoided.

Well, I think this is the best that could be done to fulfill as many (crazy) requirements as possible.

Thanks for all the advice :-)

Solution 3:

First of all, when we provide a service to our customers, we should question their requirements and help the client understand that this level of complexity is unnecessary.

  • How many clients are there?
  • Is this all internal traffic?
  • What is the functional level of the domains?
  • Is the TLS protocol being used?

Using the K.I.S.S. method, I would create two VLANs, "SVR" and "CLT", enable SSL/TLS, and call it a day.