I have a DC that was recently part of a business continuity test. From what I understand, the server (which is virtual) was snapshotted, the test was carried out while the link between the two sites was down, and the server was then reverted to the snapshot. Now that the link is back up, I am seeing notifications through SolarWinds that the AD service is in error. Looking at the server, the NETLOGON service is paused. From what I can gather from the event logs, this is due to repeated replication attempts failing. There is also a notification that AD was restored using an unsupported method (probably the snapshot).

I have tried to force replication using the Sites and Services snap-in, but that fails, stating that the server is currently rejecting replication. I can ping the server, though oddly it seems to respond from the 10.168.3 NIC and not the 10.168.50 NIC that I would have expected. Both IPs can be pinged, though, and the server can be reached via RDP or via the console in vSphere.

Running a repadmin /show reports various failures, but I am sure these are due to some underlying failure that is blocking the replication service from starting. I am a bit new to this level of troubleshooting but would be grateful for any help that could be thrown my way.

EDIT: Wondering if it may be something to do with a USN rollback (?). Link to KB here


Solution 1:

Your issue is almost certainly due to a USN rollback. Reverting to a snapshot is not a supported method for recovering a DC. To resolve the issue, follow the steps outlined in the KB article you referenced. This will include demoting the DC, cleaning up the metadata, and then promoting it again.
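
To make the failure mode concrete, here is a toy model in Python. It is not Active Directory code, and the class and method names (ToyDC, ToyPartner, pull_from) are made up for illustration. It assumes only the basic idea that a replication partner remembers the highest USN it has already received from a DC and asks only for changes above that number.

```python
class ToyDC:
    """Hypothetical stand-in for a DC: a local USN counter plus a change log."""

    def __init__(self):
        self.usn = 0
        self.changes = {}  # USN -> description of the change

    def write(self, description):
        # Every local change gets the next USN.
        self.usn += 1
        self.changes[self.usn] = description
        return self.usn

    def snapshot(self):
        return (self.usn, dict(self.changes))

    def revert(self, snap):
        # A hypervisor revert rolls the counter back along with the data.
        self.usn, self.changes = snap[0], dict(snap[1])


class ToyPartner:
    """Hypothetical partner that tracks the highest USN it has seen."""

    def __init__(self):
        self.high_watermark = 0
        self.received = []

    def pull_from(self, dc):
        # Only changes above the remembered watermark are requested.
        new_usns = [u for u in sorted(dc.changes) if u > self.high_watermark]
        pulled = [dc.changes[u] for u in new_usns]
        self.received.extend(pulled)
        if new_usns:
            self.high_watermark = new_usns[-1]
        return pulled


dc = ToyDC()
partner = ToyPartner()

dc.write("create alice")        # USN 1
snap = dc.snapshot()            # snapshot taken at USN 1
dc.write("create bob")          # USN 2
partner.pull_from(dc)           # partner now has both, watermark = 2

dc.revert(snap)                 # back to USN 1; "create bob" is gone locally
dc.write("create carol")        # reuses USN 2
missed = partner.pull_from(dc)  # nothing is above the watermark

print("pulled after revert:", missed)
print("partner has:", partner.received)
```

In this sketch the partner never receives "create carol" because it believes it already has USN 2, and the two copies of the directory silently diverge. A supported restore avoids this by making the DC present itself as a new replication source, which the toy model leaves out.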

Solution 2:

Three things:

If you see errors like that, you should never attempt to force replication. There is a reason that replication was stopped, and it is usually bad.

Do not use snapshots on a domain controller.

You don't want to be in a scenario where someone brings up an old copy of a DC and you are now replicating objects that should be gone. If you have not already done so, you should enable strict replication consistency. Enabling this setting on a domain controller prevents lingering objects from being replicated inbound from an offending DC.

Running Domain Controllers in Hyper-V
http://technet.microsoft.com/en-us/library/virtual_active_directory_domain_controller_virtualization_hyperv%28WS.10%29.aspx

From the article:
Strict replication consistency should be enabled on all domain controllers
http://technet.microsoft.com/en-us/library/dd723692%28WS.10%29.aspx

When a domain controller in your Active Directory environment is disconnected from the replication topology for an extended period of time, all objects that are deleted from AD DS on all other domain controllers might remain on the disconnected domain controller. Such objects are called lingering objects. When this domain controller is reconnected to the replication topology, it acts as a source replication partner that has one or more objects that its destination replication partners no longer have. Problems occur when these lingering objects on the source domain controller are updated and these updates are sent by replication to the destination domain controllers. A destination domain controller can respond in one of two ways:

  1. If the destination domain controller has strict replication consistency enabled, it recognizes that it cannot update the object (because the object does not exist), and it locally halts inbound replication of the directory partition from that source domain controller.

  2. If the destination domain controller does not have strict replication consistency enabled, it requests the full replica of the updated object, which introduces a lingering object into the directory.
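
The two responses above can be sketched as a small Python function. This is a simplified illustration, not how the directory service is implemented; the function name apply_update, the dictionary layout, and the GUID string are all hypothetical.

```python
def apply_update(local_objects, update, strict):
    """Toy model of a destination DC handling one inbound attribute update.

    local_objects: dict mapping object GUID -> attribute dict (hypothetical layout)
    update: dict with the object's GUID, the changed attributes, and the
            full object as held by the source DC
    strict: whether strict replication consistency is enabled
    """
    guid = update["guid"]
    if guid in local_objects:
        # Normal case: the object exists locally, so the change is applied.
        local_objects[guid].update(update["changed"])
        return "applied"
    if strict:
        # Case 1: the object does not exist here, so inbound replication
        # from that source is halted and nothing is written.
        return "halted"
    # Case 2: the full object is requested and written locally, which
    # reintroduces the lingering object.
    local_objects[guid] = dict(update["full_object"])
    return "reintroduced"


update = {
    "guid": "hypothetical-guid-1",
    "changed": {"title": "Manager"},
    "full_object": {"cn": "Old User", "title": "Manager"},
}

strict_dc = {}  # destination DCs that no longer have the object
loose_dc = {}

strict_result = apply_update(strict_dc, update, strict=True)
loose_result = apply_update(loose_dc, update, strict=False)

print(strict_result, strict_dc)
print(loose_result, loose_dc)
```

On a real domain controller the behaviour is controlled by the Strict Replication Consistency registry value under HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters; check the linked article before changing it.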

An outdated domain controller can store lingering objects with no noticeable effect as long as an administrator, application, or service does not update the lingering object or attempt to create an object with the same name in the domain or with the same user principal name (UPN) in the forest. However, the existence of lingering objects can cause problems, especially if the object is a security principal. The following symptoms indicate that a domain controller has lingering objects:

  • A deleted user or group account remains in the global address list (GAL) on computers running Microsoft Exchange Server. Therefore, although the account name appears in the GAL, attempts to send e-mail messages result in errors.

  • Multiple copies of an object appear in the object picker or GAL for an object that should be unique in the forest. Duplicate objects sometimes appear with altered names, causing confusion on directory searches. For example, if the relative distinguished name (also known as RDN) of two objects cannot be resolved, conflict resolution appends "*CNF:GUID" to the name, where * represents a reserved character, CNF is a constant that indicates a conflict resolution, and GUID represents the objectGUID attribute value.

  • E-mail messages are not delivered to a user whose Active Directory account appears to be current. After an outdated domain controller or global catalog server becomes reconnected, both instances of the user object appear in the global catalog. Because both objects have the same e-mail address, e-mail messages cannot be delivered.

  • A universal group that no longer exists continues to appear in a user’s access token. Although the group no longer exists, if a user account still has the group in its security token, the user might have access to a resource that you intended to be unavailable to that user.

  • A new object or Exchange mailbox cannot be created, but you do not see the object in AD DS. An error message reports that the object already exists.

  • Searches that use attributes of an existing object incorrectly find multiple copies of an object of the same name. One object has been deleted from the domain, but it remains in an isolated global catalog server.
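
If you export object names and want to spot the conflict-renamed duplicates described in the list above, a pattern match on the "*CNF:GUID" suffix is one way to do it. The sketch below is Python and rests on an assumption the article does not spell out: that the reserved character is a line feed, which shows up either raw or escaped as \0A in a DN string. The function name split_conflict_name and the sample GUID are hypothetical.

```python
import re

# Assumption: the reserved character is a line feed, seen either as a raw
# "\n" or as the escaped sequence "\0A" in a DN string.
CNF_PATTERN = re.compile(
    r"^(?P<name>.+?)(?:\n|\\0[Aa])CNF:"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def split_conflict_name(rdn_value):
    """Return (original_name, guid) for a conflict-renamed value, else None."""
    match = CNF_PATTERN.match(rdn_value)
    if match is None:
        return None
    return match.group("name"), match.group("guid").lower()


# Made-up sample values for illustration only.
samples = [
    "John Smith",
    "John Smith\nCNF:8a1bc3f2-1111-2222-3333-444455556666",
    r"John Smith\0ACNF:8A1BC3F2-1111-2222-3333-444455556666",
]

for value in samples:
    print(repr(value), "->", split_conflict_name(value))
```

Any value that returns a tuple is a candidate duplicate worth checking by hand; the function only inspects the name and does not tell you which copy is the one to keep.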