Notes from the field: Windows Time Service interrupts email delivery

A business with Exchange Server noticed that email was not flowing. The internet connection was fine, all the servers were up and running including Exchange 2016. Email has been fine just a few hours earlier. What was wrong?

The answer, or the beginning of the answer, was in the Event Viewer on the Exchange Server. Event ID 1035, only a warning:

Inbound authentication failed with error UnexpectedExchangeAuthBlobCheckForClockSkew for Receive connector Default Mailbox Delivery

Hmm. A clock problem, right? It turned out that the PDC for the domain was five minutes fast. This is enough to trigger Kerberos authentication failures. Result: no email. We fixed the time, restarted Exchange, and everything worked.

Why was the PDC running fast? The PDC was configured to get time from an external source, apparently, and all other servers to get their time from the PDC. Foolproof?

Not so. If you typed:

w32tm /query /status

at a command prompt on the PDC (not the Exchange Server, note), it reported:

Source: Free-running System Clock

Oops. Despite efforts to do the right thing in the registry, setting the Type key in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Parameters to NTP and entering a suitable list of time servers in the NtpServer key, it was actually getting its time from the server clock. This being a Hyper-V VM, that meant the clock on the host server, which – no surprise – was five minutes fast.

You can check for this error by typing:

w32tm /resync

at the command prompt. If it says:

The computer did not resync because no time data was available.

then something is wrong with the configuration. If it succeeds, check the status as above and verify that it is querying an internet time server. If it is not querying a time server, run a command like this:

w32tm /config /update /manualpeerlist:”0.pool.ntp.org,0x8 1.pool.ntp.org,0x8 2.pool.ntp.org,0x8 3.pool.ntp.org,0x8″ /syncfromflags:MANUAL

until you have it right.

Note this is ONLY for the server with the PDC Emulator FSMO role. Other servers should be configured to get time from the PDC.

Time server problems seem to be common on Windows networks, despite the existence of lots of documentation. There are also various opinions on the best way to configure Hyper-V, which has its own time synchronization service. There is a piece by Eric Siron here on the subject, and I reckon his approach is a safe one (Hyper-V Synchronization Service OFF for the PDC Emulator, ON for every other VM).

I love his closing remarks:

The Windows Time service has a track record of occasionally displaying erratic behavior. It is possible that some of my findings are not entirely accurate. It is also possible that my findings are 100% accurate but that not everyone will be able to duplicate them with 100% precision. If working with any time sensitive servers or applications, always take the time to verify that everything is working as expected.