Tuesday, September 17, 2013

I'm back.. :D to write about Event ID 2213 in the DFS Replication log on a Windows Server 2012 Domain Controller

Wow, it has been a long time since my last post. Almost two years. Been too "busy" to write anything. About time to correct that.

Today I came across a DC that had restarted itself unexpectedly. I'm still wondering why, but that is another story. The unexpected restart caused a dirty shutdown of the DFSR JET database. The default behavior after a dirty shutdown changed in Windows Server 2012: where 2008 recovered automatically, 2012 waits for manual recovery.

Okay, fine. Now we have a chance to back up our existing replicated folders before autorecovery might merge them in a way where the winning files are not the ones we want. Sounds great. On the other hand, replication on that volume stays stopped until someone manually tells it to resume. Not so great if that volume happens to be the one hosting SYSVOL.

Well, now you might think: why should we care? We have Operations Manager monitoring our AD replication, so we will notice immediately that replication is not working and can resume it manually. One thing that nobody tells you is that DFS Replication monitoring is watching for event ID 2212, which states:

"The DFS Replication service has detected an unexpected shutdown on volume %2. This can occur if the service terminated abnormally (due to a power loss,for example) or an error occurred on the volume. The service has automatically initiated a recovery process. The service will rebuild the database if it determines it cannot reliably recover. No user action is required."

The catch is that on Windows Server 2012 DCs, what you get when autorecovery is not on is event ID 2213. Its description states:

"The DFS Replication service stopped replication on volume %2. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication."

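To make the "ResumeReplication WMI method" part concrete, here is a minimal Python sketch that builds the wmic command line for one volume. The wmic syntax below is the form commonly shown together with this event, so treat it as an assumption and compare it with the text of your own event 2213 before running anything. The function names, the `dry_run` flag, and the example GUID are made up for illustration; take the real volume GUID from the event details, and take your backup first.

```python
import subprocess
import uuid


def build_resume_command(volume_guid: str) -> str:
    """Build the wmic command line that calls ResumeReplication for one volume.

    The GUID is parsed with uuid.UUID so that a typo or stray text is
    rejected (ValueError) instead of ending up on the command line.
    """
    guid = str(uuid.UUID(volume_guid)).upper()
    return (
        r"wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig "
        f'where volumeGuid="{guid}" call ResumeReplication'
    )


def resume_replication(volume_guid: str, dry_run: bool = True) -> str:
    """Return the command; run it only when dry_run is False (Windows only)."""
    command = build_resume_command(volume_guid)
    if not dry_run:
        # Assumption: executed on the affected DC from an elevated session.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical GUID, for illustration only.
    print(resume_replication("6F9619FF-8B86-D011-B42D-00C04FC964FF"))
```

With `dry_run` left at its default the script only prints the command, so you can review it before anything touches replication.
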
Event ID 2213 (DFS Replication log, source DFSR) is NOT monitored by default by the SCOM 2012 AD management pack. By the way, the management pack still categorizes Windows Server 2012 as 2008.

Luckily, it is easy to implement your own monitor that triggers an alert when event ID 2213 is seen and automatically closes the alert when event ID 2214 is recorded.
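
The logic of such a monitor is just a small two-state switch. The sketch below is not SCOM code and not a management pack definition; it only illustrates the raise-on-2213, close-on-2214 behavior in Python. The class name, the `source` key (for example a DC name), and the returned strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set

STOPPED_EVENT_ID = 2213  # replication stopped, manual resume needed
RESUMED_EVENT_ID = 2214  # the event that closes the alert


@dataclass
class DfsrAlertTracker:
    """Tracks which sources (e.g. DC names) currently have an open alert."""

    open_alerts: Set[str] = field(default_factory=set)

    def process(self, source: str, event_id: int) -> str:
        """Update alert state for one event and report what happened."""
        if event_id == STOPPED_EVENT_ID:
            if source in self.open_alerts:
                return "already open"
            self.open_alerts.add(source)
            return "raised"
        if event_id == RESUMED_EVENT_ID and source in self.open_alerts:
            self.open_alerts.remove(source)
            return "closed"
        # Anything else, including 2212, does not change the alert state.
        return "ignored"


if __name__ == "__main__":
    tracker = DfsrAlertTracker()
    for event_id in (2213, 2212, 2214):
        print(event_id, tracker.process("DC01", event_id))
```

In SCOM terms this corresponds to a two-state event monitor: one event marks the unhealthy state and the other returns the monitor to healthy.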

What makes this even more interesting is that, according to http://support.microsoft.com/kb/2846759, the recommended best practice for Windows Server 2012 is to turn this autorecovery on. Maybe someone responsible for the AD management pack didn't get the memo about the original design change, or forgot to include event ID 2213 as a trigger for raising an alert about DFS replication issues. Hopefully this gets fixed in future management packs.