Monday, October 7, 2013

Finding locked out users in AD and their source

A few days ago we received a complaint from a user that his account kept getting locked. As many of you know, AD can be configured to lock user accounts if the password is entered incorrectly too many times within a specified time. This prevents brute force attacks but can expose your environment to denial of service attacks. The most common cause, though, is that the user has changed their password and forgotten to update it on a mobile phone used for checking email, or on some other device that stores the username and password for authentication.

Usually the security log on the domain controllers contains the logon workstation name for those failed authentication attempts, which helps identify the source, but sometimes that is not enough. In our case the name of the logon workstation in the log entry was \\workstation. That didn't help us much, since we do not have any workstations with that name in our domain and our network discovery tool didn't show anything with that name either. To find the source we had to combine a few things, which I explain in this blog post.

1. Find the currently locked-out users in the domain, for example using PowerShell:

Import-Module ActiveDirectory
Search-ADAccount -LockedOut | Format-Table -Property samaccountname -AutoSize -HideTableHeaders

2. Use these samaccountnames to find the domain controller that is receiving the failed authentication requests. This can be done by using PowerShell to query all domain controllers for LastBadPasswordAttempt:

Get-ADDomainController -Filter { domain -eq $yourdomainname } | foreach { write-output "$($_.hostName) says last bad password attempt for $($samaccountname) was: $((Get-ADUser $samaccountname -Server $_.name -Properties LastBadPasswordAttempt).LastBadPasswordAttempt)" }

where $yourdomainname is the DNS name of your domain, for example corp.example.com, and $samaccountname is the account that is locked out.
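
Steps 1 and 2 can also be combined into one loop. This is only a minimal sketch: it assumes the ActiveDirectory module is available and that you are allowed to query every DC. The variable names ($domain, $dcs, $locked) are my own and corp.example.com is a placeholder.

```powershell
Import-Module ActiveDirectory

# Placeholder: replace with the DNS name of your own domain.
$domain = 'corp.example.com'

# All domain controllers in the domain and all currently locked-out user accounts.
$dcs    = Get-ADDomainController -Filter { Domain -eq $domain }
$locked = Search-ADAccount -LockedOut -UsersOnly

foreach ($user in $locked) {
    foreach ($dc in $dcs) {
        # LastBadPasswordAttempt is not replicated, so every DC has to be asked separately.
        $u = Get-ADUser -Identity $user.SamAccountName -Server $dc.HostName -Properties LastBadPasswordAttempt
        [pscustomobject]@{
            User                   = $user.SamAccountName
            DomainController       = $dc.HostName
            LastBadPasswordAttempt = $u.LastBadPasswordAttempt
        }
    }
}
```

Piping the output to Sort-Object LastBadPasswordAttempt -Descending puts the DC with the most recent bad password attempt on top, and that is the DC to look at next.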

Once you have found the DC that is receiving the incorrect credentials, you can enable netlogon debug logging by typing:

nltest /dbflag:0x2080ffff

in an administrative command prompt on that DC.

After running that command you have to restart the netlogon service for the change to take effect. The easiest way to do this is to type:

net stop netlogon && net start netlogon

in a command prompt. After you have restarted the netlogon service, you can check the %Systemroot%\Debug\Netlogon.log file to trace the account lockout back to the source where the incorrect credentials were used. Help for analyzing the netlogon debug logs and error codes can be found at http://technet.microsoft.com/en-us/library/cc776964%28WS.10%29.aspx
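
Netlogon.log grows quickly, so it helps to filter it instead of reading it line by line. Below is a minimal sketch, assuming the interesting lines contain the account name together with one of the NTSTATUS codes 0xC000006A (wrong password) or 0xC0000234 (account locked out). The function name Find-NetlogonFailure is my own invention, not something that ships with Windows.

```powershell
function Find-NetlogonFailure {
    param(
        [Parameter(Mandatory = $true)] [string] $UserName,
        [string] $Path = "$env:SystemRoot\Debug\Netlogon.log"
    )
    # 0xC000006A = wrong password, 0xC0000234 = account locked out.
    $codes = '0xC000006A|0xC0000234'

    # Keep only the lines that mention the account AND one of the status codes.
    Select-String -Path $Path -Pattern ([regex]::Escape($UserName)) |
        Where-Object { $_.Line -match $codes } |
        ForEach-Object { $_.Line }
}
```

Running Find-NetlogonFailure -UserName jdoe on the DC (jdoe being a placeholder for the locked-out samaccountname) returns only the matching lines, and the computer name on those lines is where to continue the hunt.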

After you have found the source of the account lockout, the netlogon debug flag should be disabled using the command:

nltest /dbflag:0x0

and the netlogon service restarted again. For the curious, those nltest commands set the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\DBFlag registry entry to the specified value. More details about enabling netlogon debug logging can be found at http://support.microsoft.com/kb/109626
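
If you want to double-check what the flag is currently set to, you can read that registry entry from PowerShell. A small sketch; the entry may be missing entirely if debug logging has never been enabled on the DC, which is why errors are silenced here.

```powershell
# Read the current Netlogon debug flag from the registry (run on the DC).
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
Get-ItemProperty -Path $key -Name DBFlag -ErrorAction SilentlyContinue |
    Select-Object DBFlag
```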

That's all for this time. Next time it will be much easier to find out where the bad password attempts originate and to resolve the issue.

Tuesday, September 17, 2013

I'm back.. :D to write about Event ID 2213 in DFS Replication log on Windows Server 2012 Domain Controller

Wow, it has been a long time since the last post. Almost two years. I've been too "busy" to write anything. About time to correct that.

Today I came across a DC restarting itself unexpectedly. I'm still wondering why, but that is another story. The unexpected restart caused a dirty shutdown of the DFSR JET database. The default behavior changed in Windows Server 2012: where 2008 recovered automatically, 2012 waits for manual recovery.

Okay, fine. Now we have a chance to back up our existing replicated folders before autorecovery merges them in a way where the winning files are not the ones we want. Sounds great. On the other hand, replication is stopped on that volume until someone manually tells it to resume. Not so great if that volume happens to be the one hosting SYSVOL.

Well, now you might think: why should we care? We have Operations Manager monitoring our AD replication, so we will notice immediately that replication is not working and can manually resume it. One thing that nobody tells you is that the DFS Replication monitoring watches for event id 2212, which states:

"The DFS Replication service has detected an unexpected shutdown on volume %2. This can occur if the service terminated abnormally (due to a power loss,for example) or an error occurred on the volume. The service has automatically initiated a recovery process. The service will rebuild the database if it determines it cannot reliably recover. No user action is required."

The catch is that on Windows Server 2012 DCs, when autorecovery is not on, what you get is event id 2213 instead. Its description states:

"The DFS Replication service stopped replication on volume %2. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication."

Event id 2213 from the DFSR source in the DFS Replication log is NOT monitored by default by the SCOM 2012 AD management pack. Windows Server 2012 is, by the way, still categorized as 2008 there.

Luckily it is easy to implement your own monitor that raises an alert when event id 2213 is seen and automatically closes it when event id 2214 is recorded.
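
Before (or in addition to) building the monitor, the same condition can be checked by hand on a DC. This is not the SCOM monitor itself, just a rough sketch of the logic with Get-WinEvent. It looks only at the newest of the two events and ignores which volume the event is about, so treat the result as a hint.

```powershell
# Newest 2213 (replication stopped) or 2214 event from the DFSR source.
$filter = @{ LogName = 'DFS Replication'; ProviderName = 'DFSR'; Id = 2213, 2214 }
$latest = Get-WinEvent -FilterHashtable $filter -MaxEvents 1 -ErrorAction SilentlyContinue

if ($latest -and $latest.Id -eq 2213) {
    # The last thing DFSR logged was "replication stopped": somebody has to resume it.
    "Replication was stopped at $($latest.TimeCreated) and has not been resumed since."
    $latest.Message
} else {
    'No outstanding event id 2213 found.'
}
```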

What makes this even more interesting is that, according to http://support.microsoft.com/kb/2846759, the recommended best practice for Windows Server 2012 is to turn this autorecovery on. Maybe whoever is responsible for the AD management pack didn't get the memo about the original design change, or forgot to include event id 2213 as a trigger for an alert about DFS replication issues. Hopefully this gets fixed in future management packs.
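
Until then, replication has to be resumed by hand with the ResumeReplication WMI method named in the event id 2213 description. Here is a minimal sketch of calling it from PowerShell instead of wmic, assuming you have already backed up the replicated folders. The GUID below is a placeholder; copy the real one from the event description on your own DC.

```powershell
# Placeholder: the volume GUID reported in the event id 2213 description.
$volumeGuid = '00000000-0000-0000-0000-000000000000'

# Find the DFSR volume configuration object for that volume (run elevated on the DC).
$vol = Get-WmiObject -Namespace 'root\MicrosoftDFS' -Class DfsrVolumeConfig -Filter "VolumeGuid='$volumeGuid'"

if ($vol) {
    # Back up the replicated folders on this volume BEFORE resuming.
    $result = $vol.ResumeReplication()
    "ResumeReplication on $($vol.VolumePath) returned $($result.ReturnValue)"
} else {
    "No DFSR volume found with GUID $volumeGuid"
}
```

After the call, the DFS Replication log should show whether replication on the volume actually came back.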