Authentication via RADIUS : MSCHAPv2 Error 691
I am working on setting up authentication into an Acme Packet Net-Net 3820 (SBC) via RADIUS. The accounting side of things is working just fine with no issues. The authentication side of things is another matter. I can see from a packet capture that the access-request messages are in fact getting to the RADIUS server at which point the RADIUS server starts communicating with the domain controllers. I then see the chain of communication going back to the RADIUS and then finally back to the SBC. The problem is the response I get back is always an access-reject message with a reason code of 16 (Authentication failed due to a user credentials mismatch. Either the user name provided does not match an existing user account or the password was incorrect). This is confirmed by looking at the security event logs where I can see events 4625 and 6273. See the events below (Note: The names and IPs have been changed to protect the innocent):
Event ID: 6273
Network Policy Server denied access to a user.
Contact the Network Policy Server administrator for more information.
User:
Security ID: NULL SID
Account Name: real_username
Account Domain: real_domain
Fully Qualified Account Name: real_domain\real_username
Client Machine:
Security ID: NULL SID
Account Name: -
Fully Qualified Account Name: -
OS-Version: -
Called Station Identifier: -
Calling Station Identifier: -
NAS:
NAS IPv4 Address: 10.0.0.10
NAS IPv6 Address: -
NAS Identifier: radius1.real_domain
NAS Port-Type: -
NAS Port: 101451540
RADIUS Client:
Client Friendly Name: sbc1mgmt
Client IP Address: 10.0.0.10
Authentication Details:
Connection Request Policy Name: SBC Authentication
Network Policy Name: -
Authentication Provider: Windows
Authentication Server: RADIUS1.real_domain
Authentication Type: MS-CHAPv2
EAP Type: -
Account Session Identifier: -
Logging Results: Accounting information was written to the SQL data store and the local log file.
Reason Code: 16
Reason: Authentication failed due to a user credentials mismatch. Either the user name provided does not map to an existing user account or the password was incorrect.
Event ID: 4625
An account failed to log on.
Subject:
Security ID: SYSTEM
Account Name: RADIUS1$
Account Domain: REAL_DOMAIN
Logon ID: 0x3E7
Logon Type: 3
Account For Which Logon Failed:
Security ID: NULL SID
Account Name: real_username
Account Domain: REAL_DOMAIN
Failure Information:
Failure Reason: Unknown user name or bad password.
Status: 0xC000006D
Sub Status: 0xC000006A
Process Information:
Caller Process ID: 0x2cc
Caller Process Name: C:\Windows\System32\svchost.exe
Network Information:
Workstation Name:
Source Network Address: -
Source Port: -
Detailed Authentication Information:
Logon Process: IAS
Authentication Package: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
Transited Services: -
Package Name (NTLM only): -
Key Length: 0
This event is generated when a logon request fails. It is generated on the computer where access was attempted.
The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.
The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).
The Process Information fields indicate which account and process on the system requested the logon.
The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.
The authentication information fields provide detailed information about this specific logon request.
- Transited services indicate which intermediate services have participated in this logon request.
- Package name indicates which sub-protocol was used among the NTLM protocols.
- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
So at first glance it would seem that the issue is merely a case of an invalid username or mismatched password. This is further confirmed in the packet capture where I can see the MSCHAPv2 response has an error code of 691 (Access denied because username or password, or both, are not valid on the domain). The thing is I know I am using a valid username and I have tried many usernames including new ones I created just for troubleshooting. I don't know how many times I have reset the password in an attempt to ensure it is not a mismatch password. I have even made sure to use passwords that are fairly short and contain only letters to ensure there was no terminal encoding issues (we connect to the SBC via SSH clients). I have also done this same thing with the shared secret used during communication between the SBC and the RADIUS server. I have tried prefixing the username with the domain name at login (though I don't think that should be necessary). I have also tried using the full UPN of the user to login. I have tried several RADIUS testing clients (NTRadPing, RadiusTest, etc.), but they either don't support MSCHAPv2 or only support EAP-MSCHAPv2. I have even created my own client using PHP's PECL RADIUS module. Still it always seems to fail with the MSCHAPv2 authentication with an error code of 691. Does anyone have any ideas as to why I always get an invalid username or bad password response when I have done everything possible to ensure that is not the case?
Here are the specs for our RADIUS configuration:
- Windows Server 2012 R2
- SQL Server 2012 Back End Database for accounting.
- The server has been authorized on the domain and is a member of the "RAS and IAS Servers" group. For which that group does have access to the accounts we are testing with.
- The accounts we are testing with do have the "Control access through NPS Network Policy" option checked under their "Dial-in" property tab.
- RADIUS clients configured to simply match on the IP address which you can see from the events above that it is applying the client friendly name.
- Connection Request Policy: The "SBC Authenication" policy is being applied as seen above. The only condition is a regex expression that does successfully match the friendly name.
- Network Policy: As seen in events above, none are getting applied. For troubleshooting purposes I have created a Network Policy that is set to "1" for the processing order and its only condition is a Day and Time Restriction currently set to any time, any day.
- The authentication method is set to only MSCHAPv2 or MSCHAPv2 (User can change password after it has expired). I have tried adding this to just the Network Policy and I have also tried adding this to the Connection Request Policy and setting it to override the authentication method of the Network Policy.
- We do have other RADIUS servers in our domain that use PEAP to authenticate wireless clients and they all work fine. However, we need this to work with MSCHAPv2 only (No EAP).
- All other configurations are set to the defaults.
The only other things of note to consider is the fact that in the events above you can see that the Security ID is "NULL SID". Now I know this is common especially among failed logons but given that this issue is stating an invalid username or bad password, perhaps it matters in this case. Also, this server has been rebuilt using the same computer account in Active Directory. I do not know if it would have worked before the rebuild. Essentially we built this server and only got as far as authorizing the server to the domain and adding SQL when we decided to separate out the SQL role onto another server. Rather than uninstalling SQL we just rebuilt the machine. However, before reinstalling Windows I did do a reset on the computer account. I don't think this should matter but thought I would point it out if there is some weird quirk where reusing the same SID of a previously authorized NPS server would cause an issue.
All in all it is a fairly basic setup and hopefully I have provided enough information for someone to get an idea of what might be going on. Apologies if my understanding of this seems a bit basic, after all, when it comes to RADIUS servers I guess you could say I'm the new guy here.
Edit 1:
In an attempt to further troubleshoot this issue I have tried bringing up additional servers for testing. Here are the additional tests I have performed.
Multiple Domains
I have now tried this in 3 different isolated domains. Both our test and production domains as well as my private home domain which has very little in the way of customizations aside from the modifications made for Exchange and ConfigMgr. All have the same results described above.
VPN Service
Using Windows Server 2012 R2 we brought up a separate server to run a standard VPN setup. The intent was to see if we could use RADIUS authentication with the VPN and if that worked we would know the issue is with the SBCs. However, before we could even configure it to use RADIUS we just attempted to make sure it worked with standard Windows Authentication on the local VPN server. Interestingly, it too fails with the same events getting logged as the RADIUS servers. The client machine being a Windows 8.1 workstation. Again I point out that we have working RADIUS servers used specifically for our wireless environment. The only difference between those RADIUS servers and the ones I am having problems with is that the working wireless servers are using PEAP instead of MSCHAPv2.
FreeRADIUS
Now I'm no Linux guru but I believe I have it up and running. I am able to use ntlm_auth to authenticate users when logged on to the console. However, when the radiusd service tries to use ntlm_auth to do essentially the same thing it fails and returns the same message I've been getting with the Windows server (E=691). I have the radiusd service running in debug mode so I can see more of what is going on. I can post the debug info I am getting if requested. The lines I am seeing of particular interest however is as follows:
(1) ERROR: mschap : Program returned code (1) and output 'Logon failure (0xc000006d)'
(1) mschap : External script failed.
(1) ERROR: mschap : External script says: Logon Failure (0xc000006d)
(1) ERROR: mschap : MS-CHAP2-Response is incorrect
The thing to note here is that while we are essentially still getting a "wrong password" message, the actual status code (0xc000006d) is slightly different than what I was getting on the Windows Servers which was (0xc000006a). From this document you can see what these codes mean: NTSTATUS values . The good thing about this FreeRADIUS server is that I can see all of the challenge responses when it is in debug mode. So if I can wrap my head around how a MSCHAPv2 response is computed I can compare it to see if this is simply a miscomputed challenge response. Update Was just noticing that the 6a code is just the sub-status code for the 6d code. So nothing different from the Windows Servers, I still wonder if there is a computation error with the challenge responses though.
Currently, I am working on bringing up a Windows Server 2008 R2 instance of a RADIUS server to see if that helps at all. However, I would be surprised if something with the service broke between W2K8 R2 and W2K12 R2 without anyone noticing until now. If this doesn't work I may have to open a case with Microsoft. Update: Same results with W2K8 R2.
Update 2
I opened a case with Microsoft and they worked on it for a couple of weeks. The first week I spent my time just trying to get them to understand this has nothing to with wireless and that the device we are trying to connect to does not support authentication against a RADIUS server using any form of EAP. Once they finally understood that we are trying to setup the authentication method as just MSCHAPv2 only, his initial reaction was simply "you can't do that". He said, that no matter what you always need some form of EAP or PEAP setup in the top box (even for PAP and other "Less secure methods" in that authentication method window). This seemed highly unlikely to me so I asked for some documentation stating this for which he then change the subject and never provided any documentation. It became clear I was getting nowhere with Microsoft when they told me that the issue seems to be with the username and password. So it took them 2 weeks to come to that conclusion when the subject line of my premiere ticket had the E691 error code in it which means mismatched username and password and despite the very lengthy paragraph of me explaining that I have done everything possible to ensure it is not a bad username and password.
Unfortunately my project lead does not want to waste anymore premiere support hours on this so we have closed the case. We will likely just deal with the SBC local shared login and setup some other way to enforce accountability (there is only 4 of us that will have access).
I will point out that before I was told to drop the project I did get the FreeRADIUS server to work using MSCHAPv2 only but was only able to do this using accounts local to the FreeRADIUS server. That is plain text passwords stored in a FreeRADIUS config file somewhere. So obviously not a solution but it at least shows that the SBC is correctly communicating to a RADIUS server via MSCHAPv2. This and the other things I have mentioned above lead me to believe that the issue lies between the RADIUS server and the Domain Controllers. While NTLM authentication works fine on both the Windows RADIUS and FreeRADIUS servers while logged into the servers locally (Can login to the Windows RADIUS via the test account and can get successful authentication on the FreeRADIUS server when using ntlm_auth command with just a username and password), neither RADIUS server seems to authenticate via users when coming in as a RADIUS access request (FreeRADIUS uses the same ntlm_auth command I use when logged in locally but instead of providing a username and password to the command it provides a username and challenge response).
So I will end this thread here but I will keep an eye on this and it will notify me if anyone posts something. If someone posts a solution or has comments I will respond.
Solution
Wouldn't you know it, about a month after we had abandoned this approach to using a RADIUS server to control access to the SBC we find a possible solution. Of course at this point we are too busy with other projects to go back and try this solution out so I can't say 100% if this would fix the problem but once I explain you will probably agree this is the answer.
One of my colleagues was at a Microsoft conference having various discussions when it dawned on him that MSCHAPv2 relies on NTLM to generate the password challenges and responses. Now plain old MSCHAP and MSCHAPv2 (i.e. not EAP-MSCHAPv2 or PEAP) when used in Windows RAS services will use NTLMv1 by default.
As many of of you have already started to catch on, we, like many administrators, have disabled NTLMv1 on our DCs and as such the DCs will only accept NTLMv2 requests. This explains why the failure I continued to get was a "bad password" error. The password being sent to the DCs was in NTLMv1 format and was getting ignored.
Once we realized this, I was able to do some more research and I found the following article:
http://support.microsoft.com/kb/2811487
This article describes the same behavior I was experiencing including the E=691 error code I have mentioned. This article also provides a workaround to force RAS services to use NTMLv2 when building a MSCHAPv2 response. Funny how easy it is to find these articles after you know precisely what the issue is.
Again, I have yet to verify this is the issue as I haven't had time to rebuild the RADIUS servers that I have already decommissioned but I do plan on trying this out sometime if I get a chance. I just wanted to post this possible solution in case someone else stumbles across this issue. If anyone can confirm this before I do please let me know.
Edit - Confirmed
We were asked to take on a bigger role with these SBCs and as such we came back to this project and brought up a Windows RADIUS server again. This time we applied the registry key described in the link above. After taking a packet capture of the communication between the RADIUS server and the SBCs I can in fact now see "Access Accept" messages getting fired back at the SBCs. So I can now confirm at least in our scenario that the issue we were having is as described above.
TL;DR
NTLMv1 was disabled on the DCs. Setting a registry key to force the RADIUS server to use NTLMv2 fixed the issue.