Reference - Password Validation
Quite often, questions (especially those tagged regex) ask for ways to validate passwords. It seems users typically seek password validation methods that consist of ensuring a password contains specific characters, matches a specific pattern and/or obeys a minimum character count. This post is meant to help users find appropriate methods for password validation without greatly decreasing security.
So the question is: How should one properly validate passwords?
Why are password validation rules bad?
Our very own Jeff Atwood (blogger of Coding Horror and co-founder of Stack Overflow and Stack Exchange) wrote a blog post about password rules in March 2017 titled Password Rules are Bullshit. If you haven't read it, I urge you to do so, as it closely mirrors the intent of this post.
If you have never heard of NIST (National Institute of Standards and Technology), then you're likely not using correct cybersecurity methods for your projects. In that case please take a look at their Digital Identity Guidelines. You should also stay up to date on best practices for cybersecurity. NIST Special Publication 800-63B (Revision 3) mentions the following about password rules:
Verifiers SHOULD NOT impose other composition rules (e.g. requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets.
Even Mozilla's documentation on Form data validation pokes fun at password rules (page archive here):
"Your password needs to be between 8 and 30 characters long, and contain one uppercase letter, one symbol, and a number" (seriously?)
What happens if you impose composition rules for your passwords? You're limiting the number of potential passwords and removing password permutations that don't match your rules. This allows hackers to ensure their attacks do the same! "Yeah, but there are like a quadrillion (1,000,000,000,000,000 or 1x10^15) password permutations": 25-GPU cluster cracks every standard Windows password in <6 hours (95^8 = 6,634,204,312,890,625 ~ 6.6x10^15 passwords).
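To put that figure in perspective, the arithmetic can be sketched as follows. The guess rate here is not a measured number; it is only the rate implied by exhausting 95^8 candidates in six hours, and the function names are illustrative.

```python
# Size of the search space for 8-character passwords drawn from the
# 95 printable ASCII characters, and the guess rate implied by
# exhausting that space in 6 hours. All names are illustrative.

PRINTABLE_ASCII = 95  # printable ASCII characters, including the space


def search_space(pool_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return pool_size ** length


def implied_rate(candidates: int, seconds: float) -> float:
    """Guesses per second needed to try every candidate within `seconds`."""
    return candidates / seconds


space = search_space(PRINTABLE_ASCII, 8)
rate = implied_rate(space, 6 * 60 * 60)
print(f"{space:,} candidates")
print(f"about {rate:.2e} guesses per second")
```

Any composition rule shrinks `space` further, because every password that fails the rule can be skipped by the attacker as well.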
This StackExchange Security post extends the XKCD comic above.
How do I validate passwords?
1. Don't create your own authentication
Stop requiring passwords altogether, and let people log in with Google, Facebook, Twitter, Yahoo, or any other valid form of Internet driver's license that you're comfortable with. The best password is one you don't have to store.
Source: Your Password is Too Damn Short by Jeff Atwood.
2. Creating your own authentication
If you really must create your own authentication methods, at least follow proven cybersecurity methods. The following two sections (2.1 and 2.2) are taken from the current NIST publication, section 5.1.1.2 Memorized Secret Verifiers.
2.1. Follow PROVEN cybersecurity methods
NIST states that you SHOULD:
- Require subscriber-chosen memorized secrets to be at least 8 characters in length.
  - Jeff Atwood proposes passwords should be a minimum of 10 characters for normal users and a minimum of 15 characters for users with higher privileges (i.e. admins and moderators).
- Permit subscriber-chosen memorized secrets up to 64 characters or more in length.
  - Ideally, you shouldn't even put an upper limit on this.
- Allow all printing ASCII (including the space character) and Unicode.
  - For purposes of length requirements, each Unicode code point SHALL be counted as a single character.
- Compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised. For example:
  - Passwords obtained from previous breach corpuses.
  - Dictionary words.
  - Repetitive or sequential characters (e.g. aaaaaa, 1234abcd).
  - Context-specific words, such as the name of the service, the username, and derivatives thereof.
- Offer guidance to the subscriber, such as a password-strength meter.
- Implement a rate-limiting mechanism that effectively limits the number of failed authentication attempts that can be made on the subscriber's account (see Rate Limiting (Throttling)).
- Force a change if there is evidence of compromise of the authenticator.
- Permit claimants to use paste functionality when entering a memorized secret (facilitates the use of password managers, which typically increase the likelihood that users will choose stronger memorized secrets).
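A few of the recommendations above (minimum length counted in code points, no upper limit, no composition rules, and a blocklist check) could be sketched server-side roughly as follows. This is a minimal illustration, not a complete implementation: `check_password` and `COMMON_PASSWORDS` are hypothetical names, and the tiny set stands in for a real list of breached and commonly used passwords.

```python
from typing import List

MIN_LENGTH = 8  # minimum length from the NIST recommendation above

# Hypothetical stand-in for a real corpus of breached and common passwords.
COMMON_PASSWORDS = {"password", "password1", "12345678", "qwertyuiop", "aaaaaaaa"}


def check_password(password: str, username: str = "", service: str = "") -> List[str]:
    """Return a list of problems; an empty list means the password is acceptable."""
    problems = []

    # len() on a Python str counts Unicode code points, not bytes.
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters long")

    lowered = password.casefold()
    if lowered in COMMON_PASSWORDS:
        problems.append("is a commonly used or compromised password")

    # Context-specific words: the username and the name of the service.
    for word in (username, service):
        if word and word.casefold() in lowered:
            problems.append("contains the username or service name")
            break

    # A single repeated character, e.g. "aaaaaaaa".
    if password and len(set(password)) == 1:
        problems.append("consists of one repeated character")

    # Deliberately absent: a maximum length and any composition rules.
    return problems
```

For example, `check_password("correct horse battery staple")` returns an empty list, while `check_password("alice-loves-cats", username="alice")` reports the context-specific word.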
2.2. DO NOT use any of the methods in this section!
The same publication also states that you SHOULD NOT:
- Truncate the secret.
- Permit the subscriber to store a hint that is accessible to an unauthenticated claimant.
- Prompt subscribers to use specific types of information (e.g. "What was the name of your first pet?") when choosing memorized secrets.
- Impose other composition rules (e.g. requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets.
- Require memorized secrets to be changed arbitrarily (e.g. periodically).
There are a plethora of websites out there explaining how to create "proper" password validation forms. The majority of these are outdated and should not be used.
3. Using Password Entropy
Before you continue to read this section, please note that this section's intent is not to give you the tools necessary to roll out your own security scheme, but instead to give you information about how current security methods validate passwords. If you're considering creating your own security scheme, you should really think thrice and read this article from StackExchange's Security community.
3.1. Overview of Password Entropy
At the most basic level, password entropy can be calculated using the following formula:
$E = \log_2(R^L)$
In the above formula:
- $E$ represents password entropy
- $R$ is the number of characters in the pool of unique characters
- $L$ is the number of characters in the password
This means that $R^L$ represents the number of possible passwords; or, in terms of entropy, the number of attempts required to exhaust all possibilities.
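As a minimal sketch of this calculation, taking the formula to be the usual E = log2(R^L) and computing it as L * log2(R) to avoid building a huge intermediate number (the function name is illustrative):

```python
import math


def password_entropy(pool_size: int, length: int) -> float:
    """Entropy in bits: log2(pool_size ** length), computed as length * log2(pool_size)."""
    if pool_size < 1 or length < 0:
        raise ValueError("pool_size must be >= 1 and length must be >= 0")
    return length * math.log2(pool_size)


# 8 characters drawn from the 95 printable ASCII characters
print(round(password_entropy(95, 8), 1))   # 52.6
# 16 characters drawn from the 26 lowercase ASCII letters
print(round(password_entropy(26, 16), 1))  # 75.2
```

Note that the longer lowercase-only password comes out well ahead of the shorter one drawn from the full printable set, which is the same point the length recommendations make. The figure is only meaningful for passwords chosen uniformly at random from the pool.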
Unfortunately, what this formula doesn't consider are things such as:
- Generic passwords: e.g. Password1, admin
- Names: e.g. John, Mary
- Commonly used words: e.g. in the English language, the, I
- Reversed/inverted words: e.g. drowssap (password backwards)
- Letter substitution (aka leet): e.g. P@$$w0rd
Adding logic for these additional considerations presents a large challenge. See 3.2 for existing packages that you can add to your projects.
3.2. Existing Password Entropy projects
At the time of writing this, the best known existing library for estimating password strength is zxcvbn by Dropbox (an open-source project on GitHub). It's been adapted to support .NET, AngularJS, C, C#, C++, Go, Java, JavaScript, Objective-C, OCaml, PHP, Python, REST, Ruby, Rust, and Scala.
Doing it the wrong way
I understand, however, that everyone has different requirements and that sometimes people want to do things the wrong way. For those of you that fit this criterion (or don't have a choice and have presented everything above this section and more to your manager but they refuse to update their methods), at least allow Unicode characters. The moment you limit the password characters to a specific set of characters (e.g. ensuring a lowercase ASCII character a-z exists, or specifying characters that the user can or cannot enter, such as !@#$%^&*()), you're just asking for trouble!
P.S. Never trust client-side validation, as it can very easily be disabled. That means for those of you trying to validate passwords using JavaScript: STOP. See JavaScript: client-side vs. server-side validation for more information.
The following regular expression pattern does not work in all programming languages, but it does in many of the major ones (Java, .NET, PHP, Perl, Ruby). Please note that the regex may not work in your language (or even language version) and you may need to use alternatives (e.g. Python: see Python regex matching Unicode properties). Some environments even have better methods to check this sort of thing (e.g. the Password Validation Plugin for MySQL), so there is no need to reinvent the wheel. In Node.js, the following is valid if you use the XRegExp addon or some other conversion tool for Unicode classes, as discussed in Javascript + Unicode regexes.
If you need to prevent control characters from being entered, you can prompt the user when a regex match occurs using the pattern [^\P{C}\s]. This will ONLY match control characters that are not also whitespace characters, so whitespace such as horizontal tab, line feed, and vertical tab is not matched.
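Python's built-in re module does not support \p{...} classes, so a rough equivalent of that check can be sketched with the standard unicodedata module instead. The function name is illustrative, and str.isspace() is only an approximation of what \s matches in other regex engines.

```python
import unicodedata
from typing import List


def find_control_characters(password: str) -> List[str]:
    """Return characters whose Unicode category starts with "C" and that are not whitespace."""
    return [
        ch for ch in password
        # Categories Cc, Cf, Cs, Co and Cn all start with "C".
        if unicodedata.category(ch).startswith("C") and not ch.isspace()
    ]
```

An empty result means nothing was found; for instance, a tab inside the password is ignored because it is whitespace, while a NUL character or a zero-width space (category Cf) is reported.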
The following regex ensures at least one lowercase letter, one uppercase letter, one number, and one symbol exist in a password of 8 or more characters:
^(?=\P{Ll}*\p{Ll})(?=\P{Lu}*\p{Lu})(?=\P{N}*\p{N})(?=[\p{L}\p{N}]*[^\p{L}\p{N}])[\s\S]{8,}$
- ^: Assert position at the start of the line.
- (?=\P{Ll}*\p{Ll}): Ensure at least one lowercase letter (in any script) exists.
- (?=\P{Lu}*\p{Lu}): Ensure at least one uppercase letter (in any script) exists.
- (?=\P{N}*\p{N}): Ensure at least one number character (in any script) exists.
- (?=[\p{L}\p{N}]*[^\p{L}\p{N}]): Ensure at least one of any character (in any script) that isn't a letter or digit exists.
- [\s\S]{8,}: Matches any character 8 or more times.
- $: Assert position at the end of the line.
Please use the above regular expression at your own discretion. You have been warned!
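For languages whose regex engine lacks Unicode property classes, such as Python's built-in re module, the same composition rule can be approximated without a regex. The sketch below uses the standard unicodedata module; the function name is illustrative, and the same warning applies to it as to the regex.

```python
import unicodedata


def meets_composition_rule(password: str) -> bool:
    """Approximate the regex: lowercase, uppercase, number, other character, 8+ characters."""
    categories = [unicodedata.category(ch) for ch in password]
    has_lower = any(c == "Ll" for c in categories)
    has_upper = any(c == "Lu" for c in categories)
    has_number = any(c.startswith("N") for c in categories)
    # Anything that is neither a letter (L*) nor a number (N*).
    has_other = any(c[0] not in "LN" for c in categories)
    return len(password) >= 8 and has_lower and has_upper and has_number and has_other
```

Because it works on Unicode categories, a password such as Пароль1! passes just as Passw0rd! does, while Password1 fails for lack of a non-alphanumeric character.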