Should I support Unicode in passwords?
I am sure there is no technical problem, but maybe Gmail and Hotmail do not support it on purpose. These kinds of websites have a wide audience and should be accessible from everywhere.
Imagine a user has a password in Japanese, but is traveling and goes to a cyber cafe where there is no Japanese input support: the user won't be able to log in.
Another problem is analyzing password complexity. It's not so difficult to make sure the user didn't type a common English word, but what about Chinese, Russian, or Thai? Analyzing the complexity of a password gets much harder as you add more languages.
So if you want your system to be accessible, it's better to ensure that users can type their password on every kind of device, OS, and environment. Alphanumeric passwords with the most common symbols (!<>"#$%& etc.) are a good set of characters that is available everywhere.
Generally I am strongly in favor of not restricting what kinds of characters are allowed in passwords. However, remember that you have to compare what the user entered to something stored, which may be the password or a hash. In the former case you have to make sure that the comparison is done correctly, which is much more complex with Unicode than with ASCII alone; in the latter case you have to ensure that you are hashing exactly the same bytes whenever the password is entered. Normalization forms may help here or be a curse, depending on who applies them.
For example, in an application I'm working on I am using a hash over a UTF-8 conversion of the password which was normalized beforehand to weed out potential problems with combining characters and such.
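A minimal sketch of that idea in Python, not the code from my application. The choice of NFKC as the normalization form, PBKDF2-HMAC-SHA256 as the hash, the iteration count, and the function names `prepare_password`, `hash_password` and `verify_password` are all assumptions made for illustration. The point it shows is that two inputs that look identical but use different code point sequences end up with the same hash once they are normalized before the UTF-8 conversion.

```python
import hashlib
import hmac
import os
import unicodedata


def prepare_password(password: str) -> bytes:
    """Normalize first, then convert to UTF-8 bytes for hashing."""
    # NFKC is an assumed choice; what matters is using the same form every time.
    normalized = unicodedata.normalize("NFKC", password)
    return normalized.encode("utf-8")


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Hash the normalized UTF-8 bytes (PBKDF2 is an assumed choice of hash)."""
    return hashlib.pbkdf2_hmac("sha256", prepare_password(password), salt, iterations)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time against the stored hash."""
    return hmac.compare_digest(hash_password(password, salt), expected)


salt = os.urandom(16)
composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

stored = hash_password(composed, salt)
matches = verify_password(decomposed, salt, stored)  # same hash after normalization
```

Without the normalization step the two strings would encode to different UTF-8 bytes, and the second spelling would be rejected even though the user typed "the same" password.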
The biggest problem the user may face is that they can't enter it in some places, like on another keyboard layout. This is already the case for one of my passwords but has never been a problem so far. And after all, that's a decision the user has to make in choosing their password, not one the application should make on behalf of the user. I doubt there are users who happily use arbitrary Unicode in their passwords and do not think of the problems that may arise when using another keyboard layout. (This may be an issue for web-based services more than anything else, though.)
There are instances where Unicode is rightly forbidden, though. One such example is TrueCrypt which forces the use of the US keyboard layout for boot-time passwords (for full-volume encryption). There is no other layout there and therefore Unicode or any other keyboard layout only produces problems.
However, that doesn't explain why they forbid Unicode in normal passwords. A warning might be nice but outright forbidding is wrong in my eyes.
So I'm wondering if there's some technical or usability issue that I'm overlooking.
There's a technical issue with non-ASCII passwords (and usernames, for that matter) with HTTP Basic Authentication. As far as I know the sites you mentioned don't generally use Basic Authentication, but it might be a hangover from systems that do.
The HTTP Basic Authentication standard defines a base64-encoded username:password token. This means a colon in the username makes the result ambiguous (the token is split at the first colon, so a colon in the password is allowed by the standard). Also, base64-decoding the token gives you only bytes, with no direction on how to convert those bytes to characters. And guess what? The different browsers use different encodings to do it.
Opera and Chrome use UTF-8.
IE uses the client system's default code page (which is of course never UTF-8) and mangles characters that don't fit in it using the Windows standard Try To Find A Character That Looks A Bit Like It, Or Maybe Just Not (Who Cares) algorithm.
Safari uses ISO-8859-1, and silently refuses to send any auth token at all when the username or password has characters that don't fit.
Mozilla takes the lowest 8 bits of the code point (similar to ISO-8859-1, but more broken). See bug 41489 for tortuous discussion with no outcome or progress.
So if you allow non-ASCII usernames or passwords then the Basic Authentication process will be at best complicated and inconsistent, with users wondering why it randomly works or fails when they use different computers or browsers.
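To make the inconsistency concrete, here is a small Python sketch that builds the token the way a client would, under two of the encodings mentioned above. The helper name `basic_token` and the sample credentials are made up for illustration; this is not how any particular browser is implemented, only a demonstration that the same typed password yields different bytes on the wire.

```python
import base64


def basic_token(username: str, password: str, encoding: str) -> str:
    """Build the Basic auth token: base64 of "username:password" bytes."""
    raw = f"{username}:{password}".encode(encoding)
    return base64.b64encode(raw).decode("ascii")


password = "p\u00e4ssword"  # "pässword", with one non-ASCII character

utf8_token = basic_token("user", password, "utf-8")
latin1_token = basic_token("user", password, "iso-8859-1")

# The server only ever sees bytes, and they differ between the two clients.
utf8_bytes = base64.b64decode(utf8_token)
latin1_bytes = base64.b64decode(latin1_token)
tokens_differ = utf8_token != latin1_token
```

A server that stored a hash of the UTF-8 bytes would reject the ISO-8859-1 client, and vice versa, unless it tries several decodings on every login attempt.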
No. Restrict passwords to ASCII characters.
When you input a password, bullets are displayed to conceal the password.
But when you input Japanese and other languages, you must go through an input method, converting the keystrokes into the desired characters. This requires you to see what the characters are.