Why not use MD5 for password hashing?
Solution 1:
The collision resistance property of MD5 has been broken for a long time. Note that that preimage resistance and second preimage resistance have not yet been cracked, however as there are better algorithms out there (SHA-2) it would be wise to move to these rather than relying on a cryptographic hash that has already begun to lose its cryptographic properties. Note: The collision resistance property does not matter when storing hashed passwords - what you need to make sure of is that the preimage resistance property is sound - that it is computationally infeasible to find the original password given a certain hash value (and salt). As I mentioned, since one of the cryptographic properties is already broken I would be concerned that the others will follow soon.
When you store a password hash, you should build in some protection that the original password cannot be retrieved in case an attacker manages to extract these hashes. This is important because if an attacker manages to retrieve the password table only they can then use the data to either log into your system directly, or to log into other systems where the user has reused the same password.
When storing passwords, it is important to use a slow algorithm, such as bcrypt, scrypt or pbkdf2. A legitimate user should only have to experience the delay once, upon first login. An attacker will have to experience the delay for every password that they guess - remember rainbow tables will not be being used here because the passwords are salted. The attacker will be hashing each password guess in line with your chosen algorithm and iteration count.
It is important to tune the number of iterations for your system just so the right "strength" is used not to cause legitimate users any real annoyance when logging into your system. This is known as number of "rounds" or the "iteration count". For example, iterating for about one second should be sufficient. It may be safe to assume that an attacker can run through hashes at ten times the speed of your system's hardware. Therefore this limits the attacker to 10 guesses per second, rather than two billion with MD5.
Regarding DoS attacks
Yes, the extra processing your application performs upto login could be a target for an attacker to submit either really long passwords to your application, or to repeatedly hit it with login requests in order to consume CPU and memory resources on your server. You are right to be concerned.
These type of attacks can be mitigated in the following ways:
- Log the username and IP address of each login attempt. After say 6 failed attempts, introduce a delay in response from your application if that username or IP is repeated again. This will also help mitigate password guessing attacks in general.
- For example you could artificially delay by 1 second, then 2 seconds then 4 and up to a reasonable value (e.g. 16 seconds).
- This has the advantage that an attacker can't lock out another account on purpose as the legitimate user only has to wait 16 seconds.
- An attacker could use a botnet and random usernames to bypass these checks, however they would need a huge number of IP addresses than they would without this control, and a more casual attacker would also not be aware that the delay in response was artificial.
- Monitor the number of login attempts on your system. Once this is above a set threshold rate (e.g. 10 per second), introduce a CAPTCHA to solve to continue the login process. The threshold rate you choose very much depends on the user base and capacity of your system.
- Implement two factor authentication. Only proceed to validate password by hashing once the One Time Password has been validated.
Solution 2:
The problem with MD5 is exactly that it is so fast, you can calculate about 9 Giga MD5/s with common hardware. To brute-force a whole english dictionary with about 200000 words you need only a fraction of a milli-second.
This is why appropriate hash algorithms like BCrypt offer a cost factor. The cost factor defines how much time is needed to calculate the hash and can be inreased in future. 50 milliseconds for a login is hardly an obstacle, but for brute-forcing it is deadly.