Can a file be maliciously changed in a way that maintains its original SHA-1 Hash?

Nobody has yet accomplished this for SHA-1. It is possible in theory, but still not practical. The reports about insecurity in SHA-1 just mean that the security level is not as high as we would like it to be and that means we don't have as many years before we have to worry about this as we thought we did.

It is harder to produce a file with the same SHA-1 hash as a given file than it is to craft two files yourself with the same SHA-1 hash. And as far as we know, nobody anywhere in the world has yet accomplished even this easier task. That doesn't mean it can't happen tomorrow though.

It is theoretically possible, but it has not yet been done.

What you are looking for is called a "hash collision:" two files with the same hash. Cryptographic hash codes like SHA-1 are generally designed to make this difficult. Because SHA-1 is a 160 bit code, it will take on average 2^159 brute force attempts to find a duplicate. If an algorithm is found which reliably does better than that against a cryptographic hash, the hash is considered "broken."

MD-5 is an example of a very broken hash. It was supposed to have a strength of 128 bits, requiring on average 2^127 attempts. As is, abusing known vulnerabilities, the actual number of attempts needed can be as low as 2^47. This is a LOT smaller than 2^127. In fact, it has been done in under a day on a modern computing cluster.

I give that example because that is closest to how you are looking to use SHA-1. However, that is not the most common approach cryptanalysis use for making sure hashes aren't broken. They usually allow a collision between two files, as chosen by the attacker, instead of having you pick one file and the attacker seeking to match it. This sort of attack has an advantage of being easier to benchmark. If I find it is "hard" to crack your file, does that mean that another file is similarly strong? This attack where the attacker gets to choose both files ensures we catch the worst of the worst.

This sort of attack allows an interesting trick known as the "birthday attack." Long story short, getting to use the birthday attack halves the strength of the algorithm, so SHA-1 requires 2^80 attempts (on average) and MD5 requires 2^64 attempts (on average). These are half of 160 and 128 respectively.

SHA-1 has known attacks that decrease its strength from 2^80 to 2^69. This is not going to matter much to you. 2^69 attempts is a long time.

However, from history, we've found that hash algorithms are not broken spontaneously, but rather broken over time. Nobody cracks an algorithm like MD-5 taking it from 2^64 to 2^47 over night. It happens over time, as many individuals publish papers about the math they're using against it. One can usually watch the complexity of the attacks go down slowly from the inception of the algorithm (where the best attack is usually the birthday attack).

The fact that we are seeing some changes in collisions suggests SHA-1 is seeing the light at the end of the tunnel. It's still strong, but there might be a desire to go up to the newest SHA-3 which is currently much more secure.

You should really make such decisions from a threat model perspective. How much damage can be done by an attacker if they get one of these collisions. Are your attackers script kiddies with access to a few laptops, or governments with entire supercomputing clusters at their disposal. How large of a time window does an attacker have to break the hash before it's no use (many uses of cryptography involve a "changing of the guard," such as password rotation). All of these will affect how seriously you have to consider collisions.

The flaws in SHA-1 discussed in that article are very specific: they allow attackers to create two things that hash to the same value (this is called a "collision attack"). However, a collision attack requires that the attacker control both files involved. If the attacker doesn't control the original file, a collision attack does not let them find another file with the same hash value.

The reason this matters for TLS/SSL (and signatures in general) is that with those, an attacker often can control both files. A TLS certificate is mostly created by the person requesting it (the bits they don't control are often predictable), so collisions let them make a legitimate certificate and an illegitimate one, get the legitimate one signed, and transfer the signature.

For files, the same situation doesn't always apply. If your concern is that the person making the file is the attacker (e.g. they'll get one thing independently verified as good, and then send you the evil payload with the same hash), the SHA-1 attack applies, and you should look towards phasing it out (although it's not critical yet, as David Schwartz mentioned). If the original file is trusted, then an attacker can't apply the currently known SHA-1 attacks, although you should still think about phasing it out if you can (if you have a choice, use a hash without known attacks like SHA-2).

In response to "the collision won't be useful" -- While an attack doesn't require an attacker to be able to get a useful collision, it's generally not all that hard to turn "collision" into "useful collision." Many file formats have a fair amount of room in which you can have whatever you want without affecting the functionality of the file; an attacker can typically modify that in order to get a collision (if collisions are practically findable), while keeping the functional part as whatever they want it to be. The gap between "academic attack" and "practical attack" can be large; the gap between "any collision" and "useful collision" is generally much smaller.

The more serious issue, which is unrelated to the choice of algorithm, is how you're getting the hash. All a hash does is to shift the problem from "get the real file" to "get the real hash value;" a hash value sent from the same server and over the same connection type as the file is utterly worthless against malicious modification (any attacker who can tamper with the file can tamper with the hash). Hashes are only useful for this if you can trust the hash more than you can trust the file; while that's sometimes the case (torrents, mirrors), they're often used when it isn't the case. So you should be very careful about that whenever using hashes for integrity verification.

You have to differentiate between a collision attack and a preimage attack. Finding any two messages that hash to the same value is a collision attack.
Replacing one particular, given message (here: an executable) with another message that has the same hash is a (second) preimage attack.

SHA-1 is broken insofar as a collision attack can be done in 2⁵² operations according to a Wikipedia article that does not provide a citation for that number (the best attack that I am aware of which is actually credible is the one by Marc Stevens, which takes 2⁶⁰ operations). But let's assume the pessimistic case of 2⁵².

This is concerning because an attack at that scale is not only theoretically conceivable but indeed perfectly doable in under a day on a multi-GPU rig. That's of course a problem for applications where "any two" messages will do. Even the 2⁶⁰ figure given by Stevens (which is 256 times more work) is perfectly feasible if your attacker is willing to throw some extra money at the problem, or is willing to spend a year of time.
Which is exactly the kind of thing that won't prevent someone involved in espionage or cyber crime from forging certificates.

Now, a preimage attack has an exponent that is twice as large, so assuming 2⁵² for the collision attack, that would be 2¹⁰⁴ operations, which is a totally different ballpark.

This is not only impractical (a machine that is a billion times faster than the one mentioned in the previous paragraph would still take around 6 million or so years), but given our puny means of generating energy this is entirely impossible.

Doing such a massive calculation would require an energy source which is much larger than anything we can afford to dedicate to a single operation. No, not quite an energy source the size of the sun, but still a quite big one.

You can realistically expect to get anything from 10 to 50 GFLOPS out of one Watt. Assuming that some kind of miracle happens and processors get about several thousand times more energy efficient over night, one could assume 1 SHA ≈ 1 FLOP (quite optimistic!). This would mean that in order to perform 2¹⁰⁴ hash calculations within 10 years, you need a 10¹²W power plant. In order to run the attack within 1 year, you need a 10¹³W power plant. That's about 50 times of what the entire nuclear power plants of the USA, France, and Japan can produce together, only for forging a single hash.

This is not going to happen, there are much easier ways of achieving the same goal (exploiting the server that stores the original hash and replacing that one, blackmailing someone, etc.).

Can a file be maliciously changed in a way that maintains its original SHA-1 Hash?

Related

Recent Posts