What is the difference between weak and strong resistance
I have read some texts about strong collision resistance and weak collision resistance, but I was unable to understand the difference. The only thing I can understand that there is a low probability of collision in hash functions which have weak collision resistance, and a higher probability of collision in strong collision resistance hash functions. I could not understand what is the real thing, what is the significance of these parameters. Can anyone help me on this ?
Solution 1:
The weak collision resistance property is sometimes also referred to as second preimage resistance:
Given an arbitrary x there exists no x' with x' != x so that h(x) = h(x')
Strong collision resistance on the other hand is defined as:
There exist no x and x' with x != x' so that h(x) = h(x')
The obvious difference in their definitions is that for weak collision resistance we assume to be bound to a particular choice of x, whereas in the definition of strong collision resistance we are free to arbitrarily choose our x and x'. Still this seems to be very similar, so let's look at two examples.
Weak collision resistance
A good example where we are actually only interested in weak collision resistance would be a simple password storage scheme. Assume we store user-provided passwords in a database by storing their hash. Authentication would succeed when the hash of some password a user provides is equal to the value that was stored previously (this is an inherently insecure scheme though, but please bear with me for the moment). Now in that case, the given x is the (unknown) original password that was provided earlier. If an attacker were capable of solving the "second preimage" problem efficiently, he could obtain an x' whose hash value is the same as that of the original x, and would thus be authenticated successfully. Please note that the capability to produce arbitrary collisions (i.e. solving the strong collision problem) is useless in general in this scenario because it is not too likely that the x and x' we get resemble actual passwords whose hashes have already been stored in the database.
Strong collision resistance
A different scenario where our concern is strong collision resistance instead is for example an application where you want to be able to look up arbitrary data stored in a database with the help of unique ids. Instead of issuing queries on the original data (which would often be very slow due to the potentially unbounded size of the data), you would compute hashes of the data instead. Hashes are very compact, limited in their size and can thus be queried much more efficiently. As a matter of fact, in these cases you often don't mind the (second) pre-image resistance property of a hash function at all, mostly because the preimages themselves are no secret. What you do care about, though, is that you would absolutely want to avoid two distinct data sets to hash to the same value, which is essentially a collision. You don't care about any collision in particular, but you want this property to hold universally - i.e. you don't want any two data sets hash to the same value (imagine there is a 'unique constraint' defined on that column). Because security is often no issue in these applications, we often use non-cryptographic hashes, mostly because they perform better.
The relationship between both
Intuitively and also implied by their names, we would assume that strong collision resistance is something that is harder to provide than weak collision resistance. Luckily for us, our intuition can be proven to be correct under the Random Oracle Model. We can prove this by contrapositive by assuming that if we had an efficient probabilistic polynomial algorithm for solving "second preimage", then this would also give us an efficient algorithm for solving "collision".
Consider a hash function h and this following simple probabilistic algorithm [1]:
Let 2ndPreimage be another probabilistic (e, Q)-algorithm that solves "second preimage" for the hash function h.
Choose x uniformly at random
value = 2ndPreimage(h, x)
case value == failure -> return failure
case value == x' (!= x) -> return (x, x')
It is easy to see that this is also an (e, Q)-algorithm that solves the strong collision problem. This implies that given we have an algorithm to solve "second preimage", we can also use this algorithm that is likely to produce a collision. As we made no assumptions on the underlying hash function h, we can now safely say that
Strong collision resistance implies weak collision resistance but the opposite doesn't necessarily hold.
[1] e is the success probability of the algorithm, 0 <= e < 1. Q is the maximum number of oracle queries (i.e. "evaluations" of the algorithm). In case of success, the algorithm returns a valid solution, otherwise it returns a value indicating failure.
Solution 2:
I wrote this answer due to the comment of Alexander; All the definition are copied from; Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance by P. Rogaway and T. Shrimpton.
-
preimage-resistance — for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage
x'
such thath(x') = y
when given any y for which a corresponding input is not known. -
2nd-preimage resistance, weak-collision — it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given
x
, to find a 2nd-preimagex' != x
such thath(x) = h(x')
. -
collision resistance, strong-collision — it is computationally infeasible to find any two distinct inputs
x
,x'
which hash to the same output, i.e., such thath(x) = h(x')
.
Fact 1: Collision resistance implies 2nd-preimage resistance of hash functions.
Fact 2: 2nd-preimage resistance implies preimage resistance.
As Alexander noted, by the pigeonhole principle, when the input space larger than the output space of the hash function the collisions are inevitable.