Why are very large prime numbers important in cryptography?

Firstly, you guys are awesome, and I learn quite a bit just from reading the questions of others.

Secondly, a friend asked me recently why large primes are important for data security, and I was unable to give him an answer with which I myself was satisfied. Various Wikipedia articles have mostly pointed out an embarrassing paucity of mathematical knowledge on my part, and since this happens to be a very math-related question (and not a programming-related question) I was hoping someone could shed some light.

tl;dr: question reads as title.


Solution 1:

There is a whole class of cryptographic/security systems which rely on what are called "trap-door functions". The idea is that they are functions which are generally easy to compute, but for which finding the inverse is very hard (here, "easy" and "hard" refer to how quickly we know how to do it), but such that if you have an extra piece of information, then finding the inverse is easy as well. Primes play a very important role in many such systems.

One such example is the function that takes two integers and multiplies them together (something we can do very easily), versus the "inverse", which is a function that takes an integer and gives you proper factors (given $n$, two numbers $p$ and $q$ such that $pq=n$ and $1\lt p,q\lt n$). If $n$ is the product of two primes, then there is one and only one such pair.
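To make the asymmetry concrete, here is a minimal sketch in Python. The helper name `trial_division_factor` and the two small primes are illustrative choices of mine, and trial division is only the most naive factoring method, not what a serious attacker would use. Multiplying is a single operation, while the search for a factor may need up to about $\sqrt{n}$ steps.

```python
def trial_division_factor(n):
    """Return (p, q) with p * q == n and 1 < p <= q < n, or None if no such pair exists."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime (or smaller than 4)

# Easy direction: one multiplication.
p, q = 1009, 1013          # two small primes, chosen only for illustration
n = p * q

# Hard direction: the loop above tries about a thousand candidates here;
# for primes with hundreds of digits the same loop would never finish in practice.
factors = trial_division_factor(n)
print(n, factors)
```

With primes of realistic cryptographic size, the multiplication is still instantaneous, but no known method completes the reverse search in a reasonable amount of time.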

Another example is the discrete logarithm. To consider a simple example, look at the integers modulo, say, $7$. The integers between $1$ and $6$, inclusive, form a group under multiplication, and in fact every number between $1$ and $6$ is a power of $3$ modulo $7$. The "discrete logarithm problem" would be, given a number $x$ between $1$ and $6$, to find a number $a$ such that $3^a$ equals $x$ modulo $7$. In this case, you can just try powers of $3$ until you hit the right answer. But if the modulus is very large, then this would take too much time.
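The modulo $7$ example can be checked directly. The sketch below uses Python's built-in three-argument `pow` for modular powers; the function name `discrete_log_bruteforce` is my own, and the brute-force search is exactly the "try powers until you hit the right answer" approach, which is only feasible for tiny moduli.

```python
def discrete_log_bruteforce(base, target, modulus):
    """Return the smallest a >= 0 with base**a % modulus == target, or None."""
    power = 1  # base**0
    for exponent in range(modulus - 1):
        if power == target % modulus:
            return exponent
        power = (power * base) % modulus
    return None

# Every number from 1 to 6 shows up as a power of 3 modulo 7.
powers_of_3 = [pow(3, a, 7) for a in range(1, 7)]
print(powers_of_3)  # [3, 2, 6, 4, 5, 1]

# Find a with 3**a congruent to 5 modulo 7.
a = discrete_log_bruteforce(3, 5, 7)
print(a)  # 5
```

The loop runs at most `modulus - 1` times, so its cost grows in proportion to the modulus, while computing `pow(3, a, 7)` takes a number of steps proportional only to the number of digits of `a`.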

One method for exchanging information over an open channel relies on the fact that we do not have very good methods of finding discrete logarithms in general, but we do have very good methods for computing modular powers. The idea is: suppose you and I need to exchange information. We want to use some very secure cryptographic system that relies on a complicated key. But how can we agree on a key? If we have some secure way of communicating so that when we agree on the key nobody will overhear us, then why bother with the entire exercise? We should just communicate using that secure way. So instead we need to communicate at a place where we can be overheard. How can we agree on a secret key if everyone can hear us? Well, Diffie and Hellman proposed the following method:

Pick a very large prime $p$, and a number $r$ such that every number between $1$ and $p-1$ is a power of $r$ modulo $p$ (such numbers $r$ are known to exist for every prime; they are called primitive roots). Everyone knows $p$ and everyone knows $r$. Then I pick a secret number $a$, and you pick a secret number $b$. I cannot tell you my secret number (it's secret). But I tell you what $r^a \mod p$ is. Because computing modular powers is easy, I can do this computation easily enough; but because we don't know how to do discrete logarithms easily, we are hoping that nobody will be able to figure out $a$ just from knowing $r^a$... at least, not very quickly. Likewise, you tell me $r^b \mod p$. Now, you know $r^a$, and you know what $b$ is, so you compute $(r^a)^b \mod p$. By the laws of exponents, you now know (secretly!) the number $r^{ab} \mod p$. I, on the other hand, know $r^b$ (because you told me that number) and I know what $a$ is. So I compute $(r^b)^a\mod p$. But this is the same as $r^{ab} \mod p$. So now we both have a piece of information, namely the number $r^{ab}\mod p$. This is going to be our "secret key".
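Here is the exchange above worked through with toy numbers in Python. The values $p=23$, $r=5$, $a=6$, $b=15$ are my own illustrative choices ($5$ is a primitive root modulo $23$); a real exchange would use a prime with thousands of bits and randomly chosen secrets.

```python
# Public values, known to everyone (toy sizes, for illustration only).
p = 23   # prime modulus
r = 5    # a primitive root modulo 23

# Private values, never transmitted.
a = 6    # my secret
b = 15   # your secret

# What we each announce over the open channel.
my_public = pow(r, a, p)     # r^a mod p
your_public = pow(r, b, p)   # r^b mod p

# Each side combines the other's public value with its own secret.
my_key = pow(your_public, a, p)    # (r^b)^a mod p
your_key = pow(my_public, b, p)    # (r^a)^b mod p

print(my_public, your_public)  # 8 19
print(my_key, your_key)        # 2 2
```

An eavesdropper sees $23$, $5$, $8$, and $19$, but not $6$ or $15$. With numbers this small they could recover the secrets by trying every exponent, which is exactly why $p$ has to be large.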

Now, if someone can figure out either $a$ or $b$, then since they also know $r^a$ and $r^b$, they'll be able to figure out our secret key. We hope this is hard, but we certainly need $p$ to be very big: otherwise, they can just try all powers of $r$ until they hit the right one. We need the "search space" to be very big, so we need $p$ to be very big. Added: As jug points out, having $p$ big is not sufficient. There are algorithms for computing discrete logarithms that are particularly good with certain kinds of primes, so we generally also require that $p$ satisfy some additional "good" properties relative to the cryptographic application. You generally want $p$ and $(p-1)/2$ to be both primes, for example. On the other hand, in practice one does not really need $r$ to be a primitive root. Instead, it is enough that it generate a "large" subgroup of the multiplicative group, which one generally wants to be of prime order.

(Note: figuring out $a$ or $b$ is just one way in which they could figure out our secret key $r^{ab}$, since everyone knows $p$, $r$, $r^a$, and $r^b$. It is not known whether this is essentially the only way to break this "key exchange" method; the method really relies on whether one can figure out $r^{ab}$ from knowing $r$, $p$, $r^a$, and $r^b$; this is called the Diffie-Hellman problem; the Diffie-Hellman problem is at most as hard as the Discrete Logarithm Problem, but we do not know if it is just as hard (it could be easier); and we don't know just how hard the Discrete Logarithm Problem is, we just know that we don't have any easy ways of doing it yet).

So key exchange is one place where big primes are very important. (Diffie-Hellman is not the only way to do key exchanges.) Another place where big primes play a big role is in RSA, a cryptosystem that also relies on big primes (this time, two big primes $p$ and $q$, and we do arithmetic modulo $n=pq$).

Added: Might as well add a quick overview of RSA and how the primes come into play. Here, once again modular exponentiation is part of the process. This is a "public key" system: I will tell everyone how to send me secret messages, which hopefully only I can decode. (In Diffie-Hellman, we did not exchange a message; we agreed on a secret key that we will use with a separate system that requires a secret key; for example, AES). I pick two large primes $p$ and $q$, and compute $n=pq$. I also pick a number $e$ that is relatively prime to $(p-1)(q-1)$ (I can do that because I know $p$ and $q$). Then I use the extended Euclidean algorithm, which is pretty quick, to find a $d$ such that $ed\equiv 1 \pmod{(p-1)(q-1)}$. Finally, I tell everyone what $n$ and $e$ are. If you want to send me a message, you first convert it to a number $M$ using some standard mechanism. Then you compute $M^e \mod n$, and you tell me what $M^e\mod n$ is. I will take $M^e$ and compute $(M^e)^d = M^{ed}\mod n$. Because $ed\equiv 1 \pmod{(p-1)(q-1)}$, we have $M^{ed}\equiv M\pmod{n}$, so that is how I recover $M$. The security of the system relies on the hope that from knowing $n$ and $e$, it is difficult to figure out $d$ (it is easy if I know $p$ and $q$; this is why this is believed to be a "trap-door function" as described in the first paragraph). The problem is at most as hard as factoring $n$, because if you can factor $n$ then you can find $d$ the same way I did; it is not known if the problem of finding $M$ from $n$, $M^e$, and $e$ is at least as hard as factoring (it has been shown that some variants are at least as hard as factoring), and again we don't know just how hard factoring is. But: because we know that if you can factor $n$ then you can read the message, we want to make $n$ difficult to factor. It only has two prime factors, but you don't want them to be easy to find, so you want $p$ and $q$ to be large for sure.
(Again, there are other conditions one usually puts on $e$, $p$, and $q$ to make sure that certain special attacks do not succeed easily, but at least we need $p$ and $q$ to be very big).
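The RSA steps above can be traced with toy numbers. This is only a sketch: the values $p=61$, $q=53$, $e=17$, $M=65$ and the helper name `extended_gcd` are my own illustrative choices, and real RSA uses far larger primes plus message padding, which is omitted here.

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y

# Key generation (done by me, in private).
p, q = 61, 53                 # toy primes, far too small for real use
n = p * q                     # 3233, published
phi = (p - 1) * (q - 1)       # 3120, kept secret
e = 17                        # published; must be relatively prime to phi
assert gcd(e, phi) == 1
_, x, _ = extended_gcd(e, phi)
d = x % phi                   # private exponent with e*d = 1 (mod phi)

# Encryption (done by you, using only n and e).
M = 65
ciphertext = pow(M, e, n)

# Decryption (done by me, using d).
recovered = pow(ciphertext, d, n)
print(d, ciphertext, recovered)  # 2753 2790 65
```

Anyone who factors $3233$ into $61 \cdot 53$ can recompute `phi` and `d` exactly as above, which is why the primes must be large enough that factoring $n$ is out of reach.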

Solution 2:

Some cryptographic algorithms use 2 very large primes (for example, 1024 bits long each; primes of only 128 bits would be far too small to be secure) and multiply them together. The best generally known way to crack that is to try and find the only 2 prime factors of that number (the 2 large primes).

Well, it turns out, it takes A LOT of computer power to be able to find those 2 factors. Therefore, even if one tries, it'll take so long that the encryption is considered secure enough.

Most numbers have many factors, but a product of 2 primes has only those 2 primes as factors (apart from 1 and the number itself).