Algorithms for Finding the Prime Factorization of an Integer
As practice, I am currently writing a program that takes a given integer $n$ as input, and then finds the (unique) prime factorization of $n$, provided $n$ is composite.
My question is about algorithms: are there preferred algorithms for doing this? I've heard that factoring is a tricky business.
What I'm doing currently is that I use a prime sieve to find the primes $\leq \sqrt{n}$, then I loop through the list of primes (starting from $2$), checking divisibility --- if divisible, I write that prime to a list of prime factors, divide the integer by the prime, and begin looping through the list of primes again, checking divisibility of the updated integer.
This seems like a naive and inefficient approach. What methods would be more effective? What number-theoretic facts should I use to shortcut calculations?
A useful and fairly easy to implement integer factorization algorithm is Pollard's Rho Algorithm. You can supplement it with trial division or another factorization method in case the rho algorithm fails to find a nontrivial divisor, and in any case you will want to also implement a primality testing algorithm (I like a deterministic version of Miller-Rabin, which is guaranteed to work for most numbers and, depending on the version, for all numbers up to a large limit) to verify when the factors you have found are prime.
I've been working on an identical problem. What I've learned so far is that determining the primality of a number is very difficult but also that prime factorization is even more difficult (computationally). Currently, I only have my "Prime Factorization Algorithm" perform trial division on $n$ for primes in the sequence $2,3,5,7,11,...,\sqrt{n}$ because anything over the square root of $n$ will necessarily have a factor below $\sqrt{n}$ too. The only exceptions are input values of $n$ which are perfect squares (i.e., 4,9,16,25,...) because $\sqrt{n}$ itself accounts for one of the factors. This algorithm is nested in the same way you describe yours; that is, after finding a prime, take the quotient and repeat the process and storing each new prime as it comes along. Though this algorithm does not run in polynomial time, it is straightforward to program and entirely deterministic (as opposed to using probabilistic primality tests for larger values of $n$).
My primary concern right now is not factoring arbitrary inputs for $n$, but rather generating the list of primes up to and including $\sqrt{n}$ and this is not trivial. I'm currently working to store each new prime as I generate it. But there are some ways to make this prime-number-generator work faster if you know:
- All prime numbers (greater than or equal to 7) end in a 1,3,7,9.
- All primes greater than $c$ can be represented in the form $ck+i$. You might be familiar with the simple example that primes over 6 can be represented as $6k+1$ or $6k+5$. Certain values of $c$ work more "efficiently" than others: through my empirical tests and some research I have found that Primorial numbers are usually good $c$ values (2,6,30,210,2310,etc. - click here for Wikipedia's take on Primality Tests and look for the paragraph that starts with the phrase "Generalizing further...")
You have not made much of a distinction about large numbers or small numbers; for the moment small numbers are good enough. At some point I realized I wanted a C++ command to factor any 32 bit signed integer, so absolute value up to $2^{31} - 1$ or 2,147,483,647. If I have an integer variable named n I can print it and its factorization with cout << n << Factored(n) << endl based on the code below
example stand alone program:
jagy@phobeusjunior:~/old drive/home/jagy/Cplusplus$ ./factor
number to be factored ? 35283528
= 2^3 * 3^2 * 7^2 * 73 * 137
jagy@phobeusjunior:~/old drive/home/jagy/Cplusplus$
string stringify(int x)
{
ostringstream o;
o << x ;
return o.str();
}
string Factored(int i)
{
string fac;
fac = " = ";
int p = 2;
int temp = i;
if (temp < 0 )
{
temp *= -1;
fac += " -1 * ";
}
if ( 1 == temp) fac += " 1 ";
if ( temp > 1)
{
int primefac = 0;
while( temp > 1 && p * p <= temp)
{
if (temp % p == 0)
{
++primefac;
if (primefac > 1) fac += " * ";
fac += stringify( p) ;
temp /= p;
int exponent = 1;
while (temp % p == 0)
{
temp /= p;
++exponent;
} // while p is fac
if ( exponent > 1)
{
fac += "^" ;
fac += stringify( exponent) ;
}
} // if p is factor
++p;
} // while p
if (temp > 1 && primefac >= 1) fac += " * ";
if (temp > 1 ) fac += stringify( temp) ;
} // temp > 1
return fac;
} // Factored
Note that, in dealing with larger numbers, canned programs generally do such trial division up to some prime bound. My Factored printing command above finishes the job for small numbers, as I said, including a final possible largest prime factor called "temp." In the earliest years of Mathematica, they first did trial division with all primes up to $2^{31/2}$ or about 46,340. If that did not finish the job, other methods were attempted. So, one thing you could do now is to save a list of the primes up to 46,340, and report what happens when all those are pulled out of your big number, as I did. Then you can think about what to do with the leftover factor (my 'temp'), which could be prime or composite.
Don't loop over the primes again; you have already checked most of them (if they did not divide the original number, they won't divide the quotient). When you find a prime that divides your number, write that prime to a list, divide the integer by that prime, then continue through the list of primes from where you left off (making sure to repeat the one you just added to the list). You can also stop once your primes are bigger than the square root of the current quotient you are working with.
There are much better factoring algorithms but the best (number field sieve) involves a good amount of algebraic number theory.