Solution 1:

Thanks to Eu Yu's link, I found this paper by Birkhoff and Vandiver. It was a bit confusing to read so I think I'd better clean up their proof. I simplified their proof of Theorem 5 using cyclotomic polynomials, too. Therefore I needed Theorem 7 in this paper by Yimin Ge and a generalisation of Theorem 4 in the earlier cited presentation of Lola Thompson.

Their Theorem 1 isn't such a big shot, and Theorem 2 and 3 are nowadays known as special cases of the Lifting the Exponent Lemma. Hence not much to say about these. Theorem 4 seemed to be the main matter, and it's the only part of their paper I kept when constructing my own proof. I'll give a much shorter alternative for Theorem 5.

Let's fix two coprime positive integers $a$ and $b$ with $a>b$.

It suffices to prove there is a prime divisor of $a^n-b^n$ that does not divide $a^k-b^k$ for all positive divisors $k\mid n$ with $k<n$: if $p\mid a^n-b^n$ and $c$ is an inverse of $b$ modulo $p$, then the smallest $k>0$ for which $(ac)^k\equiv1$ satisfies $k\mid n$.

Now we define $z_n=a^n-b^n$ and

$$\Psi_n=\prod_{d\mid n}{z_{\frac nd}}^{\mu(d)}.\qquad(1)$$

Anyone who has been introduced to cyclotomic polynomials may notice that

$$\Psi_n=b^{\varphi(n)}\Phi\left(\frac ab\right).\qquad(2)$$

It follows that $\Psi$ inherits many properties of cyclotomic polynomials. The ones I will use can be found in Yimin Ge's paper for example. I won't use properties for $\Psi$ if it's not clear how they are inherited. (Cyclotomic polynomials are never explicitly mentioned in the paper, I guess they weren't so well known at the time.)

The following lemma will be useful more then once:

Lemma. If $p$ is prime, $n=p^\alpha q$ an integer such that $p\nmid q$ and $p\mid\Phi_n(a)$ for some integer $a$, then $O_p(a)=q$. (The $O$ is just a lazy notation for the multiplicative order.)

Proof. If $O_p(a)=k$ then $k\mid q$ because from $p\mid\Phi_n(a)$ we certainly have $p\mid a^n-1\equiv a^q-1$. If $k<q$ then there is a divisor $d\mid k$ for which $p\mid\Phi_d(a)$. As $d\mid q$ and $d<q$ this means the polynomial $x^q-1$ has a double root, $a$, modulo $p$, hence $p\mid q$ (see Lemma 6 from Yimin Ge) which is impossible. $\square$

The lemma can be modified to fit for $\Psi$ too, but I won't do that.

By a multiplicative variant on Möbius-inversion, we have $$z_n=\prod_{d\mid n}\Psi_d.\qquad(3)$$

If $z_n={p_1}^{a_1}\cdots{p_r}^{a_r}$ where $p_{s_1},\ldots,p_{s_t}$ are the primitive prime divisors of $z_n$, we set

$$P_n={p_{s_1}}^{a_{s_1}}\cdots{p_{s_t}}^{a_{s_t}}.$$

From $(2)$ we have $\Psi_n\in\mathbb Z$ and from $(1)$ it follows that $P_n\mid\Psi_n$. Set $\Psi_n=\lambda_nP_n$. We shall prove that $P_n>1$ in all cases Zsigmondy's theorem does not exclude.

From $(3)$ it follows that $\Psi_n\mid\frac{z_n}{z_d}$ for every divisor $d\mid n$ with $d<n$.

Let $p$ be a prime divisor of $\Psi_n$. If $p\nmid P_n$ then $p$ is not primitive, suppose $p\mid z_d$. If $p\nmid n$ then by the Lifting The Exponent Lemma, $v_p(z_n)=v_p(z_d)$ so $p\nmid\frac{z_n}{z_{d}}$, contradiction. Hence $\text{rad}(\lambda_n)\mid n$.

Furthermore, note that $\gcd(\lambda_n,P_n)=1$. Otherwise there would be a prime $p$ for which $p\mid n=pr$ and $p\mid a^{pr}-b^{pr}\equiv a^r-b^r$ contradicting the primitivity of $p$.

Suppose $\lambda_n>1$. If $p$ is a prime divisor of $\lambda_n$ with $p^\alpha\|n$ and $n=p^\alpha q$ then

$$p\mid\Psi_n\mid\Psi_q(a^{p^\alpha},b^{p^\alpha})\equiv\Psi_q\pmod p,$$

(see corollary 4 from Yimin Ge) where more generally we denote

$$\Psi_n(x,y)=y^{\varphi(n)}\Phi_n\left(\frac xy\right).$$

If $p$ is a prime divisor of $\lambda_n$, so $p\mid\Psi_q$, then either $p\mid q$, which is impossible, or $p\equiv1\pmod q$. (see Theorem 6 from Yimin Ge) So $p>q=\frac n{p^\alpha}$. If $r$ is another prime divisor of $n$, then $r\mid q$, so $r<q<p$. This means $p$ is uniquely determined as the largest prime divisor of $n$.

Therefore, set $\lambda_n=p^\beta$. We'll prove that $\beta=1$.

Let $d\mid n$ such that $p\mid z_d$. Let $c$ be an inverse of $b$ modulo $p$, then $p\mid\Psi_n$ and thus $p\mid\Phi(ac)$, so by the lemma, $O_p(ac)=q$. So certainly we should have $q\mid d$.

Now from $(1)$ we have $\beta=v_p(\Psi_n)=v_p(z_n)-v_p(z_{\frac np})=1$, as desired.

To conclude $P_n>1$ there are three cases:

If $\lambda_n=1$, then $P_n=\Psi_n\geq(a-b)^{\varphi(n)}\geq1$. The inequality is strict unless $n=2$ and $a-b=1$, but then Zsigmondy's theorem is trivially true.

If $\lambda_n=p$ and $a-b>1$, then $P_n=\frac1p\Psi_n\geq\frac1p(a-b)^{\varphi(n)}\geq\frac{2^{p-1}}p\geq1$. Again the inequality is strict unless $a-b=2$ and $n=2$. In that case, $a+b$ contains a primitive prime divisor unless $a+b$ is a power of $2$, which was predicted by the theorem.

The case $\lambda_n=p$ and $a-b=1$ is a little more complicated. (Here I basically prove a generalisation of L. Thompson's "Key Lemma".)

Suppose the inequality is not strict, so $\Psi_n=p$. This will eventually give us the only counterexample that's left, being $n=6$, $a=2$. From $p\mid z_n$ it follows that $p$ is odd. Let, once more, $n=p^\alpha q$ and $c$ an inverse of $b$ modulo $p$. As $p\mid\Psi_n$ we have $p\mid\Phi_n(ac)$, so $O_p(ac)=q$ by the lemma.

If $\alpha>1$, then $p=\Psi_n=\Psi_{pq}(a^{p^{\alpha-1}},b^{p^{\alpha-1}})$, but

$$\Psi_{pq}(a^{p^{\alpha-1}},b^{p^{\alpha-1}})\geq(a^{p^{\alpha-1}}-b^{p^{\alpha-1}})^{\varphi(pq)}\geq a^p-b^p=\sum_{k=0}^{p-1}{p\choose k}b^k>p,$$

because $p>2$, contradiction. Hence $n=pq$. Now we have

$$p=\Psi_n=\frac{\Psi_q(a^p,b^p)}{\Psi_q}\geq\frac{(a^p-b^p)^{\varphi(q)}}{(a+b)^{\varphi(q)}}\geq\frac{a^p-b^p}{a+b}=\frac1{2b+1}\sum_{k=0}^{p-1}{p\choose k}b^k>\frac b{2b+1}\sum_{k=1}^{p-1}{p\choose k}.$$

Since $\frac b{2b+1}\geq\frac13$ we obtain $3p>2^p-2$.

This is impossible when $p>3$, so $p=3$ and $q\mid 2$ because $q=O_p(ac)\mid p-1$.

Hence $n=3$ or $n=6$. If $n=3$ the theorem is obviously true because $3$ is prime and $a-b=1$.

The case $n=6$ remains, and indeed Zsigmondy fails here. From $a=b+1$ and $3=\Psi_6=a^2-ab+b^2$ we easily deduce that $a=2$ and $b=1$ is the only counterexample. $\square$


The proof for the case $a^n+b^n$ can be deduced from the case $a^n-b^n$.

For any positive integer $n>1$ for which $2n$ does not give an exception on Zsigmondy's theorem, $a^{2n}-b^{2n}$ has a primitive prime divisor $p$, dividing either $a^n-b^n$ or $a^n+b^n$. However, $p$ can't divide $a^n-b^n$ since then $p$ wouldn't be primitive. Thus we have $p\mid a^n+b^n$ and $p\nmid a^{2k}-b^{2k}$ for all $k<n$. This implies $p\nmid a^k+b^k$ for all $k<n$, hence the theorem. $\square$

Note that the exception $2^6-1^6$ is reflected in $2^3+1^3$. The case $n=2$ and $a+b$ a power of $2$ disappears because we only consider $n>1$ here.

Solution 2:

I have found the following published reference: Elementary proof of Zsigmondy's theorem by M. Teleuca, which also cleans up the proof of Birkhoff & Vandiver.

Artin also gives a proof in the general case (which seems to be discovered independently).