I had similar conceptual trouble until I came across Tim Gowers' blog post "How to use Zorn's lemma". Its main thesis:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

He shows how to use the Lemma in a number of cases where my intuitive approach would have been something like "ah, but we can construct the thing by transfinite induction ... let's find a sufficiently large ordinal to induct over (work, work, work) ... and then fix a choice function such that we can make choices at each step along the way (work, work, work) ... and if the thing is still not made when we reach the top of our chosen ordinal, it would be (work, work) a contradiction".

Compared to that, Zorn's Lemma packs a lot of boilerplate argument into a simple, tidy, reusable tool where one just needs to specify the minimal properties of the situation for the construction to work. In particular, the apparently ill-motivated condition about chains is exactly what is needed for the tedious transfinite-induction argument to keep rolling when we hit a limit ordinal.


This blog post by Tim Gowers has a nice discussion.

Suppose you have some process of choosing elements from a partially ordered set, and nothing seems to stop you from choosing bigger and bigger elements, no matter how many elements you already have. Zorn's Lemma says that the process must stop.

For example, to prove that every vector space has a basis: Start with the empty set, and keep on adding elements that are linearly independent of the existing elements. No matter how big the set gets, if it's not already a basis, then you can add another element to it, even after infinitely many steps: If $X$ is a linearly independent subset of a vector space $V$, then either $X$ is a basis, or else (no matter how big $X$ is), there exists some $x \in V$ which is independent of $X$, and you can add it to $X$ to get a bigger linearly independent set. Zorn's Lemma says this process must terminate, at which point you have a basis.
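In finite dimensions the "keep adding independent elements" process really does stop after finitely many steps, and no appeal to Zorn's Lemma is needed. Here is a minimal sketch of that finite case, assuming vectors in $\mathbb{Q}^n$ given as Python lists; the function names `is_independent` and `extend_to_basis` are made up for this illustration.

```python
from fractions import Fraction


def is_independent(vectors):
    """Return True if the given vectors are linearly independent over Q.

    Uses exact Gaussian elimination on Fractions and compares the rank
    with the number of vectors.
    """
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in all rows below the pivot row.
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(rows)


def extend_to_basis(independent, candidates):
    """Greedily add each candidate that keeps the set linearly independent."""
    basis = list(independent)
    for v in candidates:
        if is_independent(basis + [v]):
            basis.append(v)
    return basis


# Start from one vector and try to add the standard basis vectors of Q^3.
basis = extend_to_basis([[1, 1, 0]], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(basis)  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The loop is the finite shadow of the argument above: at each step, either the current set already spans or some candidate can be added. What Zorn's Lemma supplies in the infinite-dimensional case is the guarantee that this extension process has an end point even when no finite (or countable) number of steps suffices.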

In fact, this intuitive description can be turned into a proof of Zorn's Lemma: Let $X$ be a nonempty partially ordered set in which every totally ordered subset has an upper bound. Suppose $X$ has no maximal element. Choose $x_1 \in X$. It's not maximal, so I can find $x_2 > x_1$. But $x_2$ is not maximal, so I can find $x_3 > x_2$, etc., leading to $x_1 < x_2 < x_3 < x_4 < \ldots$ These $x_i$'s form a totally ordered subset, hence there is an element $x_\omega$ dominating all of them. But $x_\omega$ is not maximal, so there is $x_{\omega+1} > x_\omega$, etc. Nothing stops me from obtaining an $x_\alpha \in X$ for every ordinal $\alpha$, with all the $x_\alpha$ distinct. But $X$ is a set, and no set can contain a distinct element for every ordinal (see the Burali-Forti paradox): contradiction.
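One way to phrase the recursion behind this argument, as a sketch (the choice function $f$ and the indexing from $0$ rather than $1$ are notation introduced here, not part of the argument above): if $X$ has no maximal element, then every chain $C \subseteq X$ has a strict upper bound, namely any element strictly above an upper bound of $C$. Using the Axiom of Choice, fix a function $f$ that sends each chain $C$ to one such strict upper bound, and define by transfinite recursion

$$x_\alpha = f\bigl(\{x_\beta : \beta < \alpha\}\bigr) \quad \text{for every ordinal } \alpha.$$

By induction, $\{x_\beta : \beta < \alpha\}$ is a chain at every stage and $\beta < \alpha$ implies $x_\beta < x_\alpha$, so $\alpha \mapsto x_\alpha$ would be an injection of all the ordinals into the set $X$, which is impossible.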


Let $F$ be a nonempty finite set and $\leq$ a partial order on $F$. Then there is a $\leq$-maximal element in $F$.

Proof: Suppose not. Then for every $x\in F$ there is $x'\in F$ with $x'>x$. So, starting from any $x_1\in F$, you can build a sequence $(x_1,x_2,x_3,\ldots)$ with $x_{n+1}>x_n$ for all $n$. By transitivity all its terms are distinct, contradicting the finiteness of $F$.
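Read constructively, this proof is a procedure: keep stepping to a strictly larger element until you can't. A minimal sketch, assuming the order is given as a strict comparison function; the names `climb_to_maximal` and `strictly_divides` are invented for this example.

```python
def climb_to_maximal(elements, less_than, start=None):
    """Climb from `start` to a maximal element of a finite poset.

    `less_than(a, b)` must be a strict partial order (irreflexive and
    transitive) on `elements`; together with finiteness, that is what
    guarantees the loop terminates.
    """
    elements = list(elements)
    if not elements:
        raise ValueError("the empty poset has no maximal element")
    current = elements[0] if start is None else start
    while True:
        # Look for any element strictly above the current one.
        bigger = next((y for y in elements if less_than(current, y)), None)
        if bigger is None:
            return current  # nothing is strictly larger: current is maximal
        current = bigger


def strictly_divides(a, b):
    """Strict divisibility order: a properly divides b."""
    return a != b and b % a == 0


# {1, ..., 12} ordered by divisibility; climbing from 1 goes 1, 2, 4, 8.
print(climb_to_maximal(range(1, 13), strictly_divides, start=1))  # 8
```

Note that the result is only maximal, not a maximum: starting from 5 instead, the same climb ends at 10, which is not comparable with 8.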

Now this proof obviously doesn't work for infinite sets, and indeed there are infinite partially ordered sets without a maximal element (the natural numbers with their usual order, for example). But let's make the assumption of Zorn's lemma that every chain has an upper bound and see what we can do with it.

Suppose we have created the sequence $(x_1,x_2,\ldots)$ in an infinite partially ordered set $(X,\leq)$ satisfying the condition of Zorn's lemma but without a maximal element (we use the Axiom of Choice to choose larger and larger elements). In an infinite set, the existence of such a sequence is not by itself contradictory. But the set $\{x_1,x_2,\ldots\}$ is a chain, so there is an element $x_\omega$ larger than every element in this sequence. Now we employ heavy set-theoretic machinery, the theory of ordinals, to create a transfinite sequence of the form $(x_1,x_2,\ldots,x_\omega)$. Since there is no maximal element, we can construct a larger transfinite sequence $(x_1,x_2,\ldots,x_\omega,x_{\omega+1})$. And we can proceed this way to get larger and larger transfinite sequences, and all terms are always distinct. Eventually, we get a transfinite sequence with more distinct terms than there are elements in $X$, which gives us a contradiction.

The axiom of choice is used for selecting for each element a larger one. In the finite case, this selection exists by induction alone.

This can be made into a rigorous proof. An accessible book containing this proof is, for example, "Introduction to Set Theory" by Hrbacek and Jech.