Ordinals - motivation and rigor at the same time

Well.

To understand the motivation behind ordinals, you first need to understand what well-orders are and why they are useful. Luckily, that's not hard.

Well-orders are linearly ordered sets in which every non-empty subset has a least element. In particular, every element other than the largest (if there is one) has a unique "next" element, and we can use proofs by [transfinite] induction. This is useful because induction describes step-by-step constructions: well-orders let us argue step by step, but in a way that allows us to go to infinity and beyond.

Well-ordered sets are very nice. Given two well-ordered sets, one of them is isomorphic to an initial segment of the other, and the isomorphism is unique, so the initial segment is unique as well. This is of course not true when you consider just linear orders: for example, $[0,1]$ and $[0,1)$ are each isomorphic to an initial segment of the other, but there are plenty of such initial segments.

So now that we understand the motivation for well-ordered sets, we should probably observe that every order type (read: isomorphism class) contains a proper class of well-ordered sets, with the exception of the order type of the empty set, since there's only one of those.

For example, the collection of all well-ordered sets with exactly one point is not a set. So if we want to prove something about "all well-ordered sets", which in turn means about all possible ways to do something step-by-step, we need to resort to various tricks and all sorts of annoying meta-mathematical difficulties.

But as luck would have it, the von Neumann ordinals give us an out. They allow us to pick exactly one set from every equivalence class, with a distinguished well-order of course. And since the language of set theory has only one binary relation symbol, the easiest thing to do is to make that well-order be the $\in$ relation on that set.

So an ordinal is a transitive set such that $\in$ is a well-order on that set. Now we can show that every well-ordered set is isomorphic to a unique von Neumann ordinal, and that the isomorphism is unique. We can show that any two distinct ordinals are comparable, which means that one of them is an element of the other (and therefore a subset, since those are transitive sets).
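For finite sets the definition can be checked mechanically. The following is only a sketch under stated assumptions: sets are modelled as nested Python `frozenset` values, and the names `is_transitive` and `is_ordinal` are made up for this illustration. It relies on the fact that a strict linear order on a finite set is automatically a well-order, so it says nothing about infinite ordinals.

```python
from itertools import combinations


def is_transitive(s):
    """A set is transitive when every element of it is also a subset of it."""
    return all(element <= s for element in s)


def is_ordinal(s):
    """Check the definition for a finite set built from nested frozensets."""
    if not is_transitive(s):
        return False
    # Trichotomy: of two distinct elements, exactly one is a member of the other.
    for a, b in combinations(s, 2):
        if (a in b) == (b in a):
            return False
    # Membership must be a transitive relation on the elements of s.
    # (On a finite set, a strict linear order is automatically a well-order.)
    return all(a in c for c in s for b in c if b in s for a in b if a in s)


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

# {1} is not transitive: 0 is in 1, but 0 is not in {1}.
not_transitive = frozenset({one})
# {0, 1, {1}} is transitive, but 0 and {1} are not comparable by membership.
not_linear = frozenset({zero, one, frozenset({one})})

# Any two distinct ordinals from the list are comparable by membership.
comparable = all((a in b) or (b in a)
                 for a, b in combinations([zero, one, two, three], 2))
```

Here `is_ordinal` accepts `zero` through `three` and rejects the two counterexamples, which fail for different reasons: one is not transitive, the other is transitive but not linearly ordered by membership.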

And more importantly, if we proved something holds for all ordinals, and that thing is a property depending only on the order, then we effectively proved it for every well-ordered set.


Starting from the most basic intuition, you use ordinals when you talk about things that are in a certain order: first, second...
This particular order can be represented by a never-ending chain with a starting point: for every link of the chain, there is a next link that didn't occur before (the chain never loops, goes on forever, and never splits).
So, we would like to find something similar in set theory, since orders are important for a lot of reasons, the most basic being to be able to say when one ordered set is smaller than another ordered set that orders its elements in the same way. (If we don't care about orders, we can just use cardinality to compare sets.)

In ZFC, we do this: we take a starting point as an ordinal, and decide on a way to generate a "next" ordinal from any ordinal we have.
One way is to take the empty set as a starting point, and define the ordinal that comes after x as x ∪ {x}.
This gives:
{}
{{}}
{{}, {{}}}
{{}, {{}}, {{},{{}}}}
...
To visualize better, let's name each ordinal after a natural number. We obtain this:
{}=0
{0}=1
{0,1}=2
{0,1,2}=3
...
So, with this method, each ordinal ends up being the set of the ordinals that precede it, if we take the membership relation as the order relation (taking the proper subset relation works too).
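The construction above can be replayed for small cases. This is a minimal sketch, assuming sets are modelled as Python `frozenset` values; the helper names `successor` and `finite_ordinal` are invented here for illustration.

```python
def successor(x):
    """Return x ∪ {x}, the ordinal that comes right after x."""
    return x | frozenset({x})


def finite_ordinal(n):
    """Build the ordinal named n by applying successor n times to the empty set."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


three = finite_ordinal(3)
# The ordinal 3 is exactly the set of the ordinals that precede it: {0, 1, 2}.
predecessors = frozenset(finite_ordinal(k) for k in range(3))
same = (three == predecessors)
```

With these definitions, `finite_ordinal(m) in finite_ordinal(n)` and `finite_ordinal(m) < finite_ordinal(n)` (proper subset) both hold exactly when `m < n`, at least for the small values one can test this way.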

Taking a step further, we might want to expand our horizons and think of an ordinal as anything containing everything that precedes it.
So, let's consider a set containing every ordinal we obtain starting from {} and finding the next one:
ω={0,1,2,...}
There's no problem in considering ω an ordinal too: if we do, it certainly contains every ordinal that precedes it.
In finding ω, we took a 'transfinite step' from the 'finite' ordinals. Now, we can repeat the process again, by taking ω as a new starting point:
ω+1=ω ∪ {ω}={0,1,2,...,ω},
ω+2={0,1,2,...,ω,ω+1},
...
And then we can make, again, a transfinite step and go on forever from one level of ordinals to the next.

We can't, however, have a set of all ordinals, as that leads to Burali-Forti's paradox (very roughly, if there is a set of all ordinals, it would be the last ordinal, but it would also have a following ordinal as any ordinal does, so it can't be the last one).
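The rough argument can be tightened a little. As a sketch, write $\Omega$ for the hypothetical set of all ordinals (the symbol is chosen here only for illustration). Elements of ordinals are ordinals, so $\Omega$ would be transitive, and $\in$ would well-order it, so $\Omega$ would itself be an ordinal:

$$
\Omega \text{ is transitive and well-ordered by } \in
\;\Longrightarrow\; \Omega \text{ is an ordinal}
\;\Longrightarrow\; \Omega \in \Omega .
$$

But $\in$ is a strict order on $\Omega$, so no element of $\Omega$ can be a member of itself, which is a contradiction.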