Definition of Ordinals in Set Theory in Layman Terms

Counting has two purposes, namely for specifying sizes and indices. These are directly related for finite quantities, because the number of natural numbers (including $0$) less than $n$ (before the position $n$) is $n$. But in set theory, when generalizing to infinite sets these two notions become different. $\def\nn{\mathbb{N}}$ $\def\zz{\mathbb{Z}}$ $\def\qq{\mathbb{Q}}$ $\def\rr{\mathbb{R}}$ $\def\eq{\leftrightarrow}$ $\def\less{\smallsetminus}$ $\def\none{\varnothing}$

Sizes

The notion of size is extended to infinite sets in the following way. Take any two sets $X,Y$. We say that $X$ is no larger than $Y$ if we can label each element of $X$ with a unique element of $Y$, in the sense that no two elements of $X$ are given the same label. Mathematically this labelling is called an injection from $X$ into $Y$, and we write $\def\inj{\hookrightarrow}$$X \inj Y$. We say that $X$ is of the same size as $Y$ iff there is a bijection (1-to-1 correspondence) between elements of $X$ and elements of $Y$, and we write $X \approx Y$. It turns out that if $X \inj Y$ and $Y \inj X$ then $X \approx Y$.

Notice that $\nn_{>0} \inj \nn$ via the identity labelling, and $\nn \inj \nn_{>0}$ by labelling each element $n$ of $\nn$ by $n+1$. So $\nn_{>0} \approx \nn$, meaning that they cannot be distinguished in terms of size as defined above. There is no other elegant way to define size comparison to distinguish them, because we want it to be independent of labelling, meaning that we want $X \approx \{ f(x) : x \in X \}$ for any set $X$ and injective function $f$ on $X$.

Indices

The notion of indexing, on the other hand, can be extended in the following way. Indexing is used for a sequence, where order is important. Note that a sequence is nothing more than a function on its indices, where the indices are totally ordered. But if we want to define a sequence recursively by defining each element in terms of all the previous ones, then it is in general not enough for indices to be ordered. Specifically, a recursive definition on $I$ is a 'function' $E$ such that, for every $i$ in $I$ and function $f$ on $I_{<i}$, $E(f)$ is a function on $I_{\le i}$, and we want to build a function $f$ on $I$ such that $f \restriction I_{\le i} = E(f \restriction I_{<i})$ for any $i \in I$ ("$f \restriction D$" denotes "$f$ restricted to the domain $D$"). We cannot always do this if $I$ has a strictly decreasing infinite sequence. For a concrete example, there is no function $f$ on $\zz$ such that $f(r) = \cases{0 & if $\exists s \in \zz_{<r}\ ( f(s)=1 )$ \\ 1 & otherwise}$, despite it looking like a recursive definition. (Exercise: Prove that no such $f$ exists!) Notice that this counter-example can be easily adapted to any $I$ that has some strictly decreasing infinite sequence.

It turns out that, if $I$ has no strictly decreasing infinite sequence, then every recursive definition on $I$ not only will have some sequence satisfying it (in the above sense) but also that sequence is in fact unique! We say that a total order is well-ordered iff if every non-empty set of elements in it has a minimum. Now given any such $I$ and any set $S$ of elements in $I$, if $S$ has no minimum then we can iteratively pick a (countable) strictly decreasing sequence of elements in $S$ [by the axiom of dependent choice]. Therefore $I$ is well-ordered. Now take any recursive definition $E$ on $I$. If $E$ does not 'work', namely it does not uniquely define a sequence on $I$, then let $S$ be the set of all elements $i$ in $I$ such that there is no unique function $f$ on $I_{\le i}$ such that $f$ agrees $E(f \restriction I_{<i})$, and let $m$ be the minimum of $S$ in $I$. Then for every $i \in I_{<m}$ there is a unique function on $I_{\le i}$ satisfying $E$, and all these functions agree, and hence combining them gives a unique function on $I_{<m}$ that satisfies $E$. By definition of $E$ there is a unique function on $I_{\le m}$ that satisfies $E$, contradicting the definition of $m$. Therefore $E$ does 'work', namely there is a unique sequence on $I$ that satisfies $E$.

[Note: If you do not want to use the axiom of dependent choice, then you can still have recursive definitions for well-orders.]

The above is why we use well-orders for indexing, since every well-order has no strictly decreasing infinite sequence. [Technically $E$ need not be a function in ZFC set theory; it can be any definable function.] Also, the above proof technique generalizes to transfinite induction, namely that for any well-order $I$ and statement $Q$ we have $\forall i \in I\ ( \forall j \in I_{<i}\ ( Q(j) ) \to Q(i) ) \to \forall i \in I\ ( Q(i) )$.

Ordinals

Given any two total orders $X,Y$, we say that $X$ embeds into $Y$ if there is an injection $φ$ from $X$ into $Y$ that preserves the ordering, meaning that $a < b \eq φ(a) < φ(b)$ for every $a,b$ in $X$. Also, we say that $X$ is isomorphic to $Y$ if there is a bijection from $X$ to $Y$ that preserves the ordering. It is not too hard to prove that, for any two non-isomorphic well-orders, one of them embeds into the other, and not the other way around. Therefore well-orders are themselves totally ordered up to isomorphism, meaning that they satisfy the conditions for a total order except that equality is replaced by isomorphism.

Now the next natural step is to find a canonical form for well-orders, which we shall call ordinals. Take any well-order $X$. Recursively define the sequence $f$ on $X$ by $f(i) = \{ f(j) : j \in X_{<i} \}$ for each $i \in X$. Then its range $\{ f(i) : i \in X \}$ ordered under set membership $\in$ is isomorphic to $X$! (Exercise: Prove this by using transfinite induction to prove simultaneously that $\{ f(j) : j \in X_{\le i} \}$ and $f(i)$ are well-ordered under $\in$ for every $i$ in $X$.) We use this well-ordering as the (canonical) ordinal for $X$, which we shall denote by $ord(X)$. We will also order ordinals by $\in$ unless otherwise stated.

This also can motivate the von Neumann definition of ordinals as sets that are transitive (every element is a subset) and well-ordered under $\in$. It turns out that every (canonical) ordinal is also a von Neumann ordinal, and every von Neumann ordinal is its own (canonical) ordinal. (Exercise: Prove this. For the first part, first prove by transfinite induction that $f(i)$ is a von Neumann ordinal for every $i$ in $X$.)

Note that no ordinal is an element of itself, otherwise it will have a strictly decreasing infinite sequence. Consequently, there cannot be a set of ordinals $ORD$ in most set theories, because $ORD$ would be a von Neumann ordinal and hence $ORD \in ORD$.

Cardinals

Now we go back to the notion of size. It turns out that we can use some of the ordinals to represent the sizes of infinite sets. Take any set $X$. Let $W$ be the set of all well-orders with elements in $X$, and let $A = \{ ord(w) : w \in W \}$. Then we can prove that $A$ is a von Neumann ordinal. Also there cannot be an injection $f$ from $A$ into $X$, otherwise we can define a well-order $<$ on $S = \{ f(a) : a \in A \}$ by $f(a) < f(b) \eq a \in b$, and then $A = ord(S,<) \in A$.

Given the full axiom of choice, we can prove that $X \approx A_{<i}$ for some $i \in A$ as follows. Recursively define $f(i)$ to be some element in $X \less \{ f(j) : j \in A_{<i} \}$ if there is one and $\none$ otherwise, for each $i \in A$. By definition of $A$, $f$ is not an injection from $A$ into $X$, and hence there is a least $i \in A$ such that $X \less \{ f(j) : j \in A_{<i} \}$ is empty, which implies that $f \restriction A_{<i}$ is an bijection between $A_{<i}$ and $X$. By the well-ordering of ordinals we can define $\#(X)$, the cardinality of $X$, to be the least ordinal such that there is a bijection between $X$ and it. [Without the axiom of choice, we cannot define $f$ and so this definition of cardinality fails, but by the well-ordering of ordinals there is a least ordinal $H$, called a Hartogs number, such that there is no injection from $H$ into $X$, which is also a sort of measure of size.]

The size condition allows us to define recursive sequences on $X$ by ordering its elements according to this well-order. The minimality condition ensures that the partial sequence that we have to extend in the recursive definition is strictly smaller than $X$, which condition may be needed by the recursive definition itself.

For instance we can prove that $\#(S^2) = \#(S)$ for any infinite set $S$ by transfinite induction. Let $k = \#(S)$. Then it suffices to prove that $\#(k^2) = k$. Order the elements of $k^2$ by maximum and then lexicographic order, which can be easily shown to be a well-order. Let $m$ be the ordinal for this well-order, and let $f$ be the isomorphism from $m$ to $k^2$. If $k < m$, then let $a,b \in k$ such that $f(k) = (a,b)$, and let $d = \#(\max(a,b)) \le \max(a,b) < k$. Then $k$ is isomorphic to $\{ f(i) : i \in k \}$ $\subseteq \{ (x,y) : x,y \in k \land x,y \le d \}$ $\approx \#((d⊕1)^2)$ $= \#(d^2⊕d⊕d⊕1)$ $= \#(d⊕d⊕d⊕1) < k$, where "$p⊕q$" denotes a disjoint union of $p$ and $q$, namely $\{ (0,x) : x \in p \} \cup \{ (1,x) : x \in q \}$. The last inequality is because $k$ is infinite, and either $d$ is finite and so $d⊕d⊕d⊕1$ is also finite, or $d$ is infinite and so $\#(d⊕d⊕d⊕1) = d$. Thus $\#(k) < k$, contradicting the definition of $k$. Therefore $k \ge m \ge \#(k^2) \ge \#(k) = k$, and hence $\#(k^2) = k$.

(Exercise: Prove that there is a set $S$ of points in the plane such that every line passes through exactly two points in $S$. Sketch: Let $c = \#(\rr)$. Then there are $\#(\rr^2⊕\rr) = c$ many lines in the plane. We start with $S = \none$ and iterate through the lines indexed by $c$. We maintain the invariance that at each step $i \in c$ there are at most $2i$ many points added so far, which means that there are at most $(2i)^2$ points on the current line $L$ that cannot be added to $S$ without violating the desired property. But $(2i)^2+2 < c$ and $L$ passes through $c$ many points, so we can add (up to two) points to $S$ such that $L$ passes through exactly two points without violating the desired property.)


To understand ordinals you first need to understand what is a well-order.

Well-orders are a type of strict partial orders which satisfy the following axiom: Every non-empty set has a minimum. This implies that well-orders are linear orders, and that they look kinda like the natural numbers. The most important property of well-orders is that we can use them for inductive constructions and proofs, much like the natural numbers.

Now, when we want to do something by induction and use some well-ordered set, the specific set does not matter. What matters is how does the set "looks like". In mathematical terms we care about its order type, rather than its actual elements. Two partially ordered sets have the same order type if there is an order isomorphism between them: a bijection between the two sets which preserves the order attached to each set.

Let's consider a quick example: $\Bbb N$ and $\Bbb N\cup\{-42\}$, both ordered by the usual order of the integers, are isomorphic, they have the same order type. The isomorphism can be even given explicitly, $$f(x)=\begin{cases}0 & x=-42\\ x+1 & x\in\Bbb N\end{cases}$$ I will let you check and see that this is an isomorphism.

Two important theorems about well-ordered sets are these:

  1. If $A$ and $B$ are two well-ordered sets, and there is an order isomorphism from $A$ into some $B'\subseteq B$ and from $B$ into some $A'\subseteq A$, then in fact there is an isomorphism between $A$ and $B$. Moreover, this isomorphism is unique.

This is a non-trivial fact. Take a look at $\Bbb Q$ and $\Bbb Q\cap[0,1]$. We can embed each into the other, but they are not isomorphic: one of them has a minimum and maximum and the other does not. And even if we consider a simpler example, $\Bbb Z$ and $\Bbb Z$ itself, there are many ways to move the elements around while preserving the order.

  1. If $A$ and $B$ are two well-ordered sets, then either $A$ is isomorphic to an initial segment of $B$ or $B$ is isomorphic to an initial segment of $A$. (And this isomorphism is unique, as a consequence of the previous theorem.)

Again, this is nontrivial. Consider $\Bbb Z$ and $\Bbb Q$. Neither one is isomorphic to the other, and neither one is isomorphic to an initial segment of the other.

So what do these two theorems tell us? They us that well-orders are very robust, they have great comparability properties, and so if we only consider equivalence classes of well-orders under isomorphism, this induces a natural ordering between the classes, which itself happens to be a well-ordering. That's great!

But now we have a mild problem. Other than the empty order, there is a proper class of well-ordered sets from each order type. Just note that for the order type of "one element", every singleton can be a candidate for the well-ordered set, and there is a proper class of those. This is not a huge problem, but wouldn't it be nice if we could replace these proper classes by sets?

At some point in the early years of set theory, von Neumann suggested that instead of thinking in terms of abstract well-ordering, since they are so nice, we pick canonical representatives from each isomorphism class. And he proved that you can find a unique set which is transitive and well-ordered by $\in$ inside each class.

(Here we say that $A$ is a transitive set if whenever $a\in A$, it follows that $a\subseteq A$.)

And this is the von Neumann ordinal assignment. It picks from each order type, the unique transitive set which is well-ordered by $\in$. Then, a magical thing happens, it turns out that $0=\varnothing$, and the successor of $\alpha$ is $\alpha\cup\{\alpha\}$, and that if $\{\alpha_i\mid i\in I\}$ were defined, and $I$ is a well-ordering without a maximal element, then $\bigcup\{\alpha_i\mid i\in I\}$ is the limit of the $\alpha_i$'s.

And that gives us a very robust structure for the von Neumann ordinals: $\alpha=\{\beta\mid\beta<\alpha\}$, where $<$ here indicates "isomorphic to a proper initial segment".

Let me digress now, and since you brought $\aleph_0$ into the discussion, talk about cardinals for a moment. Cardinals are also isomorphism classes, like order type, only here we use just plain ol' bijections without requiring they preserve any structure. In the days of yore cardinals were also proper classes (except that of the empty set). We would say that $A$ has cardinality $\aleph_0$ if it has a bijection with the natural numbers, for example.

As set theory grew, we realized that if $A$ can be well-ordered, then there is a smallest order type of such a well-ordering (because the order types of well-ordered sets have a well-ordering themselves). This meant that we can assign to a set that can be well-ordered the least ordinal that can be put in bijection with that set. And after you have the von Neumann ordinals, this points at a specific set, too. Of course, assuming the axiom of choice every set can be well-ordered and that's fine, but not assuming choice, we need to find a better solution, and I won't discuss it here.

We have since come to confuse cardinals with ordinals on a regular basis. This is fine if all parties involved are well-aware of the differences and the implicit contexts and types. But I would generally advise against it (and I try my best to discern in my work between the cardinal and the ordinal, whenever possible). The two systems are equipped with addition, multiplication and exponentiation. And they often disagree. $\aleph_0+\aleph_0=\aleph_0$ as cardinal addition, but $\omega+\omega\neq\omega$ as ordinal addition.

So talking about $\aleph_0$ as if it was an ordinal is technically not a mistake, but it can lead others to mistaken conclusions (I recall several occasions that I had struggled with such notation; and historically we can find examples of people who didn't understand implicit contexts and made mistakes.)

Now, you asked whether or not ordinals denote "first" and "second" and "$\alpha$th". And the answer is yes. An ordinal is the answer to "how long is the queue to the bathroom". Cardinals, however, are the question "how many people are in the queue to the bathroom". As long as the queue is finite, the answers are the same. But if the queue has order type $\omega$, then every person has to wait only a finite time before using the facilities; but if you join at the end of the queue, we didn't increase the "how many" answer, but you clearly have to wait an infinite time before you can get to the bathroom.

So using $\aleph_0$ to denote a location in the queue is not ideal, as I remarked. Sure, you can insist that $\aleph_0$ and $\omega$ are both designated by the same set, therefore you can talk about the $\aleph_0$th ordinal being $\omega$. But it is much better to separate in your mind the two notions of cardinals and ordinals, at least until you have a better grasp of the two.