Why doesn't this definition of natural numbers hold up in axiomatic set theory?

"Too large" is an informal description of what goes wrong, and is not entirely on point for understanding how ZFC doesn't allow you to do this.

It would be more honest to describe the problem as follows:

There is simply no axiom in ZFC that will let you conclude that the notation $\{ x \cup \{y\} \mid x\in a, y\notin x\}$ describes any set that exists.

Remember that ZFC doesn't support free-wheeling use of set-builder notation, which would assume that $\{y\mid \phi(y) \}$ (where $\phi$ is some logical formula) always describes a set. Instead you have only Separation, which tells you that expressions of the form $\{y\in A\mid \phi(y)\}$ describe sets, and Replacement, which tells you that expressions of the form $\{F(y)\mid y\in A\}$ -- where $F$ is some function that you can define by a logical formula -- describe sets.

However, $\{ x \cup \{y\} \mid x\in a, y\notin x\}$ has neither of these forms, because $y$ is not restricted to range over any given set. Instead it would fit the scheme $\{ F(y) \mid \phi(y) \}$, and neither Separation nor Replacement promises to work for that situation.

(If you haven't seen the axioms of ZFC written down, it would probably help your understanding to seek out an explanation of them. In particular, what I describe as, for example, "$\{y\in A\mid \phi(y)\}$ exists" is formally expressed by saying that for any formula $\phi(y)$ in which $x$ does not occur free, the formula $$ \exists x.\forall y.(y\in x \iff y\in A \land \phi(y)) $$ is an axiom.)
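For comparison, here is a sketch of how "$\{F(y)\mid y\in A\}$ exists" might be written in the same style. It assumes one common formulation of Replacement (textbooks differ in the details), and the formula name $\psi$ is my own choice: $\psi(y,z)$ stands for "$z = F(y)$", and $x$ must not occur free in it.

$$ \forall y.\bigl(y\in A \implies \exists! z.\,\psi(y,z)\bigr) \implies \exists x.\forall z.\bigl(z\in x \iff \exists y.(y\in A \land \psi(y,z))\bigr) $$

In both schemes the variable being collected over is tied to a set $A$ that is already known to exist; that restriction is exactly what $\{ F(y) \mid \phi(y) \}$ lacks.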


The "too large to be a set" description is at most a hint at an answer to a different question, namely

  • Why can't we just have some more axioms that say we can do this?

The answer to this is that, for any nonempty set $A$, we can actually prove that $\{x\cup \{y\}\mid x\in A, y\notin x\}$ does not exist (which is different from not being able to prove that it does) -- so if we had an axiom that claimed that it did exist, the system would become inconsistent.

In more detail, the proof might go: Suppose that for some nonempty set $A$, $$\sigma(A) = \{x\cup \{y\}\mid x\in A, y\notin x\}$$ exists. Then $\bigcup A \cup \bigcup\sigma(A)$ -- which must exist due to ZFC's explicit Axiom of Union -- would be a set that contains every element of an element of $A$ (via $\bigcup A$), as well as every set that fails to be an element of some element of $A$ (via $\bigcup\sigma(A)$). Since $A$ is nonempty, every set falls into at least one of these two groups. In other words, this would be a set of all sets, and then Russell's paradox would lead us to a contradiction.
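Spelled out step by step, the argument might look as follows. This is only a sketch; the names $x_0$, $U$ and $R$ are introduced here for illustration, and it assumes $A$ is nonempty (for $A=\emptyset$ the collection $\sigma(A)$ is just $\emptyset$, so nothing goes wrong).

$$ \begin{split} &\text{Fix } x_0\in A \text{ and let } U = \bigcup A \cup \bigcup\sigma(A).\\ &\text{If } y\in x_0, \text{ then } y\in\bigcup A\subseteq U.\\ &\text{If } y\notin x_0, \text{ then } y\in x_0\cup\{y\}\in\sigma(A), \text{ so } y\in\bigcup\sigma(A)\subseteq U.\\ &\text{Hence every set } y \text{ is in } U. \text{ By Separation, } R = \{y\in U\mid y\notin y\} \text{ is a set, and } R\in U.\\ &\text{Therefore } R\in R \iff R\notin R, \text{ a contradiction.} \end{split} $$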

Presenting only this second argument, without explaining (or stressing) the first one, is a common failing of semi-popular descriptions of set theory. It can easily give a reader the impression that something must be allowed unless we can see it leads to a paradox, which is most definitely not how axiomatic set theory works. Axiomatic set theory works by saying from the beginning, "these are the things that are allowed" and then hoping no combination of those things leads to a paradox.

The only real value of the "if you could do this, it would lead to a paradox" argument is that once you see it you can stop trying to figure out a way that it is allowed.


Notice that, under that definition, we have $$ \begin{split} 1 = \sigma(0) &= \sigma(\{\emptyset\}) = \{x \cup \{y\} : x \in \{\emptyset\} \land y \notin x\} \\ &= \{ \emptyset \cup \{y\} : y \text{ is a set}\}\\ & = \{ \{y\} : y \text{ is a set}\}\\ &= \{ z : |z| = 1\}. \end{split} $$

In ZFC, the collection of all sets ("$V$") does not form a set, so the definition breaks down already at stage $1$. If $\sigma(0)$ were a set, then $V = \bigcup\sigma(0) = \{y : \{y\} \in \sigma(0)\}$ would also be a set, by the Axiom of Union. So that is really the technical difficulty. Frege and Russell proposed that the number $1$ could be defined to consist of all $1$-element sets, but that collection of sets is not itself a set in ZFC.

The usual way of describing why $V$ is not a set is that it is "too large"; this sense of "largeness" is one of the more common ways of motivating the ZFC axioms, so the Wikipedia author alluded to it.

The idea of "largeness" is really an allusion to the "cumulative hierarchy" vision of set theory. Unfortunately, the cumulative hierarchy is hard to describe in one sentence, because it depends already on the notion of ordinal. But the idea is that we can form a collection of sets stage by stage, so that the powerset of each set is formed at the next stage after the set is formed, and so that all the members of each set are formed at stages strictly before the set itself is formed.

One way to understand the ZFC axioms is that they are only trying to describe the sets that are formed via this process. But $V$ cannot be formed at any stage, because it would have to already contain its powerset, but the powerset ought to be formed at the next stage. So the claim that $V$ is too "large" really means that $V$ could not be formed at any stage of the process.
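The first few finite stages of this process can be computed directly. The following is a minimal Python sketch, assuming we only look at the finite stages $V_0 = \emptyset$, $V_{k+1} = \mathcal{P}(V_k)$ (the real hierarchy continues through all the ordinals); the function names `powerset` and `stages` are my own.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of the frozenset s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

def stages(n):
    """Return [V_0, V_1, ..., V_n] with V_0 empty and V_{k+1} = P(V_k)."""
    result = [frozenset()]
    for _ in range(n):
        result.append(powerset(result[-1]))
    return result

V = stages(4)
print([len(v) for v in V])                      # [0, 1, 2, 4, 16]
print(all(V[k] in V[k + 1] for k in range(4)))  # True: each stage appears at the next one
print(any(v in v for v in V))                   # False: no stage contains itself
```

Each stage shows up as an element only at the following stage, never at its own stage, which is the finite shadow of the reason $V$ itself never gets formed.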

Back to defining the numbers. We say that two sets have the same cardinality if there is a bijection between them. This is an equivalence relation, so it ought to have equivalence classes. And the equivalence class of $\{\emptyset\}$ will consist of every set that has exactly one element. That is the idea behind the definition above. But these equivalence classes are not sets in the cumulative hierarchy, so ZFC has trouble with them.

The way that we usually circumvent this kind of problem in ZFC is to select a "particular" representative from each equivalence class. Then, instead of referring to the entire equivalence class, we refer just to that representative. The most commonly used system of representatives in ZFC is the von Neumann ordinals. So we have $$ \begin{split} 0 &= \emptyset\\ 1 &= \{\emptyset\} = \{0\}\\ 2 &= \{\emptyset, \{\emptyset\}\} = \{0,1\}\\ 3 &= \{0,1,2\} \end{split} $$ and so on. This is not really much different from the definition due to Frege and Russell, as you can see.
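The pattern in these equations is the successor rule $n + 1 = n \cup \{n\}$. As an illustration only, here is a small Python sketch that builds the finite von Neumann ordinals with `frozenset`; the helper names `successor` and `von_neumann` are hypothetical, chosen for this example.

```python
def successor(n):
    """Von Neumann successor: n + 1 = n union {n}."""
    return n | frozenset([n])

def von_neumann(k):
    """Build the von Neumann ordinal representing the non-negative integer k."""
    n = frozenset()  # 0 is the empty set
    for _ in range(k):
        n = successor(n)
    return n

three = von_neumann(3)
print(len(three))               # 3
print(von_neumann(2) in three)  # True
print(three == frozenset([von_neumann(0), von_neumann(1), von_neumann(2)]))  # True
```

The representative chosen for $n$ is a specific set with exactly $n$ elements, so it still carries the "number of elements" idea without having to collect all $n$-element sets together.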