How to define Set Theory and First Order Logic without circular reference?

I think an important answer is still not present, so I am going to type it. This is somewhat standard knowledge in the field of foundations but is not always adequately described in lower level texts.

When we formalize the syntax of formal systems, we often talk about the set of formulas. But this is just a way of speaking; there is no ontological commitment to "sets" as in ZFC. What is really going on is an "inductive definition". To understand this you have to temporarily forget about ZFC and just think about strings that are written on paper.

The inductive definition of a "propositional formula" might say that the set of formulas is the smallest class of strings such that:

  • Every variable letter is a formula (presumably we have already defined a set of variable letters).

  • If $A$ is a formula, so is $\lnot (A)$. Note: this is a string with 3 more symbols than $A$.

  • If $A$ and $B$ are formulas, so is $(A \land B)$. Note: this adds 3 more symbols to the ones in $A$ and $B$.

This definition can certainly be read as a definition in ZFC. But it can also be read in a different way. The definition can be used to generate a completely effective procedure that a human can carry out to tell whether an arbitrary string is a formula (a proof along these lines, which constructs a parsing procedure and proves its validity, is in Enderton's logic textbook).

In this way, we can understand inductive definitions in a completely effective way without any recourse to set theory. When someone says "Let $A$ be a formula" they mean to consider the situation in which I have in front of me a string written on a piece of paper, which my parsing algorithm says is a correct formula. I can perform that algorithm without any knowledge of "sets" or ZFC.
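To make the effectiveness concrete, here is a minimal sketch in Python of such a parsing procedure for the toy grammar above. Treating single lowercase letters as the variable letters is an assumption made here for illustration; none of this is taken from Enderton.

```python
# A recognizer for the inductively defined formulas:
# a formula is a variable letter, ¬(A), or (A∧B).

def parse(s, i=0):
    """Try to read one formula starting at position i.  Return the
    position just past it, or None if no formula starts there."""
    if i >= len(s):
        return None
    if s[i].islower():                   # a variable letter (our assumption)
        return i + 1
    if s[i] == '¬':                      # ¬(A): three extra symbols
        if i + 1 < len(s) and s[i + 1] == '(':
            j = parse(s, i + 2)
            if j is not None and j < len(s) and s[j] == ')':
                return j + 1
        return None
    if s[i] == '(':                      # (A∧B): three extra symbols
        j = parse(s, i + 1)
        if j is not None and j < len(s) and s[j] == '∧':
            k = parse(s, j + 1)
            if k is not None and k < len(s) and s[k] == ')':
                return k + 1
        return None
    return None

def is_formula(s):
    """A string is a formula iff exactly one formula spans all of it."""
    return parse(s) == len(s)

assert is_formula('(p∧¬(q))')
assert not is_formula('p∧q')             # missing outer parentheses
```

Running this procedure involves nothing beyond inspecting and comparing finitely many symbols, which is exactly the sense in which the definition is effective.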

Another important example is "formal proofs". Again, I can treat these simply as strings to be manipulated, and I have a parsing algorithm that can tell whether a given string is a formal proof. The various syntactic metatheorems of first-order logic are also effective. For example the deduction theorem gives a direct algorithm to convert one sort of proof into another sort of proof. The algorithmic nature of these metatheorems is not always emphasized in lower-level texts - but for example it is very important in contexts like automated theorem proving.
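For instance, here is a sketch of the usual deduction-theorem transformation for a Hilbert-style system with modus ponens and the axiom schemas K and S. The encoding of formulas and proofs below is an assumption made for illustration, not the format of any particular textbook.

```python
# Effective content of the deduction theorem for a Hilbert-style system:
#     K: p -> (q -> p)
#     S: (p -> (q -> r)) -> ((p -> q) -> (p -> r))
# Formulas are strings (variables) or nested tuples ('->', p, q);
# proofs are lists of (formula, reason) lines.

def imp(p, q):
    return ('->', p, q)

def deduce(a, proof):
    """Convert a proof of B from hypotheses plus the assumption `a` into a
    proof of a -> B from the hypotheses alone.  Each line's reason is
    'assume' (the formula is `a` itself), 'hyp' or 'axiom', or
    ('mp', i, j) where line i proves D and line j proves D -> C."""
    out = []
    for c, reason in proof:
        if reason == 'assume':            # c is `a` itself: derive a -> a
            out += [imp(a, imp(imp(a, a), a)),                       # K
                    imp(imp(a, imp(imp(a, a), a)),
                        imp(imp(a, imp(a, a)), imp(a, a))),          # S
                    imp(imp(a, imp(a, a)), imp(a, a)),               # MP
                    imp(a, imp(a, a)),                               # K
                    imp(a, a)]                                       # MP
        elif reason in ('hyp', 'axiom'):  # c stands on its own
            out += [c,
                    imp(c, imp(a, c)),                               # K
                    imp(a, c)]                                       # MP
        else:                             # c came by MP from d and d -> c
            _, i, j = reason
            d = proof[i][0]
            # earlier passes already put a -> d and a -> (d -> c) in `out`
            out += [imp(imp(a, imp(d, c)),
                        imp(imp(a, d), imp(a, c))),                  # S
                    imp(imp(a, d), imp(a, c)),                       # MP
                    imp(a, c)]                                       # MP
    return out

# Usage: from [p (assumed), p -> q (hypothesis), q (by MP)] produce a
# proof whose last line is p -> q and which no longer assumes p.
p, q = 'p', 'q'
new_proof = deduce(p, [(p, 'assume'), (imp(p, q), 'hyp'), (q, ('mp', 0, 1))])
assert new_proof[-1] == imp(p, q)
```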

So if you examine a logic textbook, you will see that all the syntactic aspects of basic first order logic are given by inductive definitions, and the algorithms given to manipulate them are completely effective. Authors usually do not dwell on this, both because it is completely standard and because they do not want to overwhelm the reader at first. So the convention is to write definitions "as if" they are definitions in set theory, and allow the readers who know what's going on to read the definitions as formal inductive definitions instead. When read as inductive definitions, these definitions would make sense even to the fringe of mathematicians who don't think that any infinite sets exist but who are willing to study algorithms that manipulate individual finite strings.

Here are two more examples of the syntactic algorithms implicit in certain theorems:

  • Gödel's second incompleteness theorem actually gives an effective algorithm that can convert any PA-proof of Con(PA) into a PA-proof of $0=1$. So, under the assumption that there is no proof of the latter kind, there is no proof of the former kind.

  • The method of forcing in ZFC actually gives an effective algorithm that can turn any proof of $0=1$ from the assumptions of ZFC and the continuum hypothesis into a proof of $0=1$ from ZFC alone. Again, this gives a relative consistency result.

Results like the previous two bullets are often called "finitary relative consistency proofs". Here "finitary" should be read to mean "providing an effective algorithm to manipulate strings of symbols".

This viewpoint helps explain where weak theories of arithmetic such as PRA enter into the study of foundations. Suppose we want to ask "what axioms are required to prove that the algorithms we have constructed will do what they are supposed to do?". It turns out that very weak theories of arithmetic are able to prove that these symbolic manipulations work correctly. PRA is a particular theory of arithmetic that is very weak from the point of view of stronger theories like PA or ZFC, but is nevertheless able to prove that (formalized versions of) the syntactic algorithms work correctly; it is often used for exactly this purpose.
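For a toy sense of what PRA talks about: its functions are exactly those built from zero, successor, and projections by composition and primitive recursion. Here is a sketch in Python (the rendering in Python is, of course, an illustrative assumption):

```python
# Addition and multiplication defined by primitive recursion from successor.

def succ(n):
    return n + 1

def add(m, n):
    # primitive recursion on n:  add(m, 0) = m;  add(m, n+1) = succ(add(m, n))
    return m if n == 0 else succ(add(m, n - 1))

def mul(m, n):
    # primitive recursion on n:  mul(m, 0) = 0;  mul(m, n+1) = add(mul(m, n), m)
    return 0 if n == 0 else add(mul(m, n - 1), m)

assert add(3, 4) == 7 and mul(3, 4) == 12
```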


It's only circular if you think we need a formalization of logic in order to reason mathematically at all. However, mathematicians reasoned mathematically for many centuries before formal logic was invented, so this assumption is obviously not true.

It's an empirical fact that mathematical reasoning existed independently of formal logic back then. I think it is reasonably self-evident, then, that it still exists without needing formal logic to prop it up. Formal logic is a mathematical model of the kind of reasoning mathematicians accept -- but the model is not the thing itself.

A small bit of circularity does creep in, because many modern mathematicians look to their knowledge of formal logic when they need to decide whether to accept an argument or not. But that's not enough to make the whole thing circular; there are enough non-equivalent formal logics (and possible foundations of mathematics) to choose between that the choice of which one to use to analyze arguments is still largely informed by which arguments one intuitively wants to accept in the first place, not the other way around.


It's less like a circle and more like a spiral.

You use logic #1 to define and study, say, set theory #2. You use set theory #2 to develop a theory of formal logic #3. Formal logic #3 can be used to define set theory #4. And so forth.

Logic #1 and logic #3 are not the same thing. There are strong similarities, of course, and this fact can be expressed, e.g., by set theory #2. To some extent, if you're "within" the spiral, it doesn't really matter how far along you are; things will still "look" the same.

We then invoke a meta-mathematical assumption: that "real world" mathematics is described by some point on the spiral.

Recognizing the spiral isn't just for philosophical issues. If we're using set theory #2, then the actual construction and main applications of formal logic are those of formal logic #3. It is, for example, formal logic #3's notion of "the theory of a group" -- not formal logic #1's notion -- whose models (in set theory #2) are precisely the groups (in set theory #2). Skolem's paradox is something that can happen when you neglect the spiral: it arises, for example, when you fail to distinguish between the word "countable" in set theory #2 and the word "countable" in set theory #4.

But if you are interested in philosophical, foundational issues, then winding twice around the spiral is, I think, the right way to go. E.g., maybe you start with meta-logic #1, from which you build an "ambient" set theory #2. Set theory #2 is used to construct "ambient" logic #3, which can discuss set theory #4. Now, you do the rest of mathematics within set theory #4, except for a few special occasions, such as when you need set theory #2 to express the similarity between logic #3 and logic #5, so that you can take statements about what logic #5 says about set theory #6 and try to infer something about set theory #4.

In this way, we've insulated ourselves somewhat from dealing with the "real world", and are working entirely within the 'mathematical universe'. But it does require the discipline to be content working in set theory #4 and not inquire too strongly about set theory #2 or meta-logic #1.


Basically, this amounts to not being able to formalize or prove anything without some sort of beginning. Since you cannot formalize something without some foundation, the thing you begin with must necessarily be somewhat informal.

In order to start doing anything, you must work in the metatheory. Everything in the metatheory is finite in the sense of our reality. In some sense, you can understand the metatheory as an intuitive, finitistic mathematics.

In the metatheory, we can define a formal language. It consists of a collection of symbols, a collection of rules for forming the well-formed formulas, and a collection of rules of inference. Note that everything is done in the metatheory. I cannot assert that these collections themselves provably exist, since proving that would require a formal system. However, you can describe the collections finitely in reality. Whether the collections exist or not in the metatheory is probably a question of philosophy.

For example, the first-order theory ZFC in the language of sets would be described as follows in the metatheory. The language consists of a single relation symbol $\in$. The axioms are the axioms of ZFC. Note that the axioms are finitely described in the metatheory. For example, you must have read the axioms in some textbook, and I am sure that textbook was finite.
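To illustrate how finitely describable these objects are, here is a sketch in Python that represents formulas over the language $\{\in\}$ (with logical equality) as nested tuples and writes down the axiom of extensionality as one concrete finite object. The representation and helper names are assumptions made for illustration, not a standard encoding.

```python
# First-order formulas over the single relation symbol ∈, as nested tuples.

def In(x, y):      return ('∈', x, y)     # atomic formula  x ∈ y
def Eq(x, y):      return ('=', x, y)     # atomic formula  x = y
def Iff(a, b):     return ('↔', a, b)
def Implies(a, b): return ('→', a, b)
def ForAll(v, a):  return ('∀', v, a)

# The axiom of extensionality, written down as one finite object:
#   ∀x ∀y ( ∀z (z ∈ x ↔ z ∈ y) → x = y )
extensionality = ForAll('x', ForAll('y',
    Implies(ForAll('z', Iff(In('z', 'x'), In('z', 'y'))),
            Eq('x', 'y'))))

print(extensionality)   # a concrete, finite object, just like on the page
```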

Now that the first-order theory ZFC has been defined in the metatheory, you can work in this formal system. For instance, you can now formalize first-order theories within ZFC. You can even prove that the collection of axioms of ZFC is a set. You can now use the term "set" because you are actually working in ZFC.

However, note that when you prove anything using ZFC, the proof is finite and hence can be carried out in the metatheory. So even if you are proving the independence of the continuum hypothesis, i.e. of the statement $2^{\aleph_0} = \aleph_1$, the proof is still finitary mathematics carried out in the metatheory. Even though the result is about infinite cardinals, the proof itself is finite.