How to avoid perceived circularity when defining a formal language?
Suppose we want to define a first-order language to do set theory (so we can formalize mathematics). One such construction can be found here. What makes me uneasy about this definition is that words such as "set", "countable", "function", and "number" are used in somewhat non-trivial ways. For instance, behind the word "countable" rests an immense amount of mathematical knowledge: one needs the notion of a bijection, which requires functions and sets. One also needs the set of natural numbers (or something of equal cardinality) in order to say that countable sets are those in bijection with the set of natural numbers.
Also, in set theory one uses the relation of belonging "$\in$". But a relation seems to require the notion of an ordered pair, which requires sets, whose properties are described using belonging...
I found the following in Kevin Klement's lecture notes on mathematical logic (pages 2-3).
"You have to use logic to study logic. There’s no getting away from it. However, I’m not going to bother stating all the logical rules that are valid in the metalanguage, since I’d need to do that in the metametalanguage, and that would just get me started on an infinite regress. The rule of thumb is: if it’s OK in the object language, it’s OK in the metalanguage too."
So it seems that, if one proves a fact about the object language, then one can also use it in the metalanguage. In the case of set theory, one may not start out knowing what sets really are, but after one proves some fact about them (e.g., that there are uncountable sets), one implicitly "adds" this fact to the metalanguage as well.
This seems like cheating: one is using the object language to conduct proofs regarding the metalanguage, when it should strictly be the other way round.
To give an example of avoiding circularity, consider the definition of the integers. We can define a binary relation $R\subseteq(\mathbf{N}\times\mathbf{N})\times(\mathbf{N}\times\mathbf{N})$ by declaring, for any $a,b,c,d\in\mathbf{N}$, that $((a,b),(c,d))\in R$ iff $a+d=b+c$, and then define $\mathbf{Z}:= \{[(a,b)]:a,b\in\mathbf{N}\}$, where $[(a,b)]=\{x\in \mathbf{N}\times\mathbf{N}: x\,R\,(a,b)\}$, as in this question or here on Wikipedia. In this definition, if set theory and the natural numbers are assumed, then there is no circularity, because one did not depend on the notion of "subtraction" in defining the integers.
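To make this concrete: the class $[(0,3)]$ plays the role of $-3$, since, for example, $(0,3)\,R\,(2,5)$ because $0+5=3+2$; and addition can be given directly by $[(a,b)]+[(c,d)]:=[(a+c,b+d)]$, again without ever mentioning subtraction.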
So my question is:
Question: Is the definition of first-order logic circular? If not, please explain why. If it is circular, is there an alternative definition which avoids the circularity?
Some thoughts:
Perhaps there is the distinction between what sets are (anything that obeys the axioms) and how sets are expressed (using a formal language). In other words, the notion of a set may not be circular, but to talk of sets using a formal language requires the notion of a set in a metalanguage.
In foundational mathematics there also seems to be the idea of first defining something, and then coming back with better machinery to analyse that thing. For instance, one can define the natural numbers using the Peano axioms, then later come back to say that all structures satisfying the axioms are isomorphic. (I don't know any algebra, but that seems right.)
Maybe sets, functions, etc., are too basic? Is it possible to avoid these terms when defining a formal language?
I think an important answer is still not present, so I am going to type it. This is somewhat standard knowledge in the field of foundations but is not always adequately described in lower-level texts.
When we formalize the syntax of formal systems, we often talk about the set of formulas. But this is just a way of speaking; there is no ontological commitment to "sets" as in ZFC. What is really going on is an "inductive definition". To understand this you have to temporarily forget about ZFC and just think about strings that are written on paper.
The inductive definition of a "propositional formula" might say that the set of formulas is the smallest class of strings such that:
Every variable letter is a formula (presumably we have already defined a set of variable letters).
If $A$ is a formula, so is $\lnot (A)$. Note: this is a string with 3 more symbols than $A$.
If $A$ and $B$ are formulas, so is $(A \land B)$. Note: this adds 3 more symbols to the ones in $A$ and $B$.
This definition can certainly be read as a definition in ZFC. But it can also be read in a different way. The definition can be used to generate a completely effective procedure that a human can carry out to tell whether an arbitrary string is a formula (a proof along these lines, which constructs a parsing procedure and proves its validity, is in Enderton's logic textbook).
In this way, we can understand inductive definitions in a completely effective way without any recourse to set theory. When someone says "Let $A$ be a formula" they mean to consider the situation in which I have in front of me a string written on a piece of paper, which my parsing algorithm says is a correct formula. I can perform that algorithm without any knowledge of "sets" or ZFC.
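To make the effectiveness concrete, here is a minimal sketch in Python of the kind of parsing algorithm described above. The choice of single lowercase letters as variable letters, and the use of the characters ¬ and ∧ inside ordinary strings, are illustrative conventions of mine, not part of the definition itself.

```
VARIABLES = set("pqrstuvwxyz")

def is_formula(s: str) -> bool:
    """Decide whether the string s is a formula: a variable letter,
    a string of the form ¬(A), or a string of the form (A∧B)."""
    # Base case: a single variable letter.
    if len(s) == 1:
        return s in VARIABLES
    # Case ¬(A): strip the negation sign and the outer parentheses.
    if s.startswith("¬(") and s.endswith(")"):
        return is_formula(s[2:-1])
    # Case (A∧B): locate the conjunction sign at parenthesis depth 1.
    if s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "∧" and depth == 1:
                return is_formula(s[1:i]) and is_formula(s[i + 1:-1])
    return False

print(is_formula("(p∧¬(q))"))   # True
print(is_formula("p∧q"))        # False: the outer parentheses are missing
```

The point is that this procedure only ever inspects a finite string of symbols; at no stage does it consult a "set" in the sense of ZFC.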
Another important example is "formal proofs". Again, I can treat these simply as strings to be manipulated, and I have a parsing algorithm that can tell whether a given string is a formal proof. The various syntactic metatheorems of first-order logic are also effective. For example the deduction theorem gives a direct algorithm to convert one sort of proof into another sort of proof. The algorithmic nature of these metatheorems is not always emphasized in lower-level texts - but for example it is very important in contexts like automated theorem proving.
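Here is a similar sketch, again my own simplified illustration rather than the full first-order proof calculus, of the effective check that a purported formal proof really is one. In this toy setting a "proof" is a list of formula-strings, each of which must either be one of the given axioms/premises or follow from two earlier lines by modus ponens; writing implications as strings of the form (A→B) is my own convention.

```
def split_implication(s: str):
    """If s has the shape (A→B), with the arrow at parenthesis depth 1,
    return the pair (A, B); otherwise return None."""
    if not (s.startswith("(") and s.endswith(")")):
        return None
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "→" and depth == 1:
            return s[1:i], s[i + 1:-1]
    return None

def is_proof(lines, axioms):
    """Check that every line is an axiom/premise or follows from two
    earlier lines by modus ponens."""
    for k, line in enumerate(lines):
        if line in axioms:
            continue
        earlier = lines[:k]
        # Look for earlier lines A and (A→line).
        if any(split_implication(imp) == (a, line)
               for imp in earlier for a in earlier):
            continue
        return False
    return True

# From the premises p and (p→q), the line q is justified by modus ponens.
print(is_proof(["p", "(p→q)", "q"], {"p", "(p→q)"}))   # True
print(is_proof(["p", "q"], {"p"}))                     # False
```

The deduction theorem mentioned above has the same flavor: it is a concrete recipe for rewriting one such list of lines into another.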
So if you examine a logic textbook, you will see that all the syntactic aspects of basic first order logic are given by inductive definitions, and the algorithms given to manipulate them are completely effective. Authors usually do not dwell on this, both because it is completely standard and because they do not want to overwhelm the reader at first. So the convention is to write definitions "as if" they are definitions in set theory, and allow the readers who know what's going on to read the definitions as formal inductive definitions instead. When read as inductive definitions, these definitions would make sense even to the fringe of mathematicians who don't think that any infinite sets exist but who are willing to study algorithms that manipulate individual finite strings.
Here are two more examples of the syntactic algorithms implicit in certain theorems:
Gödel's incompleteness theorem actually gives an effective algorithm that can convert any PA-proof of Con(PA) into a PA-proof of $0=1$. So, under the assumption that there is no proof of the latter kind, there is no proof of the former kind.
The method of forcing in ZFC actually gives an effective algorithm that can turn any proof of $0=1$ from the assumptions of ZFC and the continuum hypothesis into a proof of $0=1$ from ZFC alone. Again, this gives a relative consistency result.
Results like the previous two bullets are often called "finitary relative consistency proofs". Here "finitary" should be read to mean "providing an effective algorithm to manipulate strings of symbols".
This viewpoint helps explain where weak theories of arithmetic such as PRA enter into the study of foundations. Suppose we ask: what axioms are required to prove that the algorithms we have constructed do what they are supposed to do? It turns out that very weak theories of arithmetic are able to prove that these symbolic manipulations work correctly. PRA is a particular theory of arithmetic that is very weak from the point of view of stronger theories like PA or ZFC, but is nevertheless able to prove that (formalized versions of) the syntactic algorithms work correctly; it is often used for exactly this purpose.
It's only circular if you think we need a formalization of logic in order to reason mathematically at all. However, mathematicians reasoned mathematically for many centuries before formal logic was invented, so this assumption is obviously not true.
It's an empirical fact that mathematical reasoning existed independently of formal logic back then. I think it is reasonably self-evident, then, that it still exists without needing formal logic to prop it up. Formal logic is a mathematical model of the kind of reasoning mathematicians accept -- but the model is not the thing itself.
A small bit of circularity does creep in, because many modern mathematicians look to their knowledge of formal logic when they need to decide whether to accept an argument or not. But that's not enough to make the whole thing circular; there are enough non-equivalent formal logics (and possible foundations of mathematics) to choose between that the choice of which one to use to analyze arguments is still largely informed by which arguments one intuitively wants to accept in the first place, not the other way around.