When does the set enter set theory?

Solution 1:

(1) This is actually not a problem in the form you have stated it -- the rules of what is a valid proof in first-order logic can be stated without any reference to sets, such as by speaking purely about operations on concrete strings of symbols, or by arithmetization with Gödel numbers.

However, if you want to do model theory on your first-order theory you need sets. And even if you take the syntactical viewpoint and say that it is all just strings, that just pushes the fundamental problem down a level, because how can we then formalize reasoning about natural numbers (or symbol strings) if first-order logic itself "depends on" natural numbers (or symbol strings)?

The answer to that is that is just how it is -- the formalization of first-order logic is not really the ultimate basis for all of mathematics, but a mathematical model of mathematical reasoning itself. The model is not the thing, and mathematical reasoning is ultimately not really a formal theory, but something we do because we intuitively believe that it works.

(2) This is a misunderstanding. In axiomatic set theory, the axioms themselves are the definition of the notion of a set: A set is whatever behaves like the axioms say sets behave.

(3) What you quote is how functions usually are modeled in set theory. Again, the model is not the thing, and just because we can create a model of our abstract concept of functional relation in set theory, it doesn't mean that our abstract concept an sich is necessarily a creature of set theory. Logic has its own way of modeling functional relations, namely by writing down syntactic rules for how they must behave -- this is less expressive but sufficient for logic's need, and is no less valid as a model of functional relations than the set-theoretic model is.

Solution 2:

For those concerned with secure foundations for mathematics, the problem is basically this: In order to even state the axioms of ZF set theory, we need to have developed first-order logic; but in order to recognize conditions under which first-order logical formulas can be regarded as "true" or "valid", we need to give a model-theoretic semantics for first-order logic, and this in turn requires certain notions from set theory. At first glance, this seems troublingly circular.

Obviously, this is an extremely subtle area. But as someone who has occasionally been troubled by these kinds of questions, let me offer a few personal thoughts, for whatever they may be worth.

My own view goes something like this. Let's first distinguish carefully between logic and metalogic.

In logic, which we regard as a pre-mathematical discipline, all we can do is give grammatical rules for constructing well-formed statements and derivation rules for constructing "proofs" (or perhaps, so as not to beg the question, we should call them "persuasive demonstrations"); these should be based purely on the grammatical forms of the various statements involved. There is, at this point, no technical or mathematical theory of how to assign meanings to these formulas; instead, we regard them merely as symbolic abbreviations for the underlying natural language utterances, with whatever informal psychological interpretations these already have attached to them. We "justify" the derivation rules informally by persuading ourselves through illustrative examples and informal semantical arguments that they embody a reasonable facsimile of some naturally occurring reasoning patterns we find in common use. What we don't do, at this level, is attempt to specify any formal semantics. So we can't yet prove or even make sense of the classical results from metalogic such as the soundness and/or completeness of a particular system of derivation rules. For us, "soundness" means only that each derivation rule represents a commonly accepted and intuitively legitimate pattern of linguistic inference; "completeness" means basically nothing. We think of first-order logic as merely a transcription into symbolic notation of certain relatively uncontroversial aspects of our pre-critical reasoning habits. We insist on this symbolic representation for two reasons: (i) regimentation, precisely because it severely limits expressive creativity, is a boon to precision; and (ii) without such a symbolic representation, there would be no hope of eventually formalizing anything.

Now we take our first step into true formalization by setting down criteria (in a natural language) for what should constitute formal expression in the first place. This involves taking as primitives (i) the notion of a token (i.e., an abstract stand-in for a "symbol"), (ii) a binary relation of distinctness between tokens, (iii) the notion of a string, and (iv) a ternary relation of "comprisal" between strings, subject to certain axioms. The axioms governing these primitives would go something like this. First, each token is a string; these strings are not comprised of any other strings. By definition, two strings that are both tokens are distinct if the two tokens are distinct, and any string that is a token is distinct from any string that is not a token. Secondly, for any token t and any string S, there is a unique string U comprised of t and S (we can denote this U by 'tS'; this is merely a notation for the string U, just as 't' is merely a notation for the abstract token involved --- it should not be thought of as being the string itself). By definition, tS and t'S' are distinct strings if either t and t' are distinct tokens or S and S' are distinct strings (this is intuitively a legitimate recursive definition). Thirdly, nothing is a string except by virtue of the first two axioms. We can define the concatenation UV of strings U and V recursively: if U is a token t, define UV as the string tV; otherwise, U must be t for some token t and string S, and we define UV as the string tW, where W is the (already defined) concatenation SV. This operation can be proven associative by induction. All this is part of intuitive, rather than formal, mathematics. There is simply no way to get mathematics off the ground without understanding that some portion of it is just part of the linguistic activity we routinely take part in as communicative animals. This said, it is worth the effort to formulate things so that this part of it becomes as small as possible.

Toward that end, we can insist that any further language used in mathematics be formal (or at least capable of formal representation). This means requiring that some distinguishable symbol be set aside and agreed upon for each distinct token we propose to take for granted, and that everything asserted of the tokens can ultimately be formulated in terms of the notions of distinctness, strings, comprisal, and concatenation discussed above. Notice that there is no requirement to have a finite set of tokens, nor do we have recourse within a formal language to any notion of a set or a number at all.

Coming back to our symbolic logic, we can now prove as a theorem of intuitive (informal) mathematics that, with the symbolic abbreviations (the connectives, quantifiers, etc.) mapped onto distinct tokens, we can formulate the grammatical and derivational rules of logic strictly in terms of strings and the associated machinery of formal language theory. This brings one foot of symbolic logic within the realm of formal mathematics, giving us a language in which to express further developments; but one foot must remain outside the formal world, connecting our symbolic language to the intuitions that gave birth to it. The point is that we have an excellent, though completely extra-mathematical, reason for using this particular formal language and its associated deduction system: namely, we believe it captures as accurately as possible the intuitive notions of correct reasoning that were originally only written in shorthand. This is the best we can hope to do; there must be a leap of faith at some point.

With our technically uninterpreted symbolic/notational view of logic agreed upon (and not everyone comes along even this far, obviously -- the intuitionists, for example), we can proceed to formulate the axioms of ZF set theory in the usual way. This adds sets and membership to the primitive notions already taken for granted (tokens, strings, etc.). The motivating considerations for each new axiom, for instance the desire to avoid Russell's paradox, make sense within the intuitive understanding of first-order logic, and these considerations are at any rate never considered part of the formal development of set theory, on any approach to foundational questions.

Once we have basic set theory, we can go back and complete our picture of formal reasoning by defining model-theoretic semantics for first-order logic (or indeed, higher-order logic) in terms of sets and set-theoretic ideas like inclusion and n-tuples. The purpose of this is twofold: first, to deepen our intuitive understanding of correct reasoning by embedding all of our isolated instincts about valid patterns of deduction within a single rigid structure, as well as to provide an independent check on those instincts; and secondly, to enable the development of the mathematical theory of logic, i.e., metalogic, where beautiful results about reasoning systems in their own right can be formulated and proven.

Once we have a precise understanding of the semantics (and metatheory) of first-order logic, we can of course go back to the development of set theory and double-check that any logical deductions we used in proving theorems there were indeed genuinely valid in the formal sense (and they all turn out to be, of course). This doesn't count as a technical justification of one by means of the other, but only shows that the two together have a comforting sort of coherence. I think of their relation as somehow analogous to the way in which the two concepts of "term" and "formula" in logical grammar cannot be defined in isolation from one another, but must instead be defined by a simultaneous recursion. That's obviously just a vague analogy, but it's one I find helpful and comforting.

I hope this made sense and was somewhat helpful. My apologies for being so wordy. Cheers. - Joseph

Solution 3:

Properly, logic shouldn't concern itself with notions of set. We can use the notion in an almost metaphorical sense, such as saying that our (first-order) language consists of a set of symbols, and from these symbols we define the set of formulae. This is not really a problem, since the basic objects of logic are symbols that can be written down, and our set or symbols can be just a listing of these symbols (or a description of how to write them), and our set of formulae is just a method for analysing a written expression and determining whether or not it is a formula. As Henning Makholm mentions in his answer, when one moves from logic to model theory the distinction can become murkier.

In (axiomatic) set theory, we consider the notion of a "set" as a basic or undefined. This is for mainly pragmatic purposes: you simply cannot define everything in terms of more basic concepts, and it is hard to think of a concept that "set" could be reduced to. This is really no different than Euclid, who defined a point to be "that which has no part," but left as totally undefined what a part is, or how something could lack one. What the axioms of ZF(C) do tell us is the basic properties that we assume are true about sets, or at least those properties of set that seem to be true after some serious thinking about it. (This may seem a little flippant, but consider the paradoxes/antinomies from the late 19th and early 20th centuries. These caused mathematicians -- or set theorists -- to consider very closely what operations or set construction principles will not obviously lead to contradictions, and these discussions resulted in many of the axioms we have today.)

In terms of functions a predicates, you are correct that a set theorist will define a relation to be a set consisting of ordered pairs, where an ordered pair is a set of the form $\{ \{ a \} , \{ a , b \} \}$, but the set theorist is not actually making any metaphysical/ontological claim about the true nature of these concepts. Instead, the set theorist is translating these notions from ordinary mathematics into the language of sets. This is done because, in the formal theory, one cannot speak about objects that are not sets. If this translation could not be done, there would be no possibility of using set theory as a foundation of all of mathematics, and set theory would be a rather bland area of study. Of course, after we have set up this translation, we must prove that these defined concepts have the same properties as our "informal" conception of them. For instance, a set theorist will prove that $\{ \{ a \} , \{ a , b \} \} = \{ \{ c \} , \{ c , d \} \}$ iff $a=c$ and $b=d$. This being accomplished, we may define the ordered pair $(a,b)$ to be the set $\{ \{ a \} , \{ a , b \} \}$, because our intuitive notion of an ordered pair is of something that can distinguish the first coordinate from the second.

These "parameter dependent" properties you speak of are just formulae (possibly) with free variables. The manner in which we handle such expressions syntactically and semantically may lead one to think about them as functions or predicates -- and this is extremely useful in practice -- but their "true nature" is far from this.

Solution 4:

The axioms of ZFC are stated in first order logic. A statement is in first order logic based on its form, not its content. The two undefined terms in set theory are "set" and $\epsilon$. The things we talk about are called "sets", whatever those might be, and the one undefined relation is the binary predicate $\epsilon$, whatever that might mean. The axioms implicitly determine how these notions interact. The axioms are designed to capture our intuitive notions of how sets act, but once formalized, we no longer have any control over the meaning.

This resembles, for instance, Euclid's axioms for geometry in which "line", "between", etc. are never explicity defined.

A formula $\psi(u_1, u_2, q, \ldots)$ with "free" variables may be true or false depending on its inputs. It defines a relation in general and may define a function. A function is required to have a unique "output" for each set of "inputs"; a relation has no such requirement.

Solution 5:

There actually is a deeper aspect to the second problem, assuming you're talking not purely about the theory of ZFC, but also the theory of its models.

It turns out that certain notions in ZFC are "absolute": That is, if you have a countable transitive standard model $ M $ of ZFC, these properties are true when considered in the "outer" set universe V exactly when they are true in M. For "is an element of", this is by definition of "standard model". But there are less obvious properties that can be shown to be absolute as well, such as "is equal to $ \aleph_0 $". Of course, by far not all properties are absolute. "is equal to $ \aleph_1 $", for instance, most definitely is not absolute, which is shown by the mere existance of countable transitive standard models.

In aspects of model theory where one operates with "real" sets to construct models, one certainly must very carefully distinguish between properties of "sets" in the model and sets in the outer meta-theory.

For more on this see, for instance, Kunen's "Introduction to Independence Proofs" (Chapter 4 specifically) - but most introduction books on set theory will discuss this issue in some form.