What's the point of allowing quantification only over variables in first-order logic?

In first-order languages, ${\forall}$ is allowed to quantify only over variables, so that ${\forall}v(P)$, where $v$ is some variable and $P$ is a WFF, is the only kind of WFF involving universal quantifiers that such languages allow.

Why is that? None of the texts I have read on this subject (which, because of where I live, have been limited to sites like Wikipedia, ProofWiki and Wolfram MathWorld) explains why quantifying over WFFs is not allowed. I don't see any advantage to the restriction, and it leads to annoying problems like ZFC being an infinite list of axioms, which doesn't avoid the concept of ${\forall}P(Q)$, with $P$ and $Q$ being WFFs, but just phrases it in another way.

So what's the justification for not quantifying over WFFs? What are the advantages of such an approach?


It's because when we do allow broader scopes of quantification, things get extremely ugly.

One extremely important fact about first-order logic is that it has a reasonable notion of proof: if I want to know whether $\varphi$ is true whenever each sentence in $\Gamma$ is true - that is, if $\Gamma\models\varphi$ - I just look for a proof of $\varphi$ from $\Gamma$. The crucial facts about proofs are:

  • A proof is finite (in particular, can be represented by a single natural number).

  • I can identify proofs (so the set of numbers corresponding to valid proofs is computable).
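
To make the two bullet points concrete, here is a minimal sketch of a mechanical proof checker. It is a toy, not a full first-order system: the only inference rule is modus ponens, there are no logical axioms or quantifiers, and the tuple encoding of formulas and the names `is_valid_proof` and `follows_by_modus_ponens` are my own assumptions for illustration. The point is only that a proof is a finite object and that checking it is a terminating computation.

```python
# Toy encoding (an assumption for this sketch):
# a string is an atom; ("->", A, B) is the implication A -> B.

def follows_by_modus_ponens(formula, earlier):
    """True if earlier lines contain some A together with A -> formula."""
    return any(("->", a, formula) in earlier for a in earlier)

def is_valid_proof(premises, lines, goal):
    """Check a finite list of lines as a proof of `goal` from `premises`."""
    earlier = []
    for formula in lines:
        justified = formula in premises or follows_by_modus_ponens(formula, earlier)
        if not justified:
            return False  # this line is neither a premise nor derived
        earlier.append(formula)
    # A proof must be non-empty and end with the goal.
    return bool(lines) and lines[-1] == goal

premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
good = ["p", ("->", "p", "q"), "q", ("->", "q", "r"), "r"]
bad = ["p", "r"]  # "r" is asserted without justification

print(is_valid_proof(premises, good, "r"))
print(is_valid_proof(premises, bad, "r"))
```

The checker never has to think about models at all; it only inspects a finite list, which is what makes the set of valid proofs computable.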

Passing to stronger logics tends to kill this property: in particular, second-order logic - which I think is the natural setting for what you're describing - provably has no good proof system! In fact, I can write down a second-order sentence which is valid (= true in every model) iff the Continuum Hypothesis is true! So, determining what constitutes a proof in second-order logic requires us to first make strong set-theoretic commitments - which is putting the cart before the horse a bit!

This is not to say that stronger logics are uninteresting (see Barwise and Feferman's book Model-Theoretic Logics), merely that they lack nice properties which make first-order logic manageable (the other main property being the Löwenheim-Skolem property, which is a bit more technical). In fact, we can characterize first-order logic as the strongest logic satisfying a couple of nice properties (this is Lindström's Theorem). So there's a trade-off to be made in choosing what logic to use.


Because when you allow quantification over other elements of the language you're no longer in first-order logic, but rather second- or higher-order logic, where you can quantify over predicate and relational variables, predicate-of-predicate variables, etc.

Note that what you propose is not best thought of as quantifying over WFFs: it's not typical to allow the language to talk about its own WFFs and syntax in that way. In 2nd order logic, a new set of variables is introduced, $P^n_m$ (= the $m$-th variable for an $n$-ary relation), interpreted as subsets and more generally relations of individuals that the variables $v_i$ range over.
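
As a rough illustration of what "interpreted as subsets" means, here is a sketch that evaluates a second-order sentence on a small finite structure by looping over every subset of the domain. The sentence is the induction principle, $\forall P\,[(P(0) \land \forall x\,(P(x) \to P(s(x)))) \to \forall x\,P(x)]$; the function names and the two sample structures are assumptions made up for this example. Brute force like this only works because the domain is finite.

```python
from itertools import chain, combinations

def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def induction_holds(domain, zero, succ):
    """Evaluate: for all P, (P(zero) and P closed under succ) -> P is everything."""
    for P in subsets(domain):              # the second-order quantifier
        base = zero in P
        step = all(succ(x) in P for x in domain if x in P)
        everything = all(x in P for x in domain)  # a first-order quantifier
        if base and step and not everything:
            return False                   # P is a counterexample
    return True

domain = range(4)
# Successor walks 0 -> 1 -> 2 -> 3 and then stays at 3.
chain_model = induction_holds(domain, 0, lambda x: min(x + 1, 3))
# Successor is the identity, so {0} is closed but is not the whole domain.
split_model = induction_holds(domain, 0, lambda x: x)

print(chain_model, split_model)
```

Note that the loop ranges over sets of individuals, not over formulas: most of those subsets need not be definable by any WFF at all.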

Just as you can substitute terms for free 1st order variables, in 2nd order logic you can substitute formulas for free 2nd order variables. The rules for doing so are somewhat complicated; they're spelled out in Introduction to Mathematical Logic by Alonzo Church (first published in 1944, revised in 1956).

There are very good reasons to isolate first order logic as a thing in itself, worthy of attention and study. One is that it's sufficient to formalize traditional logic, à la Aristotle, and even the logic of relations as developed in the 19th century by Schröder and others. Another reason is that first order logic has many "nice" properties (completeness, compactness) which none of the higher-order logics enjoy.

2nd and higher order logics are indeed helpful and very expressive -- they can capture distinctions in natural language which first order logic can't. But higher order logics are very different beasts: they lack the nice mathematical properties possessed by first order logic. For example, there is no complete deductive system. They're really a form of set theory: "set theory in sheep's clothing", as Quine put it. Historically, the "theory of types", Russell & Whitehead's system that provided a foundation for mathematics in a higher order logic, is roughly contemporaneous with Zermelo's set theory (Zermelo's axioms appeared in 1908, the first volume of Principia Mathematica in 1910). Russell maintained that this established that mathematics is logic, a philosophy of math subsequently known as logicism; but current consensus is that Russell & Whitehead accomplished a reduction of mathematics to set theory, and not to what's commonly accepted as "logic".


There's nothing inherently wrong with quantifying over propositions, as in your $\forall P$ quantifiers.

However, a logic is, by definition, "first-order" if quantification is restricted to variables ranging over the universe of discourse. Quantifying over subsets of that universe, over predicates, over propositions, etc. makes the logic "higher-order" instead.

In first order logic, we get some metatheoretic results which fail in higher order logics. For instance, we get soundness and completeness: the provable formulae are exactly those which hold in all possible models. In higher order logics, usually only soundness holds: provable formulae are true in all models, but there are some formulae which hold in all models yet cannot be proved in the logic.


Also note that, if you have a formula $\phi(P)$ which depends on $P$ in an extensional way, i.e. $$ (P_1 \iff P_2) \implies (\phi(P_1) \iff \phi(P_2)) $$ then the quantification $\forall P.\ \phi(P)$ can be expressed in FOL as $$ \phi({\sf true}) \land \phi({\sf false}) $$ The above "encoding" of the universal quantifier relies on the classical tautology $$ (P \iff {\sf true}) \lor (P \iff {\sf false}) $$ Hence, at least for this case, we get an equivalent first-order formalization.
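
A quick sanity check of this encoding, as a sketch under assumptions of my own: the sample formula $\phi(P) = (P \land a) \lor (\lnot P \land b)$, the atoms `a` and `b`, and the list of candidate formulas for $P$ are all invented for illustration. Since `phi` only ever sees the truth value of $P$, it is extensional in the sense above, and the code checks that quantifying over the candidates agrees with the two-case conjunction under every valuation.

```python
from itertools import product

ATOMS = ("a", "b")

def valuations():
    """Every assignment of truth values to the atoms."""
    return [dict(zip(ATOMS, bits))
            for bits in product((True, False), repeat=len(ATOMS))]

def phi(p, v):
    """Sample phi(P) = (P and a) or (not P and b); it uses only P's truth value."""
    return (p and v["a"]) or ((not p) and v["b"])

# Hypothetical formulas that P might stand for, as functions of a valuation.
CANDIDATES = [
    lambda v: True,
    lambda v: False,
    lambda v: v["a"],
    lambda v: v["a"] and not v["b"],
    lambda v: v["a"] == v["b"],
]

def quantified(v):
    """'For all P, phi(P)' with P ranging over the candidate formulas."""
    return all(phi(P(v), v) for P in CANDIDATES)

def encoded(v):
    """The two-case encoding: phi(true) and phi(false)."""
    return phi(True, v) and phi(False, v)

agreement = all(quantified(v) == encoded(v) for v in valuations())
print(agreement)
```

For this particular `phi`, the encoding simplifies to $a \land b$, which is an ordinary formula with no quantifier over $P$ left in it.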