We all use mathematical induction to prove results, but is there a proof of mathematical induction itself?
Solution 1:
Suppose we want to show that all natural numbers have some property $P.$ One route forward, as you note, is to appeal to the principle of arithmetical induction.
The principle is this: Suppose we can show that (i) $0$ has some property $P$, and also that (ii) if any given number has the property $P$ then so does the next; then we can infer that (iii) all numbers have property $P$.
In symbols, we can use $\varphi$ for an expression attributing some property to numbers, and we can put the induction principle like this:
Given (i) $\varphi(0)$ and (ii) $\forall n(\varphi(n) \to \varphi(n + 1))$, we can infer (iii) $\forall n\varphi(n),$
where the quantifiers run over natural numbers.
The question being asked is, in effect, how do we show that arguments which appeal to this principle are good arguments?
Just blessing the principle with the title "Axiom" doesn't yet tell us why it might be a good axiom to use in reasoning about the numbers. And producing a proof from an equivalent principle like the Least Number Principle may well not help either, as the question will just become why arguments which appeal to this equivalent principle are good arguments?
Is there nothing to be said then?
Well we can perhaps still hope for some conceptual clarification here. To that end, consider this informal argument.
Suppose we establish both the base case (i) and the induction step (ii).
By (i) we have $\varphi(0)$.
By (ii), $\varphi(0) \to \varphi(1)$. Hence we can infer $\varphi(1)$.
By (ii) again, $\varphi(1) \to \varphi(2)$. Hence we can now infer $\varphi(2)$.
Likewise, we can use another instance of (ii) to infer $\varphi(3)$.
And so on and so forth, running as far as we like through the successors of $0$ (i.e. through the numbers that can be reached by starting from zero and repeatedly adding one) until we get to the $n$'th successor of zero.
But the successors of $0$ are the only natural numbers, so for every natural number $n$, $\varphi(n)$.
Set out like that, we see that the crucial point here is the last one. And what this reasoning reveals is how arithmetical induction principle is tantamount to a claim about the basic structure of the number sequence. The natural numbers are characterized as those which you can get to step-by-step from zero by repeatedly adding one, and that implies the absence of `stray' numbers that you can't get to step-by-step from zero by applying and reapplying the successor function. If a property gets passed down from zero, and from successor to successor, it gets passed on to every number because there are no 'stray' numbers outside the sequence of successors.
Now, how illuminating is that? We are in effect defining the natural numbers as those objects for which that informal line of argument is a good one, i.e. the numbers are those objects over which we can do arithmetical induction? Is this going round in circles?
Well, in a sense, yes! But note this isn't a vicious circle -- rather it is e.g. just how Frege and Russell famously defined the natural numbers. The point that is emerging from these reflections is that the principle of induction just cashes out the intuitive idea that the natural number are as those which you can get to step-by-step from zero by repeatedly adding one. To fix what we are talking about in talking of the natural numbers is to fix that we are talking about things for which the principle of arithmetical induction holds. That is why we can take the principle as axiomatic.
Note: Of course, you can do fancy things like model natural numbers in ZFC, and then prove in ZFC that a formal version of induction-over-these-number-representing-sets. But it is important to see that this does not help with the question that is being asked here. After all, what is the criterion for judging that a certain sequence of sets defined in ZFC is appropriate to be used as a model of the numbers? Not any old sequence will do. Some are far too long! An appropriate sequence is one where you can get step-by-step from the 'zero' by repeatedly 'adding one', and there are no 'stray' numbers in the sequence that you can't get to step-by-step from 'zero' by applying and reapplying the 'successor' function in the model. In other words, we choose the implementation of natural numbers in ZFC in order to ensure that the sequence in question has the right structure to make arithmetic induction work. Now, you can prove things in ZFC about that sequence of sets, including an analogue of arithmetical induction. But to take that as giving you some independent reason for thinking that induction holds over the numbers is to get things exactly upside down. Proving induction in ZFC is, to put it vividly, only taking out the inductive rabbit that you've already smuggled into the hat when you chose how to implement the numbers!
Solution 2:
The induction scheme can be proved and generalized to be used on sets bigger than the natural numbers (The proof can be done under ZFC, which is an acceptable axiomatic system which is strong enough to describe most mathematics we know).
Lets prove for example, the induction principle for well ordered sets. Suppose you have a well ordered set (every subset has a minimal element relative to some ordering <) $\left(A,< \right)$, such as the natural numbers $\mathbb{N}$ with their natural ordering. Let $\varphi(x)$ be some property and assume that $\left( \forall y<x \hspace{2mm} \varphi(y) \right) \rightarrow \varphi(x)$. We now want to show that $\varphi(x)$ holds for all $x\in A$ (This form of induction may be known to you as complete/strong induction).
Suppose it doesn't hold for all $x\in A$, then let $B=\left\{ x\in A | \neg \phi(x) \right\}$, be the set of all elements in $A$ for which the property does not hold. By our assumption, $B$ is not empty, hence it has a minimal element $b$ ($A$ is well ordered by $<$). By minimality of $b$, we have $\forall b'\in B: b'<b\rightarrow\varphi(b')$. By definition of $B$, the property holds for all elements outside of $B$, thus we can ommit $b'\in B$ and get $\forall b'<b: \varphi(b')$ . However we assumed that $\left( \forall y<x \hspace{2mm} \varphi(y) \right) \rightarrow \varphi(x)$, thus we have $\varphi(b)$, contradicting $b\in B$.
So in fact, we do prove the induction principle, and dont only accept it since it seems natural (or label it as an axiom, at least not in ZFC). During the proof we cant use the induction principle which were trying to prove (otherwise it would have made no sense), but we do use our knowledge of the structure of the set were performing the induction on (in this case, the well ordering).
Edit: I guess this is one of the fancy things Peter Smith mentioned. I think it's rather elegant and shows that we don't have to take the induction principle as black magic (or that we don't have to consider it an underlying principle of the universe).