How do lambda calculus most basic definitions work?
Solution 1:
$\lambda f.\lambda x. x$ is a function, not a number. But the point is that we can use it as a model of a number.
In modern mathematics, we understand things not as what they are, but as what they do. We don't spend a lot of time talking about the intrinsic essence of the number 1, or asking "but what is the number 1?" That's a question for philosophers, maybe.
Instead, we ask if something behaves the way we want the number 1 to behave.
What do we need to say that some collection of objects behaves like numbers? Well, at the very least, we would like there to be an infinite collection of the objects, all distinguishable, and we would like to be able to do addition and multiplication and the like. If we take the object that we identify as being the number 1, and add it to itself, we should get the object that we identify as being the number 2. If it doesn't do that, then it isn't very much like numbers.
You may want to consider the way computers handle numbers. The number 1 inside the computer is represented as some pattern of electron flow through some chunk of silicon. What does that have to do with the number 1? Is it really a number 1? Hard to say! But it does act like a number 1 in the following sense: we can ask the computer to add 1+1 and it will do some magic thing, and the result will be the pattern of electrons that has been designated as representing the number 2. But what do the electrons have to do with numbers? Are they really numbers? Is there 1 electron somewhere, and we add to it another 1 electron to get 2 electrons? (No.) Is the magic thing really addition? It's hard to see how, but for most purposes, we don't care.
The Church numerals are similar. Whether or not they are numbers isn't the question. The question is whether they act like numbers, or rather whether they can be made to act like numbers. For example, if you define $$\begin{align} \operatorname{plus} & = \lambda m n f x . (m f (n f x))\\ 1 & = \lambda f x . f x\\ 2 & = \lambda f x . f (f x) \end{align}$$
you get a plus function: $\operatorname{plus} 1\, 1$ is in fact the same function as $2$. And that is because this thing we are calling “$2$” is essentially the operation that applies some function twice to an argument, the “$1$” is the operation that applies a function once to an argument, and $plus$ takes two numerals $m$ and $n$, and then a function and an argument, and applies the function first $n$ times to the argument and then applies the function $m$ times to the result, for a total of $m+n$ applications. So $\operatorname{plus} 1\, 1$ does the same thing as $2$ when applied to any arguments $f$ and $x$ (namely, it computes $f(f(x))$) and is therefore the same function; in the $\lambda$-calculus, the two objects are indistinguishable.
There's nothing special about this definition; there are many different ways one could define numbers in $\lambda$-calculus. For example:$\def\pair{\operatorname{pair}}$
$$ \begin{align} 0 & = \pair true \;true \\ 1 & = \pair false \;0 \\ 2 & = \pair false \;1 \\ 3 & = \pair false \;2 \\ & \cdots \end{align} $$
This represents each number $n$ as a linked list of length $n$ falses followed by a true. We can test if a number is zero by examining the head of the list; it's $true$ if the number is zero and false otherwise. We can easily define a successor (“$+1$”) operation: $$\operatorname{successor} = \lambda n . \pair false\; n$$ and with enough ingenuity we can define addition and multiplication functions. Is this somehow less legitimate than the Church numerals? No; it's just less convenient. Do we prefer the Church numerals because they better capture the true essence of numbers? No, we only prefer the Church numerals because definition of $plus$ is simpler.
And again: What does it mean for something to be true or false? In this context we are not concerned with abstract truth or falsity. We just want something that behaves in a certain way: We want true and false to be different, and distinguishable, and we want to have some sort of if-then-else construction so that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ evaluates to $p$ when $b$ is true, and to $q$ when $b$ is false. There is nothing more to it than that. The definitions
$$ \begin{align} \mathbf{true} & = \lambda x y . x \\ \mathbf{false} & = \lambda x y . y\\ \mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q & = \lambda b p q . b p q \end{align} $$
do exactly that, and that's all we require of them. We could just as easily have defined them this way:
$$ \begin{align} \mathbf{true} & = \lambda x y . y \\ \mathbf{false} & = \lambda x y . x\\ \mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q & = \lambda b p q . b q p \end{align} $$
Notice that I switched the definitions of $\mathbf{true}$ and $\mathbf{false}$ here—$\mathbf{true}$ now has the definition that usually belongs to $\mathbf{false}$, and vice versa—but I also changed the definition of $\mathbf{if}$ to match, so that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ will still be $p$ when $b$ is $\mathbf{true}$. Is this somehow saying that truth and falsity are interchangeable? No, it says nothing of the sort. It says that in our model they are interchangeable. But nobody ever said that the model was supposed to capture all the properties of truth, like that it is true that snow is white. It's only intended to capture the one property of truth that we intended to model, which is that $\mathbf{if}\; b\; \mathbf{then}\; p \;\mathbf{else}\; q$ is $p$ whenever $b$ is true, and $q$ whenever $b$ is false.
Solution 2:
To address directly your question "how can, formally, a function be turned into a number?", let me first address a different question: "how can, formally, a string of digits be turned into a number?"
As you are surely aware, a string of digits is not a number. A string of digits is just a sequence of characters. And yet we speak without problems about "the number 120".
The point is that we use strings of digits to represent numbers. That is, we have a mapping between strings of digits and numbers, which allows us to identify any number we want with a string of digits. We do so because we cannot write down numbers, but we can write down strings of digits.
Now there are different ways to translate a number into a string of digits. The Romans would have written the number 120 (that is, the number which we denote by the digit string 120
) as CXX
. Now the Roman number system was impractical for multiplication, and only moderately practical for larger numbers, which is why we switched to the more practical decimal system. Here we can give easy rules on how to manipulate strings of digits in order to find sums, differences, products and quotients of numbers.
So how is all this related to Church numbers? Well, here we face basically the same problem: We want to represent numbers, but all we have are functions. So we need to define functions which somehow encode numbers. Now there are infinitely many ways how we could do it, however some are more practical than others. Church integers are a way that's quite practical for addition, multiplication, and doing loops (that is, doing an operation $n$ times). They are not a particularly good representation for subtraction, BTW.
Note that the analogous problem also occurs in set theory, where you also want to represent the natural numbers, but all you have are sets. So here the standard construction is to start with the empty set for $0$, and represent $n+1$ by $n\cup\{n\}$.
So what are natural numbers, after all? Are they digit strings? Are they functions? Are they sets?
They are in some sense all of those, and in another sense neither. What natural numbers are is basically fixed by the Peano axioms. Whenever you have something which fulfils the Peano axioms, you have a model (a representation) of the natural numbers. The digit strings are, basically because the digit string 0
stands for the number $0$, and we have rules that give the digit string for $n+1$ whenever we have a digit string for $n$ (and all digit strings correspond to some $n$). The church integers do because we have a representation for $0$ (namely $\lambda f.\lambda x.x$) and a function which gives us $n+1$ given $n$ (namely $\lambda n.\lambda f.\lambda x.f (n f x)$). And the special sets do, again, because we have $0$ ($\emptyset$) and $n\mapsto n+1$ ($n\cup\{n\}$). But neither are "the" natural numbers, as each of them is an equally valid representation of the natural numbers (just like base 2, base 8 and base 10 representations are equally valid representations). You cannot point to one representation and say those are the true natural numbers, and all others are not.
Solution 3:
You are right about $\overline{1} := \lambda f.\lambda x.(f x)$, the first argument is "supposed" to be a function and the second argument is $f$'s argument.
First of all, note that nothing is typed here (as you mentioned you know programming, take care to remind this distinction). This means that you could supply as first argument any term.
More closely to your question, the representation of integers could be done in many ways, the one you introduced is one out of many. As a side note, the representation of $n$ is not what you gave, it is $\lambda f.\lambda x. (f(f(f \ldots x)))$ with $n$ applications of $f$. This representation is used to prove some computability theorems, namely that if a function $g : \mathbb N \rightarrow \mathbb N$ is primitive recursive, then you can find a lambda term $f$ such that $(f \overline n) =_\beta \overline m$ if, and only if, $g(n) = m$. But you can use all these terms as "standard" lambda-terms and compute with them, forgetting completely that they can be used to represent integers.
The story is the same with the boolean. You could use many representation, but $\lambda xy.x$ is often used to denote truth. This can be used to represent some boolean functions $\{t,f\}^k \rightarrow \{t,f\}$.
Summing up, these terms don't mean anything "intrinsically", they are just lambda-terms. But when the different models of calculus came up in the 20th century, we had to have a way to compare them. Giving a representation of integers within lambda-calculus allowed to compare it to primitive-recursive functions, for example.
Solution 4:
Lambda calculus is a model for computation in which everything is a function. As such, if there are numbers in lambda calculus, they need to be functions, too.
If you define numbers via the above definition, you can start defining addition, multiplication, and exponentiation:
$$\begin{align}A &:= \lambda mnfx.(mf)(nfx)\\ M &:= \lambda mnf.m(nf)\\ E &:= \lambda mn.nm \end{align}$$
Then $(A1)2=3$, for example, and $(E2)3=8$.
It turns out that any computable function on the natural can be written as a lambda expression, although that is far from obvious initially.
[ It always struck me as funny that, in Lambda calculus, exponentiation is by far the easiest function to define, followed by multiplication, then followed by addition. Usually, we define multiplication in terms of addition, and exponentiation in terms of multiplication, so we think of their complexity in that order.]