The Church-Turing thesis is credible because every single model of computation that anyone has come up with so far has been proven to be equivalent to Turing machines (or strictly weaker, but those aren't interesting here). Those models include

  1. Recursive functions over the natural numbers
  2. The lambda calculus
  3. Combinatory logic

and many more. It's indeed hard to imagine anything algorithm-like which cannot be formalized in any of these models.
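
To get a feel for why such superficially different models can express the same computations, here is a small illustration (in Python, purely for concreteness) of how the lambda calculus encodes arithmetic using nothing but function application, via the Church numerals. It is only a sketch of the idea, not part of any equivalence proof.

```python
# Church numerals: natural numbers represented as higher-order functions.
# The numeral n means "apply f n times", so arithmetic becomes pure function
# composition -- a tiny glimpse of how the lambda calculus expresses computation.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul  = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    # Decode a numeral by counting how many times it applies its argument.
    return n(lambda k: k + 1)(0)

two, three = succ(succ(zero)), succ(succ(succ(zero)))
assert to_int(add(two)(three)) == 5
assert to_int(mul(two)(three)) == 6
```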

The difference between the Church-Turing thesis and real theorems is that it seems impossible to formalize the Church-Turing thesis. Any such formalization would need to formalize what an arbitrary computable function is, which requires a model of computation to begin with. You can think of the Church-Turing thesis as a kind of meta-theorem which states

Whenever you pick some intuitive notion of computability and formalize it, the resulting model of computation is either equivalent to Turing machines or strictly weaker.

All concrete instances of this meta-theorem that have been encountered have been proven: we do know that all the models of computation listed above are equivalent. Since we cannot prove the meta-theorem itself, for the reason stated above, we cannot do better than say "Well, we don't see how it could fail to be true, so we'll assume it's true".
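
To make "concrete instance" a bit more tangible: one direction of such an equivalence proof typically amounts to showing that one model can simulate another. The sketch below simulates a two-counter (Minsky) machine -- one concrete formalization of "computable" -- with an ordinary program, which a Turing machine could in turn simulate; the instruction format is my own, chosen only for illustration.

```python
# A minimal counter-machine simulator. Instructions are hypothetical tuples:
#   ("inc", r, nxt)           increment register r, go to instruction nxt
#   ("dec", r, nxt, on_zero)  decrement r and go to nxt if r > 0,
#                             otherwise go to on_zero
#   ("halt",)
def run_counter_machine(program, registers):
    pc = 0
    while program[pc][0] != "halt":
        op = program[pc]
        if op[0] == "inc":
            registers[op[1]] += 1
            pc = op[2]
        else:  # "dec"
            if registers[op[1]] > 0:
                registers[op[1]] -= 1
                pc = op[2]
            else:
                pc = op[3]
    return registers

# Example: move the contents of register 0 into register 1.
prog = [("dec", 0, 1, 2), ("inc", 1, 0), ("halt",)]
print(run_counter_machine(prog, [3, 0]))   # -> [0, 3]
```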


A side note: I think it is an error to identify the Church-Turing Thesis with a claim about what machines can do in practice. Some reflections follow [if I may be forgiven the self-quotation].

(a) It is important to note that there are three levels of concept that are in play here when we talk about computation.

  1. At the pre-theoretic level – and guided by some paradigms of common-or-garden real-world computation – there is a loose cluster of inchoate ideas about what we can compute with paper and pencil, about what a computer (in the modern sense) can compute, and about what a mechanism more generally might compute.

  2. Then at what we might call the proto-theoretic level we have, inter alia, one now familiar way of picking out strands from the pre-theoretic cluster while idealizing away from practical limitations of time or amount of paper, giving us the notion of an effectively computable function. So here some theoretic tidying has already taken place, though the concept still remains somewhat vaguely characterized (what makes a step in a step-by-small-step algorithmic procedure ‘small’ enough to be admissible?).

  3. Then at the fully theoretic level we have tightly characterized concepts like the concept of a recursive function and the concept of a Turing-computable total function.

It would be quite implausible to suppose that the inchoate pre-theoretic cluster of ideas at the first level pins down anything very definite. No, the Church–Turing Thesis sensibly understood, in keeping with the intentions of the early founding fathers, is a view about the relations between concepts at the second and third level. The Thesis kicks in after some proto-theoretic work has been done. The claim is that the functions that fall under the proto-theoretic idea of an effectively computable function are just those that fall under the concept of a recursive function and under the concept of a Turing-computable function. NB: the Thesis is a claim about the extension of the concept of an effectively computable function.

(b) There are other strands in the pre-theoretic hodgepodge of ideas about computation than those picked up in the idea of effective computability: in particular, there’s the idea of what a machine can compute and we can do some proto-theoretic tidying of that strand too. But the Church–Turing Thesis is not about this idea. It must not be muddled with the entirely different claim that a physical machine can only compute recursive functions – i.e. the claim that any possible computing mechanism (broadly construed) can compute no more than a Turing machine. For perhaps there could be a physical set-up which somehow or other is not restricted to delivering a result after a finite number of discrete, deterministic steps, and so is enabled to do more than any Turing machine. Or at least, if such a ‘hypercomputer’ is impossible, that certainly can’t be established merely by arguing for the Church–Turing Thesis.

Let’s pause over this important point, and explore it just a little further. It is familiar that the Entscheidungsproblem can’t be solved by a Turing machine. In other words, there is no Turing machine which can be fed (the code for) an arbitrary first-order wff, and which will then decide, in a finite number of steps, whether it is a valid wff or not. Here, however, is a simple specification for a non-Turing hypercomputer that could be used to decide validity.

Imagine a machine that takes as input the (Gödel number for the) wff $\varphi$ which is to be tested for validity. It then starts effectively enumerating (numbers for) the theorems of a suitable axiomatized formal theory of first-order logic. We’ll suppose our computer flashes a light if and when it enumerates a theorem that matches $\varphi$ . Now, our imagined computer speeds up as it works. It performs one operation in the first second, a second operation in the next half second, a third in the next quarter second, a fourth in the next eighth of a second, and so on. Hence after two seconds it has done an infinite number of tasks, thereby enumerating and checking every theorem to see if it matches $\varphi$ ! So if the computer’s light flashes within two seconds, $\varphi$ is valid; if not, not. In sum, we can use our wildly accelerating machine to decide validity, because it can go through an infinite number of steps in a finite time.
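
The only arithmetic doing any work in this story is the convergence of the geometric series: if step $n$ takes $1/2^{n-1}$ seconds, every step finishes before the two-second mark. A throwaway numerical check, nothing more:

```python
# Time at which the n-th step of the accelerating machine finishes: a partial
# sum of the geometric series 1 + 1/2 + 1/4 + ..., which equals 2 - 2**-(n-1)
# and so never exceeds 2 seconds, however large n is.
def finish_time(n):
    return sum(0.5 ** k for k in range(n))

for n in (1, 2, 10, 60):
    print(n, finish_time(n))   # 1.0, 1.5, 1.998..., then indistinguishable from 2.0
```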

Now, you might very reasonably think that such accelerating machines are a mere philosophers’ fantasy, physically impossible and not to be taken seriously. But actually it isn’t quite as simple as that. For example, we can describe space-time structures consistent with General Relativity which apparently have the following feature. We could send an ‘ordinary’ computer on a trajectory towards a spacetime singularity. According to its own time, it’s a non-accelerating computer, plodding evenly along, computing forever and never actually reaching the singularity. But according to us – such are the joys of relativity! – it takes a finite time before it vanishes into the singularity, accelerating as it goes. Suppose we set up our computer to flash us a signal if, as it enumerates the first-order logical theorems, it ever reaches $\varphi$. We’ll then get the signal within a bounded time just in case $\varphi$ is a theorem. So our computer falling towards the singularity can be used to decide validity. Now, there are quite fascinating complications about whether this fanciful story actually works within General Relativity. But no matter. The important point is that the issue of whether there could be this sort of Turing-beating physical set-up – where (from our point of view) an infinite number of steps are executed – has nothing to do with the Church–Turing Thesis properly understood. For that is a claim about effective computability, about what can be done in a finite number of steps following an algorithm.


The Church-Turing thesis asserts that the informal notion of a function that can be calculated by an (effective) algorithm is precisely the same as the formal notion of a recursive function. Since the former notion is informal, one cannot give a formal proof of this equivalence. But one can present informal arguments supporting the thesis. For example, every known attempt at formally modeling this informal notion of computability has led to precisely the same class of recursive functions, whether it be via lambda-calculus, Post systems, Markov algorithms, combinatory logic, etc. This remarkable confluence lends strong support to the thesis and underscores the importance of this class of functions.

It is worth emphasizing that the Turing Machine was devised by Turing not as a model of any type of physically realizable computer but, rather, as an ideal limit to what is computable by a human calculating in a step-by-step mechanical manner (i.e. without any use of intuition). This point is widely misunderstood -- see Sieg [1] for an excellent exposition on this and related topics.

The finiteness limitations postulated by Turing for his Turing machines are based on postulated limitations of the human sensory apparatus. A Turing-style analysis of physically realizable computing devices, with analogous Church-Turing theses, did not come until much later, in Robin Gandy's 1980 work -- with limitations based instead on the laws of physics. As Odifreddi says on p. 51 of [2] (the bible of classical recursion theory):

Turing machines are theoretical devices, but have been designed with an eye to physical limitations. In particular, we have incorporated in our model restrictions coming from: (a) ATOMISM, by ensuring that the amount of information that can be coded in any configuration of the machine (as a finite system) is bounded; and (b) RELATIVITY, by excluding actions at a distance, and making causal effect propagate through local interactions. Gandy [1980] has shown that the notion of Turing machine is sufficiently general to subsume, in a precise sense, any computing device satisfying similar limitations.

and on p. 107 (A general theory of discrete, deterministic devices):

The analysis (Church [1957], Kolmogorov and Uspenskii [1958], Gandy [1980]) starts from the assumptions of atomism and relativity. The former reduces the structure of matter to a finite set of basic particles of bounded dimensions, and thus justifies the theoretical possibility of dismantling a machine down to a set of basic constituents. The latter imposes an upper bound (the speed of light) on the propagation speed of causal changes, and thus justifies the theoretical possibility of reducing the causal effect produced in an instant $t$ on a bounded region of space $V$, to actions produced by the regions whose points are within distance $ct$ from some point of $V$. Of course, the assumptions do not take into account systems which are continuous, or which allow unbounded action-at-a-distance (like Newtonian gravitational systems).

Gandy's analysis shows that THE BEHAVIOR IS RECURSIVE, FOR ANY DEVICE WITH A FIXED BOUND ON THE COMPLEXITY OF ITS POSSIBLE CONFIGURATIONS (in the sense that both the levels of conceptual build-up from constituents, and the number of constituents in any structured part of any configuration, are bounded), AND FIXED FINITE, DETERMINISTIC SETS OF INSTRUCTIONS FOR LOCAL AND GLOBAL ACTION (the former telling how to determine the effect of an action on structured parts, the latter how to assemble the local effects). Moreover, the analysis is optimal in the sense that, when made precise, any relaxing of conditions becomes compatible with any behavior, and it thus provides a sufficient and necessary description of recursive behavior.
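
For a toy example of the kind of device Gandy's conditions describe, consider a one-dimensional cellular automaton: each cell carries a bounded amount of information (atomism) and is updated using only its immediate neighbours (locality), and its evolution is plainly recursive, since an ordinary program can simulate it step by step. The sketch below uses the well-known Rule 110 merely as an illustration of such a device, not as anything taken from Gandy's paper.

```python
# A device satisfying Gandy-style constraints: a 1-D cellular automaton.
# Each cell holds one bit (bounded information per constituent) and each update
# depends only on a local neighbourhood, so the whole evolution can be computed
# mechanically -- i.e. the behaviour is recursive.

RULE = 110

def step(cells):
    """Apply one synchronous update; cells outside the array are treated as 0."""
    padded = [0] + cells + [0]
    return [
        (RULE >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]

def run(cells, steps):
    for _ in range(steps):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells)

run([0] * 30 + [1], 15)   # evolve a single live cell for 15 steps
```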

Be forewarned that Gandy's 1980 paper [3] is regarded as difficult even by some cognoscenti. You may find it helpful to first peruse the papers in [4] by J. Shepherdson, and A. Makowsky.

[1] Sieg, Wilfried. Mechanical procedures and mathematical experience.
pp. 71--117 in Mathematics and mind. Papers from the Conference on the Philosophy of Mathematics held at Amherst College, Amherst, Massachusetts, April 5-7, 1991. Edited by Alexander George. Logic Comput. Philos., Oxford Univ. Press, New York, 1994. ISBN: 0-19-507929-9 MR 96m:00006 (Reviewer: Stewart Shapiro) 00A30 (01A60 03A05 03D20)

[2] Odifreddi, Piergiorgio. Classical recursion theory.
The theory of functions and sets of natural numbers. With a foreword by G. E. Sacks. Studies in Logic and the Foundations of Mathematics, 125. North-Holland Publishing Co., Amsterdam-New York, 1989. xviii+668 pp. ISBN: 0-444-87295-7 MR 90d:03072 (Reviewer: Rodney G. Downey) 03Dxx (03-02 03E15 03E45 03F30 68Q05)

[3] Gandy, Robin. Church's thesis and principles for mechanisms.
The Kleene Symposium. Proceedings of the Symposium held at the University of Wisconsin, Madison, Wis., June 18--24, 1978. Edited by Jon Barwise, H. Jerome Keisler and Kenneth Kunen. Studies in Logic and the Foundations of Mathematics, 101. North-Holland Publishing Co., Amsterdam-New York, 1980. xx+425 pp. ISBN: 0-444-85345-6 MR 82h:03036 (Reviewer: Douglas Cenzer) 03D10 (03A05)

[4] The universal Turing machine: a half-century survey. Second edition.
Edited by Rolf Herken. Computerkultur [Computer Culture], II. Springer-Verlag, Vienna, 1995. xvi+611 pp. ISBN: 3-211-82637-8 MR 96j:03005 03-06 (01A60 03D10 03D15 68-06)


We do have something other than examples to justify the Church-Turing thesis. Turing's original paper contains a detailed argument that any function which can be algorithmically computed by a human can be computed by a Turing machine. The question of whether this argument is a "formal proof" misses the point: Turing's argument is exactly the type of argument needed to justify the claim. Indeed, a key motivation for continuing to define Turing machines in computability theory (rather than using only more convenient models of computation) is that Turing machines are the model most amenable to a justification of the Church-Turing thesis, because they are directly inspired by the method of an algorithmic human computer.
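
To see how directly the model mirrors a human computer working with pencil and paper, here is a minimal Turing-machine simulator: a finite instruction table, a read/write head, and an unbounded tape. The example machine, its states, and the encoding are made up purely for illustration.

```python
# A minimal Turing-machine simulator, included only to make the ingredients of
# the model concrete. The "flip" machine below is a made-up example.
from collections import defaultdict

def run_tm(table, tape, state="start"):
    """table maps (state, symbol) -> (new_symbol, move, new_state); "_" is blank."""
    cells = defaultdict(lambda: "_", enumerate(tape))
    head = 0
    while state != "halt":
        symbol = cells[head]
        new_symbol, move, state = table[(state, symbol)]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# Example: flip every bit, moving right, until a blank cell is reached.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_tm(flip, "10110"))   # -> "01001_"
```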