The 9 Billion Names of God

TLDR; I go on a math adventure and get overwhelmed :)

Some background:

My maths isn't great (I can't read notation) but I'm a competent programmer and reasonable problem solver. I've done the first dozen or so Euler problems and intend to continue with that when I have time.

The problem:

In Arthur C Clarke's story "The 9 Billion Names of God" the names of God are all possible sequences in an unspecified alphabet, having no more than nine characters, where no letter occurs more than three times in succession.

Out of curiosity, I started playing around with determining how many valid sequences there are in a range.

I started with digits repeating in base 10 numbers, at heart it's the same problem as letters repeating in an alphabet.

Not being very knowledgable about math, I thought I'd write a program iterate over ranges and count all the elements that match the above condition, then put the results in a spreadsheet to see if a clear pattern of some kind emerged that would let me write an algorithm to determine the number of valid sequences in a given range.

I started with the constraint that a digit could only appear once, so in the range 0-99 there are 9 invalid sequences, 11, 22, 33 etc., leaving 91 valid 'names of God'.

Here's the table for 0-99 through 0-99999999. I stopped there because beyond that's where it started taking to long to calculate and I didn't want to get sidetracked optimizing.

0-99              91
0-999            820
0-9999          7381
0-99999        66430
0-999999      597871
0-9999999    5380840
0-99999999  48427561

I also generated a table for digits appearing no more than twice or thrice:

0-999            991
0-9999          9820
0-99999        97300
0-999999      964081
0-9999999    9552430
0-99999999  94648600

0-9999          9991
0-99999        99820
0-999999      997300
0-9999999    9964000
0-99999999  99550081

I haven't got around to looking into these yet, because I got fascinated by the first table.

The first table appears in OEIS as A002452.

Going from there, looking at all sorts of different things, amongst them the sequences of numbers in different placeholder columns in the tables, differences between numbers in different columns and/or tables etc. I looked at all sorts of things, I wish I'd documented it more, I was just idly mucking around with a spreadsheet and Googling sequences. With a quick Google search I found some of these sequences in all sorts of strange places, some examples include transformations of Lucas Numbers, solutions to Kakuro / Addoku / Soduku puzzles, repunits, the coordinates of geodesic faces, even the Ishango bone, which I'd never heard of before. It justs goes on and on.

Maths is full of this sort of thing isn't it? And I'm just looking at one little problem from one very specific angle, this is just the tip of the iceberg here isn't it?

Questions/requests for comments:

  1. I'm presuming that my adventure isn't anything extraordinary at all and maths is full of this unexpected relationships stuff, true?

  2. What is the right way to describe the problem outlined in the first few paragraphs, what do I need to learn to figure it out?

  3. I'd love to hear any comments/trivia etc. relating to this little adventure please!


The names you describe can be described by a regular expression, hence the set of all names is a regular language. Equivalently, names can be recognized by a deterministic finite state machine (I can think of one with $28$ states, but this is probably not optimal). If $G_n$ denotes the number of names of length $n$, it follows that the generating function $\sum G_n x^n$ is rational and can be calculated fairly explicitly (in several different ways), which leads to a closed form for $G_n$ as a sum of exponentials $\alpha^n$ for various $\alpha$ (possibly with polynomial coefficients) via partial fraction decomposition.

In other words, such sequences are well-understood from several related perspectives. Unfortunately I don't know a particularly elementary introduction to this material. The simplest nontrivial example of a sequence of this kind is a sequence counted by the Fibonacci numbers: the words are words over an alphabet of two letters $A, B$ with the restriction that the letter $B$ can never appear twice in a row. Here the generating function is $\sum F_n x^n = \frac{x}{1 - x - x^2}$ and this gives the closed form

$$F_n = \frac{\phi^n - \varphi^n}{\phi - \varphi}$$

where $\phi, \varphi$ are the positive and negative roots of $x^2 = x + 1$. A similar, but more complicated, closed form exists for the sequence you're interested in.

The closest thing I know to a complete reference is Chapter 4 of Stanley's Enumerative Combinatorics, but this is not easy reading. Sipser's Introduction to the Theory of Computation discusses regular languages and finite state machines, but does not address the enumerative aspects of the theory. There is also a discussion of these issues (and much, much more) in Flajolet and Sedgewick's Analytic Combinatorics (also not easy reading).


Since regular languages are in some sense the simplest languages, sequences counting words in regular languages appear frequently in many situations. For example, pick any word $w$. The set of all words in which $w$ doesn't appear as a subword is a regular language, and so using the machinery I describe above one can compute the probability that a random sequence of letters of a certain length does or doesn't contain $w$. This has applications, for example, to the study of DNA sequences, if you want to ascertain how likely it is that a certain sequence of nucleotides $w$ could occur in a strand of DNA however many nucleotides long by chance. More prosaically, you can compute, for example, the probability of flipping $7$ tails at some point out of $150$ coin flips.


These are pretty vague questions, but here goes:

  1. True. There's lots of unexpected connections and relationships, and part of the fun is unraveling the mystery of their occurrence (see e.g. "bijective proofs").

  2. Combinatorics. More specifically, generating functions. Try generatingfunctionology. You may find it not too easy.

  3. The number of sequences of length $n$ comprising of bits (0 and 1) and no two consecutive 1's is $F_n$, the $n$th Fibonacci number.


I was working on this problem several years ago and discovered a polynomial that calculates the number of 1,2,3,...9 letters words with an n-letter alphabet.

number of words = n^9 +n^8 +n^7 -5n^6 +n^5 +n^4 +4n^3 -2n^2 +n

n =9, words =432,661,347 This is the same number Ross Millikan found using a spreadsheet

n =12, words =5,610,940,140 Ross Millikan listed it as 5.6 billion

n=13, words =11,459,252,883 Ross Millikan listed it as 11.5 billion

Everyone here seems to think that (in the story) the two hired engineers printed out a 9 billion word list. People here are trying to calculate which n-letter alphabet system gives 9 billion words. The list that the engineers printed was much longer than 9 billion. In the story, "[The monks] have been compiling a list which shall contain all the possible names of God." The monks believe that God has around 9 billion names. “Well, they believe that when they have listed all His names — and they reckon that there are about nine billion of them — God’s purpose will be achieved." The engineers are printing out every possible name of God. Not every name they print is one of God's names.

Suppose you wanted to print out all POSSIBLE 2-letters words in the English language. There are 101 actual 2-letters words. http://www.yak.net/kablooey/scrabble/2letterwords.html If you started printing out the list, starting with AA, AB, AC through ZX, ZY, and ZZ. You would have printed out 26^2 =676 possible words(not 101)


A solution to the specific problem can be done as follows: define $A(n)$ as the number of strings of length $n$ with the last two characters different, $B(n)$ as the number of strings of length $n$ with the last two characters the same but the third to last different, and $C(n)$ as the number of strings of length $n$ with the last three characters the same but the fourth to last different (or missing). We can set up a set of recurrence relations based on adding another character to the string. $A(n+1)=8(A(n)+B(n)+C(n)$ because we just have to add some character different from the last. $B(n+1)=A(n)$ because we have to add the same character to the end of the $A$ string. Similarly $C(n+1)=B(n)$. A sanity check is we can add any character to an $A$ or $B$ string and still be "in play", but we cannot add a matching character to a $C$ string. We can start with $A(1)=9, B(1)=0, C(1)=0$This is in a form that a spreadsheet can handle. I actually get $432661347$, about $\frac{1}{20}^{\text{th}}$ of $9$ billion (or $\frac{1}{20000}^{\text{th}}$ considering that Clarke was British). If their theology is no better than their arithmetic, we should be safe.