Why write ten as $10$ and not as $00$?

Decimal notation has ten symbols,

$$ \{0,1,2,3,4,5,6,7,8,9\} $$

If we list all the digit strings in order (shorter strings first, then lexicographically), we have

$$ \{0,1,2,3,4,5,6,7,8,9,00,01,02,03,04,05,06,07,08,09,10,11,...\} $$

Notice the presence of numbers prefixed with $0$. However, when describing natural numbers, we ignore the combinations with leading $0$ and posit that $1=01=001=0001=...$. Why do we do this? Is it just a convention, or is there a deeper subtlety?

We do the same in every base. For example, the numbers in binary are $\{0,1,10,11,100,...\}$ and not $\{0,1,00,01,10,11,...\}$.

It seems like we are wasting these combinations. So why don't we use them?


The point of decimal notation is not just to assign each number a "code": each expression has an attached meaning.

When we write a decimal expression $$a_na_{n-1}...a_2a_1a_0,$$ what we mean is $$a_n10^n+a_{n-1}10^{n-1}+...+a_210^2+a_110^1+a_010^0.$$ Given that, it's clear that $$00=0\cdot 10^1+0\cdot 10^0=0+0=0.$$

Again, the point isn't to label numbers efficiently, but to label them in a particular meaningful way.
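
A minimal Python sketch (the helper name is my own) makes the redundancy concrete: evaluating any digit string by the rule above, leading zeros can never affect the result.

```python
# Evaluating a digit string by the positional rule above. Leading zeros
# contribute 0 * 10**i, so they can never change the value.

def positional_value(digits: str, base: int = 10) -> int:
    value = 0
    for d in digits:
        value = value * base + int(d)  # Horner's rule: same sum, left to right
    return value

assert positional_value("0") == positional_value("00") == 0
assert positional_value("42") == positional_value("0042") == 42
```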


That is an entirely possible, and valid, way to write numbers. It, or something very close, is called "bijective base-10 notation", since it establishes a bijection between the strings over the 10 digits and the numbers they represent. In particular, leading zeroes become important and actually affect the value of the number, while in regular base 10 (or more generally, base $B$), they simply add redundancy to the system. Wikipedia mentions it, although the system shown there isn't quite the one you have given here; that's really just a question of which symbols you map to which numbers:

https://en.wikipedia.org/wiki/Bijective_numeration

In this system, the natural number sequence is put in bijection with the digit strings ordered in shortlex order (first by length, then lexicographically):

$$0 \leftrightarrow 0$$ $$1 \leftrightarrow 1$$ $$...$$ $$9 \leftrightarrow 9$$ $$10 \leftrightarrow 00$$ $$11 \leftrightarrow 01$$ $$...$$ $$19 \leftrightarrow 09$$ $$20 \leftrightarrow 10$$ $$21 \leftrightarrow 11$$ $$22 \leftrightarrow 12$$ $$...$$

where the left is the natural sequence in traditional notation and the right is the bijective notation.

Then if a number is represented in this notation as $d_{n-1}d_{n-2}\cdots d_1d_0$, its value (taking each "digit" to be a number from 0 to 9), written in traditional notation, is

$$N = d_0 + \sum_{i=1}^{n-1} (d_i + 1) 10^i$$
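
As a quick check, here is a short Python sketch (helper names are my own) that enumerates digit strings in shortlex order and confirms that the $k$-th string has value $k$ under this formula:

```python
from itertools import count, product

def shortlex_strings(digits: str = "0123456789"):
    """Yield all digit strings: shortest first, lexicographic within a length."""
    for length in count(1):
        for tup in product(digits, repeat=length):
            yield "".join(tup)

def value(s: str) -> int:
    """N = d_0 + sum_{i=1}^{n-1} (d_i + 1) * 10**i, per the formula above."""
    n = len(s)
    total = int(s[-1])  # d_0 enters without the +1 shift
    for i in range(1, n):
        total += (int(s[n - 1 - i]) + 1) * 10**i
    return total

# The k-th string in shortlex order should represent the number k.
for k, s in zip(range(500), shortlex_strings()):
    assert value(s) == k
```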

The value formula above actually shows that giving zero the symbol "0" in this system is perhaps not the best choice, as the sum is inelegant, with a term hanging out in front. What we'd really want to do is represent zero as the empty string, i.e. nothing, or, to denote this special case, by a symbol such as $\epsilon$. That is, have

$$0 \leftrightarrow \epsilon$$ $$1 \leftrightarrow 0$$ $$2 \leftrightarrow 1$$ $$...$$ $$9 \leftrightarrow 8$$ $$10 \leftrightarrow 9$$ $$11 \leftrightarrow 00$$ $$12 \leftrightarrow 01$$ $$13 \leftrightarrow 02$$ $$...$$ $$20 \leftrightarrow 09$$ $$21 \leftrightarrow 10$$ $$22 \leftrightarrow 11$$ $$23 \leftrightarrow 12$$ $$...$$

Then

$$N = \sum_{i=0}^{n-1} (d_i + 1) 10^i$$

which is much nicer. Note that the empty string, despite being written with the symbol $\epsilon$, has nothing in it: $n = 0$, so the sum is an empty sum, which is vacuously $0$. Actually, to avoid notational confusion we would probably really want to write this sum as

$$N = \sum_{i=\mathrm{Zero}}^{n-\mathrm{One}} v(d_i) \mathrm{Ten}^i$$

where $v$ is the "valuation" of the digit $d_i$, meaning the actual natural number corresponding to that glyph. This avoids the "type abuse" of treating a digit symbol as a number we can add: a digit is a non-numerical glyph. (English words are used for the bounds and the base to avoid ambiguity.) The symbol "0" is valuated as the number one (that is, one is $v(0)$), "1" as two, and so on, up to "9", which is valuated as ten.
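
To make the valuation concrete, here is a small Python sketch (function names `v`, `to_bijective`, and `from_bijective` are my own) of this $\epsilon$-variant: zero is the empty string, and the glyphs "0" through "9" are valuated one through ten:

```python
DIGITS = "0123456789"  # glyph "0" is valuated 1, ..., glyph "9" is valuated 10

def v(d: str) -> int:
    """Valuation of a digit glyph: v('0') = 1, ..., v('9') = 10."""
    return DIGITS.index(d) + 1

def from_bijective(s: str, base: int = 10) -> int:
    """N = sum v(d_i) * base**i; the empty string is the empty sum, i.e. zero."""
    total = 0
    for d in s:
        total = total * base + v(d)
    return total

def to_bijective(n: int, base: int = 10) -> str:
    """Inverse conversion: peel off digits valuated in 1..base."""
    s = ""
    while n > 0:
        n, r = divmod(n - 1, base)  # shift by 1 so remainders land in 0..base-1
        s = DIGITS[r] + s           # glyph DIGITS[r] has valuation r + 1
    return s

assert to_bijective(0) == ""  # zero is the empty string, epsilon
assert to_bijective(1) == "0" and to_bijective(10) == "9"
assert to_bijective(11) == "00" and to_bijective(21) == "10"
assert all(from_bijective(to_bijective(k)) == k for k in range(10_000))
```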

So the question of why ... well, I suppose it is because the "usual" way is how most natural languages that have counting at all, including English, handle numbers. We say 20 as "twenty", essentially a squeezed form of "two tens", which is immediately suggestive of the written form (a written form which, by the way, was introduced only after natural-language numerals had developed).

But the other system does still work, and contrary to what other posters here are suggesting, it is not necessarily less "meaningful" than before: the value is still a polynomial in the base; you are simply changing the range in which the digits are valuated from $0$ through $B-1$ to $1$ through $B$ (back to ordinary notation again), where $B$ is the base. It is furthermore the only way to make "base one", or unary, actually work. In the "usual" schema, base one would have only the digit "0", valuated as zero, and then any string of this digit equals zero, no matter how long, so base one fails to represent the natural numbers. In this system, base one works perfectly: a string of "0"s represents the number equal to the count of repetitions of the symbol "0" in that string. It is base zero that collapses instead, which is in a way more sensible: it has no symbols with which to represent anything, so it should not be expected to be useful.

Furthermore, this notation system has the somewhat more elegant property that the number of digits required to represent a number $n$ is $\lceil \log_B(n) \rceil$, up to off-by-one corrections near powers of $B$, instead of the clunkier $\lfloor \log_B(n) + 1 \rfloor$ of the usual system. That is, the relationship to the logarithm is more transparent.
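
Continuing the sketch (reusing `to_bijective` from above), base one really does behave as tally marks, and the bijective form is shorter than the usual one only for a thin band of numbers:

```python
# Bijective "base one" is tally marks: the single glyph "0" is valuated 1,
# and a string of n copies represents n.
assert to_bijective(7, base=1) == "0" * 7

# Digit lengths versus ordinary base ten: the bijective form is shorter
# (by exactly one digit) only for the band 10**k .. (10**(k+1) - 10) // 9.
for n in (99, 100, 110, 111):
    print(n, len(str(n)), len(to_bijective(n)))  # -> 2 2, 3 2, 3 2, 3 3
```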

Now you might be wondering, "what about fractions? Does this solve the $0.9999... = 1$ problem?" The answer there is no. The reason is that the lexicographically ordered infinite decimals (or infinite base-$B$ strings) are not order-isomorphic to the real or rational numbers. They are something different, topologically isomorphic (homeomorphic) to a Cantor set (or 1-D Cantor dust).


Your suggested notation leads to numbers that are only slightly shorter than in the usual positional system: one digit shorter at best, and over 90% of all numbers have the same length in both. This can be seen as an advantage, but it is an extremely minor one.

On the other hand, the positional system has lots of advantages: for example, you can add numbers simply by summing the digits column by column and then handling carries. Subtraction, multiplication and division have similarly convenient pen-and-paper algorithms. Try computing 00937 / 042 in your system and you'll see the problem.
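
To illustrate, here is a toy Python sketch of column addition (names are my own). The zero-padding in the first line is harmless precisely because leading zeros do not change a value in positional notation; in your system that very step would alter the numbers:

```python
# Column-by-column addition, right to left, carrying when a column exceeds 9.

def column_add(a: str, b: str) -> str:
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal width with leading zeros
    carry, out = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

assert column_add("937", "42") == "979"
assert column_add("58", "67") == "125"
```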


We have to go back to the invention of zero for this. Stepping away from Roman numerals, one might write (and historically did, though not necessarily with ten as the base and with varying symbols in place of H, T, U) the numbers CCCLXXXV, CDII, CDXX, and XLII as "3H, 8T, and 5U", "4H and 2U", "4H and 2T", and "4T and 2U". This kind of notation requires symbols such as H, T, U to assign the correct "weight" to the digits.

Writing numbers becomes more economical if you can simply leave out this H, T, U stuff. Then "385" is fine; you might be OK with "4 2" for CDII, but that is easily mistaken for XLII (or CDXX, or MMMMII, or ...).

Enter the invention of "0", to be used in gaps as needed (writing and notation are always also a matter of economy!). There is no need to specify that there are no hundreds (no thousands, no ten-thousands, ...) in LX; there is only a need to specify that there are no units, so we write "60".

Finally, we add a notation using the digit "0" for the number zero itself. This might be called an abuse of notation for this corner case. Incidentally, it requires considering zero to be a number in the first place, as if a farmer, asked how many animals he has, answered "4 cows, 3 pigs, and 10 sheep ... and also 0 chickens, laying an average of 0 eggs per day"!


One subtle point to consider: even though there are ten symbols in decimal, and thus $10^n$ combinations of exactly $n$ digits, if you consider $0$ to be distinct from $00$ then slightly more information is encoded in a field of a given width, because the absence of a symbol must itself be treated as a symbol. This perspective is probably more striking to a computer programmer than to a mathematician, for whom digit-length constraints are perhaps not a "pure" concept.
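
A tiny Python count (function names are my own) makes the gain explicit: a width-$n$ field holds $10^n$ codes when short strings are zero-padded, versus $1 + 10 + \cdots + 10^n$ when every length, including the empty string, counts separately:

```python
# Codes a width-n field can hold: 10**n if shorter strings must be padded
# with leading zeros, versus every string of length 0..n counting separately
# when the absence of a digit is itself meaningful.

def padded_count(n: int, base: int = 10) -> int:
    return base ** n

def bijective_count(n: int, base: int = 10) -> int:
    return 1 + sum(base ** k for k in range(1, n + 1))  # the "+1" is the empty string

for n in range(1, 5):
    print(n, padded_count(n), bijective_count(n))
# 1 10 11
# 2 100 111
# 3 1000 1111
# 4 10000 11111
```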