Why does Python 3 allow "00" as a literal for 0 but not allow "01" as a literal for 1?
Is there a good reason? This inconsistency baffles me. (And we're talking about Python 3, which purposely broke backward compatibility in order to achieve goals like consistency.)
For example:
>>> from datetime import time
>>> time(16, 00)
datetime.time(16, 0)
>>> time(16, 01)
File "<stdin>", line 1
time(16, 01)
^
SyntaxError: invalid token
>>>
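To reproduce this outside the REPL, here is a small sketch that compiles each expression and reports whether the tokenizer accepts it. The helper name `compiles` is made up for illustration. Because `compile()` in "eval" mode only checks syntax, `time` doesn't need to be imported.

```python
def compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` is a syntactically valid expression."""
    try:
        compile(source, "<literal>", "eval")  # syntax check only, nothing is run
    except SyntaxError:
        return False
    return True


for src in ["time(16, 00)", "time(16, 01)", "time(16, 1)"]:
    print(f"{src!r:16} -> {'ok' if compiles(src) else 'SyntaxError'}")
```

The practical workaround for the question is simply to write `time(16, 1)`.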
Per https://docs.python.org/3/reference/lexical_analysis.html#integer-literals:
Integer literals are described by the following lexical definitions:
integer        ::= decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::= nonzerodigit digit* | "0"+
nonzerodigit   ::= "1"..."9"
digit          ::= "0"..."9"
octinteger     ::= "0" ("o" | "O") octdigit+
hexinteger     ::= "0" ("x" | "X") hexdigit+
bininteger     ::= "0" ("b" | "B") bindigit+
octdigit       ::= "0"..."7"
hexdigit       ::= digit | "a"..."f" | "A"..."F"
bindigit       ::= "0" | "1"
There is no limit for the length of integer literals apart from what can be stored in available memory.
Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.
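A quick way to see what the `decimalinteger` production accepts is to translate it into a regular expression and compare it with the running interpreter. This is a sketch under one assumption: it ignores the underscore digit separators that Python 3.6 later added to the grammar (PEP 515). The helper names are illustrative only.

```python
import ast
import re

# Hand translation of the production quoted above (underscores ignored):
#   decimalinteger ::= nonzerodigit digit* | "0"+
DECIMALINTEGER = re.compile(r"[1-9][0-9]*|0+")


def is_decimalinteger(token: str) -> bool:
    """True if `token` matches the decimalinteger production."""
    return DECIMALINTEGER.fullmatch(token) is not None


def python_accepts(token: str) -> bool:
    """True if the running interpreter parses `token` as a literal."""
    try:
        ast.literal_eval(token)
    except SyntaxError:
        return False
    return True


for tok in ["0", "00", "000", "1", "10", "01", "007"]:
    # The grammar and the real tokenizer should agree on every case.
    print(f"{tok:>4}  grammar={is_decimalinteger(tok)}  python={python_accepts(tok)}")
```

The `"0"+` alternative is the only way a run of zeros can match; any nonzero digit after a leading `0` falls through both alternatives.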
As the documentation notes, leading zeros in a non-zero decimal number are not allowed. "0"+ is legal as a very special case, which wasn't present in Python 2:
integer ::= decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::= nonzerodigit digit* | "0"
octinteger ::= "0" ("o" | "O") octdigit+ | "0" octdigit+
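For comparison, here is a sketch of the Python 2 productions above as regexes, showing that "00" was an `octinteger` there rather than a `decimalinteger`. It assumes only the two productions quoted; Python 2's "L" long-integer suffix is left out, and the helper names are hypothetical.

```python
import re

# The Python 2 productions quoted above, translated to regexes:
#   decimalinteger ::= nonzerodigit digit* | "0"
#   octinteger     ::= "0" ("o" | "O") octdigit+ | "0" octdigit+
PY2_DECIMALINTEGER = re.compile(r"[1-9][0-9]*|0")
PY2_OCTINTEGER = re.compile(r"0[oO][0-7]+|0[0-7]+")


def py2_kind(token: str):
    """Which Python 2 production (if any) a literal falls under."""
    if PY2_DECIMALINTEGER.fullmatch(token):
        return "decimalinteger"
    if PY2_OCTINTEGER.fullmatch(token):
        return "octinteger"
    return None


def py2_value(token: str) -> int:
    """The value Python 2 would have given the literal."""
    kind = py2_kind(token)
    if kind == "decimalinteger":
        return int(token)
    if kind == "octinteger":
        # int() with base 8 also accepts an optional "0o"/"0O" prefix.
        return int(token, 8)
    raise SyntaxError(f"invalid token: {token!r}")


for tok in ["0", "00", "010", "0o10"]:
    print(tok, py2_kind(tok), py2_value(tok))
```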
SVN commit r55866 implemented PEP 3127 in the tokenizer, which forbids the old 0<octal> numbers. However, curiously, it also adds this note:
/* in any case, allow '0' as a literal */
with a special nonzero flag that only throws a SyntaxError if the following sequence of digits contains a nonzero digit.
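The rule that the nonzero flag enforces can be modelled in a few lines. This is a hypothetical, simplified Python model of the behaviour described, not CPython's actual C tokenizer code; it only handles a digit string that starts with "0" and has no x/o/b prefix.

```python
DIGITS = "0123456789"


def scan_zero_prefixed(token: str) -> int:
    """Hypothetical model of the Python 3 tokenizer rule described above.

    Not CPython's C code: it accepts a run of digits after a leading "0",
    but raises SyntaxError if any of those digits is nonzero.
    """
    if not token or token[0] != "0" or any(c not in DIGITS for c in token):
        raise ValueError(f"expected a digit string starting with '0': {token!r}")
    nonzero = False  # becomes True once a nonzero digit is seen
    for ch in token[1:]:
        if ch != "0":
            nonzero = True
    if nonzero:
        # The old C-style octal spelling, which PEP 3127 removed.
        raise SyntaxError(f"leading zeros are not permitted: {token!r}")
    return 0  # "0", "00", "000", ... all denote zero


for tok in ["0", "00", "0000", "01", "0010"]:
    try:
        print(tok, "->", scan_zero_prefixed(tok))
    except SyntaxError as exc:
        print(tok, "-> SyntaxError:", exc)
```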
This is odd because PEP 3127 does not allow this case:
This PEP proposes that the ability to specify an octal number by using a leading zero will be removed from the language in Python 3.0 (and the Python 3.0 preview mode of 2.6), and that a SyntaxError will be raised whenever a leading "0" is immediately followed by another digit.
(emphasis mine)
So, the fact that multiple zeros are allowed technically violates the PEP; it was implemented as a special case by Georg Brandl. He made the corresponding documentation change to note that "0"+ was a valid case for decimalinteger (previously that had been covered under octinteger).
We'll probably never know exactly why Georg chose to make "0"+ valid; it may forever remain an odd corner case in Python.
UPDATE [28 Jul 2015]: This question led to a lively discussion thread on python-ideas in which Georg chimed in:
Steven D'Aprano wrote:
Why was it defined that way? [...] Why would we write 0000 to get zero?
I could tell you, but then I'd have to kill you.
Georg
Later on, the thread spawned this bug report aiming to get rid of this special case. Here, Georg says:
I don't recall the reason for this deliberate change (as seen from the docs change).
I'm unable to come up with a good reason for this change now [...]
And thus we have it: the precise reason behind this inconsistency is lost to time.
Finally, note that the bug report was rejected: leading zeros will continue to be accepted only on zero integers for the rest of Python 3.x.
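The same zero-only exception also shows up at runtime in int() when base=0, which the Python documentation describes as interpreting the string exactly as a code literal (the docs note that int('010', 0) is not legal while int('010') and int('010', 8) are). A short sketch:

```python
# int() with base=0 interprets its argument exactly as an integer
# literal in source code, so it applies the same zero-only exception.
print(int("000", 0))   # 0: leading zeros are fine when the value is zero
print(int("010"))      # 10: the default base 10 just ignores leading zeros
print(int("010", 8))   # 8: the old octal reading, requested explicitly

try:
    int("010", 0)      # nonzero value with a leading zero: rejected
except ValueError as exc:
    print("ValueError:", exc)
```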
It's a special case ("0"+).
2.4.4. Integer literals
Integer literals are described by the following lexical definitions:
integer        ::= decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::= nonzerodigit digit* | "0"+
nonzerodigit   ::= "1"..."9"
digit          ::= "0"..."9"
octinteger     ::= "0" ("o" | "O") octdigit+
hexinteger     ::= "0" ("x" | "X") hexdigit+
bininteger     ::= "0" ("b" | "B") bindigit+
octdigit       ::= "0"..."7"
hexdigit       ::= digit | "a"..."f" | "A"..."F"
bindigit       ::= "0" | "1"
If you look at the grammar, it's easy to see that 0 needs a special case. I'm not sure why the "+" is considered necessary there, though. Time to dig through the dev mailing list...
Interestingly, in Python 2 more than one 0 was parsed as an octinteger (the end result is still 0, though):
decimalinteger ::= nonzerodigit digit* | "0"
octinteger     ::= "0" ("o" | "O") octdigit+ | "0" octdigit+
Python2 used the leading zero to specify octal numbers:
>>> 010
8
To avoid this (misleading?) behaviour, Python 3 requires explicit prefixes 0b, 0o, 0x:
>>> 0o10
8
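A short sketch of all three explicit prefixes, plus the built-ins that produce them when converting an int back to a string:

```python
# Each non-decimal base has its own explicit prefix in Python 3.
assert 0b10 == 2
assert 0o10 == 8
assert 0x10 == 16

# The built-in converters emit the same prefixes...
print(bin(8), oct(8), hex(8))   # 0b1000 0o10 0x8

# ...and int() with base=0 reads them back.
assert int("0o10", 0) == 8
```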