Why do 3 backslashes equal 4 in a Python string?
Solution 1:
Basically, because Python is slightly lenient in backslash processing. Quoting from https://docs.python.org/2.0/ref/strings.html:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string.
(Emphasis in the original)
Therefore, in Python, it isn't that three backslashes are equal to four; it's that when you follow a backslash with a character like ?, the two together come through as two characters, because \? is not a recognized escape sequence.
Solution 2:
This is because the backslash acts as an escape character for the character(s) immediately following it, if the combination represents a valid escape sequence. The dozen or so escape sequences are listed here. They include the obvious ones such as newline \n, horizontal tab \t, and carriage return \r, and more obscure ones such as named Unicode characters using \N{...}, e.g. \N{WAVY DASH}, which represents the Unicode character \u3030. The key point, though, is that if the escape sequence is not known, the character sequence is left in the string as is.
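As a quick check of the recognized escapes mentioned above, here is a minimal sketch. The variable names are only illustrative, and the `unicodedata` lookup is an extra confirmation that was not part of the original answer.

```python
import unicodedata

# Each recognized escape sequence collapses to a single character.
newline = '\n'
tab = '\t'
wavy = '\N{WAVY DASH}'

print(len(newline), len(tab), len(wavy))   # 1 1 1
print(wavy == '\u3030')                    # True
print(unicodedata.name(wavy))              # WAVY DASH
```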
Part of the problem might also be that the Python interpreter output is misleading you. The interactive prompt displays the repr of a string, in which every backslash is shown escaped (doubled). However, if you print those strings, you will see the extra backslashes disappear.
>>> '?\\\?'
'?\\\\?'
>>> print('?\\\?')
?\\?
>>> '?\\\?' == '?\\?' # I don't know why you think this is True???
False
>>> '?\\\?' == r'?\\?' # but if you use a raw string for '?\\?'
True
>>> '?\\\\?' == '?\\\?' # this is the same string... see below
True
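The difference between what the prompt echoes and what the string actually contains can also be seen in a script. This is a small sketch with an illustrative variable name `s`; it uses only fully escaped backslashes, so it does not rely on the lenient handling of \?.

```python
s = '?\\\\?'          # four backslashes in the source, two in the string

print(len(s))         # 4
print(repr(s))        # '?\\\\?'  (what the >>> prompt echoes)
print(s)              # ?\\?      (the actual characters)
print(s.count('\\'))  # 2
```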
For your specific examples, in the first case '?\\\?', the first \ escapes the second backslash, leaving a single backslash, but the third backslash remains as a backslash because \? is not a valid escape sequence. Hence the resulting string is ?\\?.
For the second case '?\\\\?', the first backslash escapes the second, and the third backslash escapes the fourth, which results in the string ?\\?.
So that's why three backslashes are the same as four:
>>> '?\\\?' == '?\\\\?'
True
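To run the same comparison from a script without tripping the warning that recent Python versions emit for unrecognized escapes, one option is to evaluate the literal text at runtime. The helper `parse_literal` below is a hypothetical name introduced for this sketch, and it assumes a Python version that still accepts unrecognized escapes (with or without a warning).

```python
import ast
import warnings

def parse_literal(source):
    """Evaluate the text of a string literal, ignoring invalid-escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

three = parse_literal(r"'?\\\?'")    # literal written with three backslashes
four = parse_literal(r"'?\\\\?'")    # literal written with four backslashes

print(list(three))    # ['?', '\\', '\\', '?']
print(three == four)  # True
```

Both literals produce the same four characters: a question mark, two backslashes, and a question mark.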
If you want to create a string with 3 backslashes you can escape each backslash:
>>> '?\\\\\\?'
'?\\\\\\?'
>>> print('?\\\\\\?')
?\\\?
or you might find "raw" strings more understandable:
>>> r'?\\\?'
'?\\\\\\?'
>>> print(r'?\\\?')
?\\\?
This turns off escape sequence processing for the string literal. See String Literals for more details.
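A short sketch comparing the two spellings of the three-backslash string; the names `raw` and `escaped` are only illustrative.

```python
raw = r'?\\\?'            # raw literal: every backslash is kept as typed
escaped = '?\\\\\\?'      # regular literal: six backslashes make three

print(raw == escaped)     # True
print(len(raw))           # 5
print(raw.count('\\'))    # 3
```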
Solution 3:
Because \x in a character string, when x is not one of the special backslashable characters like n, r, t, 0, etc., evaluates to a string with a backslash and then an x.
>>> '\?'
'\\?'
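One way to see which characters are "backslashable" is to build the literal text for each candidate and evaluate it. This is only a sketch: `classify` and its return labels are hypothetical names, and it assumes a Python version in which unrecognized escapes are still accepted (possibly with a warning, which is silenced here).

```python
import ast
import warnings

def classify(ch):
    """Report how Python treats a backslash followed by ch in a str literal."""
    source = "'\\" + ch + "'"            # the text of the literal '\<ch>'
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")   # silence invalid-escape warnings
            value = ast.literal_eval(source)
    except (SyntaxError, ValueError):
        return "incomplete"              # e.g. \x needs two hex digits after it
    # If the backslash survived, the result is still backslash + ch.
    return "kept" if value == "\\" + ch else "escape"

for ch in "?nrt0xq":
    print(ch, classify(ch))
```

Characters reported as "kept" are the ones where the backslash stays in the string, as with \? above.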