Python raw strings and trailing backslash
I ran across something once upon a time and wondered if it was a Python "bug" or at least a misfeature. I'm curious if anyone knows of any justifications for this behavior. I thought of it just now reading "Code Like a Pythonista," which has been enjoyable so far. I'm only familiar with the 2.x line of Python.
Raw strings are strings that are prefixed with an r. This is great because I can use backslashes in regular expressions without doubling every one of them. They're also handy for writing throwaway scripts on Windows, where I can use backslashes in paths as well. (I know I can also use forward slashes, but throwaway scripts often contain content cut and pasted from elsewhere in Windows.)
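A minimal sketch of what the prefix buys you, written for Python 3 (the question is about 2.x, but raw strings behave the same way here). The pattern, sample text, and path are arbitrary illustrations.

```python
import re

# Raw literal: each backslash is kept, so the regex engine sees \d and \.
raw_pattern = r"\d+\.\d+"
# The same pattern as an ordinary literal needs every backslash doubled.
cooked_pattern = "\\d+\\.\\d+"
assert raw_pattern == cooked_pattern

match = re.search(raw_pattern, "version 2.7 released")
print(match.group())  # 2.7

# Without the prefix, \t and \n in a Windows-style path become a tab and a newline.
raw_path = r"C:\temp\new"
cooked_path = "C:\temp\new"
print(len(raw_path), len(cooked_path))  # 11 9
```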
So great! Unless, of course, you really want your string to end with a backslash. There's no way to do that in a 'raw' string.
In [9]: r'\n'
Out[9]: '\\n'
In [10]: r'abc\n'
Out[10]: 'abc\\n'
In [11]: r'abc\'
------------------------------------------------
File "<ipython console>", line 1
r'abc\'
^
SyntaxError: EOL while scanning string literal
In [12]: r'abc\\'
Out[12]: 'abc\\\\'
So one backslash before the closing quote is an error, but two backslashes give you two backslashes! Surely I'm not the only one who is bothered by this?
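The rule behind the session above can be checked directly: in a raw string a backslash still stops the next character from ending the literal, but the backslash itself stays in the value. This is a small Python 3 sketch; `is_valid_literal` is a hypothetical helper that just asks `compile()` whether a piece of source is a valid expression (the exact SyntaxError message varies between Python versions, so it isn't printed).

```python
s = r'abc\\'
print(len(s))        # 5: a, b, c and two backslashes

quoted = r'\''
print(len(quoted))   # 2: the backslash is kept, the quote did not end the literal
print(list(quoted))  # ['\\', "'"]


def is_valid_literal(source):
    """Hypothetical helper: True if `source` compiles as one expression."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


print(is_valid_literal(r"r'abc\\'"))  # True: even number of trailing backslashes
print(is_valid_literal(r"r'abc\'"))   # False: the closing quote is escaped
```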
Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa. If I wanted both, I'd just triple quote. If I really wanted three quotes in a row in a raw string, well, I guess I'd have to deal, but is this considered "proper behavior"?
This is particularly problematic with folder names in Windows, where the backslash is the path delimiter.
It's a FAQ.
And in response to "you really want your string to end with a backslash. There's no way to do that in a 'raw' string.": the FAQ shows how to work around it.
>>> r'ab\c' '\\' == 'ab\\c\\'
True
>>>
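The trick works because adjacent string literals are joined at compile time, so a raw string can be followed by an ordinary '\\' literal. A short Python 3 sketch applying it to a Windows-style directory (the directory name is an arbitrary example) and comparing it with a plain literal that doubles every backslash:

```python
# Raw string followed by an ordinary literal holding one escaped backslash.
concatenated = r'C:\Program Files\Tools' '\\'

# The same value as a normal (non-raw) string with every backslash doubled.
doubled = 'C:\\Program Files\\Tools\\'

assert concatenated == doubled
print(concatenated)  # C:\Program Files\Tools\
```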
Raw strings are meant mostly for readably writing the patterns for regular expressions, which never need a trailing backslash; it's an accident that they may come in handy for Windows (where you could use forward slashes in most cases anyway -- the Microsoft C library which underlies Python accepts either form!). It's not considered acceptable to make it (nearly) impossible to write a regular expression pattern containing both single and double quotes, just to reinforce the accident in question.
("Nearly" because triple-quoting would almost always help... but it could be a little bit of a pain sometimes).
So, yes, raw strings were designed to behave that way (forbidding odd numbers of trailing backslashes), and it is considered perfectly "proper behavior" for them to respect the design decisions Guido made when he invented them;-).
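To make the regex argument concrete, here is a small Python 3 sketch of a pattern that mentions both quote characters inside a single-quoted raw string. The backslash before the single quote keeps the literal open and stays in the pattern, which is harmless because the regex engine reads \' as a plain quote. The name QUOTE_CHARS and the sample text are illustrative only.

```python
import re

# Character class matching either a single or a double quote.
QUOTE_CHARS = r'[\'"]'
print(len(QUOTE_CHARS))  # 5: [ \ ' " ]

found = re.findall(QUOTE_CHARS, 'it\'s "fine"')
print(found)  # ["'", '"', '"']
```

If a backslash could not protect the quote, the same pattern would need triple quotes (r'''['"]''') instead.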
Another way to work around this is:
>>> print(r"Raw \with\ trailing backslash\ "[:-1])
Raw \with\ trailing backslash\
Updated for Python 3, and removed an unnecessary slash at the end that implied an escape.
Note that personally I doubt I would use the above, except perhaps in a huge string containing more than just a path. For a short string like this one, I'd prefer a non-raw string with the backslashes doubled.
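For directory paths specifically, another option (not one mentioned above) is to let the standard library add the separator instead of writing it in a literal. A sketch under the assumption that you want Windows-style output: it uses `ntpath`, the Windows flavour of `os.path`, which can be imported on any OS, so it behaves the same on Linux or macOS. The folder name is an arbitrary example.

```python
import ntpath
from pathlib import PureWindowsPath

folder = r'C:\Users\me\Documents'

# Joining an empty final component leaves the result ending in a separator.
with_trailing = ntpath.join(folder, '')
print(with_trailing)  # C:\Users\me\Documents\

# Forward slashes (e.g. pasted from elsewhere) are rendered as backslashes.
normalised = str(PureWindowsPath('C:/Users/me/Documents'))
print(normalised)  # C:\Users\me\Documents
```

On Windows itself, os.path.join(folder, '') gives the same result, since os.path is ntpath there.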