How to convert escaped characters?
I want to convert strings containing escaped characters to their normal form, the same way Python's lexical parser does:
>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str)
One \'example\'
>>> normal_str = normalize_str(escaped_str)
>>> print(normal_str)
One 'example'
Of course, the boring way would be to replace all known escape sequences one by one: http://docs.python.org/reference/lexical_analysis.html#string-literals
How would you implement normalize_str() in the above code?
>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'
Several similar codecs are available, such as rot13 and hex.
The above is Python 2.x. Since you said (below, in a comment) that you're using Python 3.x: decoding a Unicode string object is a bit roundabout there, but it's still possible. The "string_escape" codec does not exist in Python 3; use "unicode_escape" instead:
Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
I assume the question is really:
I have a string that is formatted as if it were a part of Python source code. How can I safely interpret it so that \n within the string is transformed into a newline, quotation marks are expected on either end, etc.?
Try ast.literal_eval.
>>> import ast
>>> print ast.literal_eval(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
hi, mom.
This is a "weird" string, isn't it?
For comparison, going the other way:
>>> print repr(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
'"hi, mom.\\n This is a \\"weird\\" string, isn\'t it?"'
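The session above is Python 2 (print statement, raw_input). A rough Python 3 equivalent might look like the sketch below; the names source_text and value are only illustrative, and the input is hard-coded rather than read from the user so the example runs on its own.

```python
import ast

# The text as it would appear in Python source, including the surrounding quotes.
# A raw triple-quoted literal keeps the backslashes exactly as typed.
source_text = r'''"hi, mom.\n This is a \"weird\" string, isn't it?"'''

# literal_eval parses the text as a Python literal without executing code.
value = ast.literal_eval(source_text)
print(value)

# Going the other way: repr() produces source text that evaluates back to the same string.
round_tripped = ast.literal_eval(repr(value))
```

Note that ast.literal_eval needs the surrounding quotes to be present; a bare string such as One \'example\' is not a valid literal on its own and will be rejected.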
SingleNegationElimination already mentioned this, but here is an example:
In Python 3:
>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str.encode('ascii', 'ignore').decode('unicode_escape'))
One 'example'
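One caveat with encode('ascii', 'ignore') is that it silently drops any non-ASCII characters from the input. A possible way to wrap this into the normalize_str() asked for in the question, while keeping such characters, is sketched below. The choice of latin-1 with backslashreplace is my own assumption rather than something from the answers above, so treat it as one option, not the definitive implementation.

```python
def normalize_str(escaped: str) -> str:
    """Interpret backslash escapes in `escaped` roughly as a Python string literal would."""
    # latin-1 keeps code points 0-255 as single bytes; backslashreplace turns
    # anything higher into a \uXXXX or \UXXXXXXXX escape, which the
    # unicode_escape codec then decodes back into the original character.
    raw = escaped.encode("latin-1", "backslashreplace")
    return raw.decode("unicode_escape")


# Same string as in the question: One \'example\'
print(normalize_str("One \\'example\\'"))
```

Unlike ast.literal_eval, this does not expect quotes around the input. A string that ends in a single unpaired backslash is not a valid escape sequence and will raise a decoding error.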