UTF-8 In Python logging, how?
Having code like:
raise Exception(u'щ')
Caused:
File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
So, in your logging configuration make all format string unicode:
'formatters': {
'simple': {
'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S',
},
...
And patch the default logging
formatter to use unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")
Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
vinay@eta-jaunty:~/projects/scratch$ python --version Python 2.6.2 vinay@eta-jaunty:~/projects/scratch$ python utest.py printed unicode object: ô vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt ô vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version Python 2.6.2 C:\temp>python utest.py printed unicode object: ô
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.
I had a similar problem running Django in Python3: My logger died upon encountering some Umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none working. I tried
import locale;
if locale.getpreferredencoding().upper() != 'UTF-8':
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
which I got from the comment above. It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me into totally the wrong direction.
Changing the logging format-strings to Unicode would not help. Setting a magic encoding comment at the beginning of the script would not help. Setting the charset on the sender's message (the text came from a HTTP-reqeust) did not help.
What DID work was setting the encoding on the file-handler to UTF-8 in settings.py
. Because I had nothing set, the default would become None
. Which apparently ends up being ASCII (or as I'd like to think about: ASS-KEY)
'handlers': {
'file': {
'level': 'DEBUG',
'class': 'logging.handlers.TimedRotatingFileHandler',
'encoding': 'UTF-8', # <-- That was missing.
....
},
},
I'm a little late, but I just came across this post that enabled me to set up logging in utf-8 very easily
Here the link to the post
or here the code:
root_logger= logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)