Python __str__ versus __unicode__
Solution 1:
__str__() is the old method; it returns bytes. __unicode__() is the new, preferred method; it returns characters. The names are a bit confusing, but in 2.x we're stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__() and create a stub __str__() method:
    def __str__(self):
        # Delegate to __unicode__() and encode the resulting text as UTF-8 bytes.
        return unicode(self).encode('utf-8')
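To make the pattern above concrete, here is a minimal sketch of a class that runs unchanged on both 2.x and 3.x. It assumes you write a single text-returning __str__ and let a small class decorator rename it to __unicode__ and install the UTF-8 stub __str__, on Python 2 only. The decorator name `unicode_compatible` and the `City` class are hypothetical; the idea is the same one behind `six.python_2_unicode_compatible`.

```python
import sys

PY2 = sys.version_info[0] == 2


def unicode_compatible(cls):
    """Hypothetical helper: define __str__ once, returning text.

    On Python 2 the text-returning method becomes __unicode__, and __str__
    becomes a stub that encodes it as UTF-8 bytes. On Python 3 the class is
    returned unchanged, because str is already text.
    """
    if PY2:
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda self: self.__unicode__().encode('utf-8')
    return cls


@unicode_compatible
class City(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Always return text: unicode on Python 2, str on Python 3.
        return u'City: {0}'.format(self.name)
```

With this sketch, `str(City(u'Z\u00fcrich'))` is the text 'City: Zürich' on Python 3, while on Python 2 it is the UTF-8 byte string and `unicode(...)` gives the text. The non-ASCII name is written as an escape so the source file itself stays ASCII.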
In 3.x, str contains characters, so the same methods are named __bytes__() and __str__(). These behave as expected.
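A minimal Python 3 sketch of that renaming: str() calls __str__, which must return text, and bytes() calls __bytes__, which must return bytes. The `Greeting` class is illustrative, and UTF-8 is an assumed encoding choice; __bytes__ can use whichever encoding suits the class.

```python
class Greeting:
    """Illustrative Python 3 class; the name and UTF-8 choice are assumptions."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Python 3: __str__ must return str, i.e. text (characters).
        return self.text

    def __bytes__(self):
        # Python 3: __bytes__ must return bytes; the encoding is our choice.
        return self.text.encode('utf-8')


g = Greeting('caf\u00e9')
print(len(str(g)))  # 4 characters
print(bytes(g))     # b'caf\xc3\xa9' (5 bytes: U+00E9 encodes as C3 A9)
```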
Solution 2:
If I didn't especially care about micro-optimizing stringification for a given class, I'd always implement __unicode__ only, as it's more general. When I do care about such minute performance issues (which is the exception, not the rule), having __str__ only (when I can prove there will never be non-ASCII characters in the stringified output) or both (when both are possible) might help.
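As an illustration of the "__str__ only" case, here is a sketch with a hypothetical `Point` whose stringified form is built solely from integers via %d. Its output can therefore only contain digits, a minus sign, parentheses, a comma and a space, all ASCII, so under that assumption a single __str__ is safe on both 2.x and 3.x.

```python
class Point(object):
    """Hypothetical class whose text form is provably ASCII."""

    def __init__(self, x, y):
        # Coerce to int so %d below can only ever emit ASCII digits and '-'.
        self.x = int(x)
        self.y = int(y)

    def __str__(self):
        # Native str on both 2.x (bytes) and 3.x (text); no encoding step
        # is needed because every character is ASCII.
        return '(%d, %d)' % (self.x, self.y)


p = Point(3, -4)
print(p)  # (3, -4)
```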
These I think are solid principles, but in practice it's very common to KNOW there will be nothing but ASCII characters without making the effort to prove it (e.g. the stringified form only has digits, punctuation, and maybe a short ASCII name;-), in which case it's quite typical to move on directly to the "just __str__" approach (but if a programming team I worked with proposed a local guideline to avoid that, I'd be +1 on the proposal, as it's easy to err in these matters AND "premature optimization is the root of all evil in programming";-).
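If a team wanted to back up the "it's only ASCII" belief rather than rely on it, one lightweight option is a check in its test suite. The sketch below is an assumption, not an established recipe: the `Tag` class and the `is_ascii` helper are hypothetical names. The helper tries an explicit ASCII encode, and ASCII is the codec Python 2 uses by default for implicit conversions between str and unicode, so a failure flags input that the "just __str__" approach would mishandle there.

```python
class Tag(object):
    """Hypothetical class that relies on the 'just __str__' approach."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return '#' + self.name


def is_ascii(text):
    """Return True if `text` survives an explicit ASCII encode."""
    try:
        text.encode('ascii')
    except UnicodeError:  # base class of UnicodeEncodeError/UnicodeDecodeError
        return False
    return True


# The kind of check a team guideline might put in its test suite:
print(is_ascii(str(Tag('python'))))  # True
print(is_ascii(u'#caf\u00e9'))       # False
```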
Solution 3:
With the world getting smaller, chances are that any string you handle will eventually contain non-ASCII characters. So for any new apps, you should at least provide __unicode__(). Whether you also override __str__() is then just a matter of taste.