Removing u in list
That 'u' is part of the external representation of the string, meaning it's a Unicode string as opposed to a byte string. It's not in the string, it's part of the type.
As an example, you can create a new Unicode string literal by using the same synax. For instance:
>>> sandwich = u"smörgås"
>>> sandwich
u'sm\xf6rg\xe5s'
This creates a new Unicode string whose value is the Swedish word for sandwich. You can see that the non-English characters are represented by their Unicode code points, ö is \xf6
and å is \xe5
. The 'u' prefix appears just like in your example to signify that this string holds Unicode text.
To get rid of those, you need to encode the Unicode string into some byte-oriented representation, such as UTF-8. You can do that with e.g.:
>>> sandwich.encode("utf-8")
'sm\xc3\xb6rg\xc3\xa5s'
Here, we get a new string without the prefix 'u', since this is a byte string. It contains the bytes representing the characters of the Unicode string, with the Swedish characters resulting in multiple bytes due to the wonders of the UTF-8 encoding.
arr = [str(r) for r in arr]
This basically converts all your elements in string. Hence removes the encoding. Hence the u which represents encoding gets removed Will do the work easily and efficiently
The u means the strings are unicode. Translate all the strings to ascii to get rid of it:
a.encode('ascii', 'ignore')
u'AB'
is just a text representation of the corresponding Unicode string. Here're several methods that create exactly the same Unicode string:
L = [u'AB', u'\x41\x42', u'\u0041\u0042', unichr(65) + unichr(66)]
print u", ".join(L)
Output
AB, AB, AB, AB
There is no u''
in memory. It is just the way to represent the unicode
object in Python 2 (how you would write the Unicode string literal in a Python source code). By default print L
is equivalent to print "[%s]" % ", ".join(map(repr, L))
i.e., repr()
function is called for each list item:
print L
print "[%s]" % ", ".join(map(repr, L))
Output
[u'AB', u'AB', u'AB', u'AB']
[u'AB', u'AB', u'AB', u'AB']
If you are working in a REPL then a customizable sys.displayhook
is used that calls repr()
on each object by default:
>>> L = [u'AB', u'\x41\x42', u'\u0041\u0042', unichr(65) + unichr(66)]
>>> L
[u'AB', u'AB', u'AB', u'AB']
>>> ", ".join(L)
u'AB, AB, AB, AB'
>>> print ", ".join(L)
AB, AB, AB, AB
Don't encode to bytes. Print unicode directly.
In your specific case, I would create a Python list and use json.dumps()
to serialize it instead of using string formatting to create JSON text:
#!/usr/bin/env python2
import json
# ...
test = [dict(email=player.email, gem=player.gem)
for player in players]
print test
print json.dumps(test)
Output
[{'email': u'[email protected]', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test', 'gem': 0}, {'email': u'test1', 'gem': 0}]
[{"email": "[email protected]", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test", "gem": 0}, {"email": "test1", "gem": 0}]