What is this character: '*'?
The paste failed not because of the asterisk, which is a perfectly regular asterisk, but because of the Unicode character U+200B that follows it. Because that character is a ZERO WIDTH SPACE, it takes up no room on screen, so nothing shows that it came along when the text was copied.
Using the Python code:
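# The literal must still contain U+200B between the * and the closing '
stro = u"'*'?"
def uniconv(text):
    # hex(ord(char)) gives each character's code point in hexadecimal
    return " ".join(hex(ord(char)) for char in text)
print(uniconv(stro))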
The function uniconv converts each character of the input string (in this case, u"'*'?") into its Unicode code point in hexadecimal format. The u prefix marks the string as a Unicode string.
I was able to obtain the output:
0x27 0x2a 0x200b 0x27 0x3f
We can clearly see that 0x27, 0x2a and 0x3f are the ASCII/Unicode hexadecimal values for the characters ', * and ? respectively. That leaves 0x200b, which identifies the hidden character.
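As a rough extension of the same idea, the sketch below also looks up each character's official Unicode name with the standard unicodedata module. It assumes Python 3; the helper name describe and the variable sample are made up for this illustration, and the zero width space is written as the escape \u200b so that it survives copying.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        ("U+%04X" % ord(char), unicodedata.name(char, "<unnamed>"))
        for char in text
    ]

# Hypothetical sample: the title string with the hidden character made explicit.
sample = "'*\u200b'?"
for code_point, name in describe(sample):
    print(code_point, name)
```

The third line printed should read U+200B ZERO WIDTH SPACE, which matches the 0x200b value found above.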
Note that the Python code, when pasted into the body, had the U+200B character removed by SE's Markdown software. In order to obtain the expected result, you need to copy it directly from the title using the Edit view.
With the help of @Rinzwind in the Ask Ubuntu chat room, I figured out that the problem isn't the asterisk at all. Note the output of od:
$ printf '*' | od -c
0000000 * 342 200 213
0000004
The 342 200 213 is the octal representation of the UTF-8 bytes of a second character, and we can use this site to look it up:
Character: (zero width, not visible)
Character name: ZERO WIDTH SPACE
Hex code point: 200B
Decimal code point: 8203
Hex UTF-8 bytes: E2 80 8B
Octal UTF-8 bytes: 342 200 213
UTF-8 bytes as Latin-1 characters: â <80> <8B>
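The same lookup can be cross-checked without a website. This is only a sketch, assuming Python 3; the helper name utf8_octal is invented for the example. It encodes U+200B as UTF-8 and prints the bytes in octal, the notation od -c uses for non-printable bytes.

```python
def utf8_octal(char):
    """Return the UTF-8 bytes of char as three-digit octal strings."""
    return [format(byte, "03o") for byte in char.encode("utf-8")]

print(utf8_octal("\u200b"))             # ['342', '200', '213']
print("\u200b".encode("utf-8").hex())   # e2808b

# Going the other way: the octal bytes from od decode to U+200B.
print(b"\342\200\213".decode("utf-8") == "\u200b")  # True
```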
So, what I actually had was two Unicode characters: the normal * and a zero width space.