Does Python forbid two similarly looking Unicode identifiers?
I was playing around with Unicode identifiers and stumbled upon this:
>>> 𝑓, x = 1, 2
>>> 𝑓, x
(1, 2)
>>> 𝑓, f = 1, 2
>>> 𝑓, f
(2, 2)
What's going on here? Why does Python replace the object referenced by 𝑓 (U+1D453, MATHEMATICAL ITALIC SMALL F), but only when the other name is f? Where is that behavior described?
Solution 1:
PEP 3131 -- Supporting Non-ASCII Identifiers says:
All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
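You can use unicodedata to test the conversions:
import unicodedata
unicodedata.normalize('NFKC', '𝑓')
# 'f'
which indicates that '𝑓' gets converted to 'f' during parsing. So 𝑓 and f are the same name, and the second assignment in 𝑓, f = 1, 2 simply overwrites the first. This leads to the expected: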
𝑓 = "Some String"
print(f)
# Some String
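The check above can be wrapped in a small helper. This is only a sketch: same_identifier is a hypothetical name, not something from the PEP or the standard library, and the escape sequence is used so the example does not depend on how the page renders the character.

```python
import unicodedata

def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: True if both spellings share one NFKC form."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

italic_f = "\U0001D453"  # MATHEMATICAL ITALIC SMALL F
print(unicodedata.name(italic_f))
print(same_identifier(italic_f, "f"))  # True
print(same_identifier(italic_f, "x"))  # False

# Run the assignment from the question in a fresh namespace.
namespace = {}
exec(f"{italic_f}, f = 1, 2", namespace)
bound = [name for name in namespace if name != "__builtins__"]
print(bound)           # ['f'] : only one name was ever bound
print(namespace["f"])  # 2     : the second assignment won
```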
Solution 2:
Here's a small example, just to show how horrible this "feature" is:
𝕋𝐡ᵢ𝕤_ｆ𝐞𝗮𝘁𝘂ᵣℯ_𝕤𝐡𝗼𝘂𝗹𝐝_dℯ𝐟ᵢ𝗻ｉ𝘁𝐞𝗹y_𝐛𝗲_𝗮_𝐛ᵤg = 42
print(T𝐡ℹ𝘀_𝐟e𝗮𝘁𝘂ᵣe_𝕤𝗵º𝘂𝗹𝐝_𝐝e𝐟ᵢ𝗻ⁱｔᵉ𝗹𝘆_𝐛ℯ_𝗮_𝐛𝘂𝐠)
# => 42
(But please don't use this in real code.)
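To see that such spellings really collapse into one name, you can look at what the parser stores in the syntax tree. A minimal sketch, assuming CPython 3 and using three MATHEMATICAL BOLD letters (written as escapes) as a stand-in for the longer identifier:

```python
import ast

# "bug" spelled with MATHEMATICAL BOLD SMALL B, U and G
fancy_name = "\U0001D41B\U0001D42E\U0001D420"
tree = ast.parse(f"{fancy_name} = 42")

# The assignment target is an ast.Name node; its id is the normalized form.
target = tree.body[0].targets[0]
print(target.id)                # bug
print(target.id == fancy_name)  # False: the styled spelling is gone
```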
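And as mentioned by @MarkMeyer, two identifiers can be distinct even though they look exactly the same (for example "CYRILLIC CAPITAL LETTER A" and "LATIN CAPITAL LETTER A"), because NFKC does not map one onto the other:
А = 42  # CYRILLIC CAPITAL LETTER A (U+0410)
print(A)  # LATIN CAPITAL LETTER A (U+0041)
# => NameError: name 'A' is not defined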
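The same unicodedata check shows why this pair stays distinct. A short sketch, with the variable names chosen here purely for illustration:

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410

for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC leaves the Cyrillic letter untouched, so it never becomes Latin A.
normalized = unicodedata.normalize("NFKC", cyrillic_a)
print(normalized == cyrillic_a)  # True
print(normalized == latin_a)     # False
```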