Unicode identifiers in Python?
I want to build a Python function that calculates,
and would like to name my summation function Σ. In a similar fashion, I would like to use Π for product, and so on. Is there a way to name a Python function in this fashion?
def Σ (..):
    ..
    ..
That is, does Python support Unicode identifiers, and if so, could someone provide an example?
Thanks!
The original motivation for this was a piece of Clojure code I saw today that looks like this:
(defn entropy [X]
  (* -1 (Σ [i X] (* (p i) (log (p i))))))
where Σ is a macro defined as,
(defmacro Σ
  ... )
and I thought that was pretty cool.
BTW, to address a couple of comments about readability: in a lot of stats/ML code, for instance, being able to compose operations with symbols would be really helpful (especially for really complex integrals and the like).
φ(z) = ∫(N(x|0,1,1), -∞, z)
vs
Phi(z) = integral(N(x|0,1,1), -inf, z)
or even just the lambda character for lambda()!
Solution 1:
(I think it’s pretty cool too; that might mean we’re geeks.)
You’re fine to do this with the code you have above in Python 3. (It works in my Python 3.1 interpreter at least.) See:
- http://docs.python.org/py3k/reference/lexical_analysis.html#identifiers
- http://www.python.org/dev/peps/pep-3131/
But in Python 2, identifiers can only contain ASCII letters, digits, and underscores.
- http://docs.python.org/reference/lexical_analysis.html#identifiers
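To make the Python 3 case concrete, here is a minimal sketch. The function bodies and the `entropy` helper are my own illustration (loosely mirroring the Clojure snippet in the question), not something from the question itself:

```python
import math
import operator
from functools import reduce


def Σ(values):
    """Sum of an iterable of numbers."""
    return sum(values)


def Π(values):
    """Product of an iterable of numbers (1 for an empty iterable)."""
    return reduce(operator.mul, values, 1)


def entropy(probabilities):
    """Shannon entropy in nats: the negated sum of p * log(p)."""
    return -Σ(p * math.log(p) for p in probabilities)


print(Σ([1, 2, 3, 4]))      # 10
print(Π([1, 2, 3, 4]))      # 24
print(entropy([0.5, 0.5]))  # natural log of 2
```

The file has to be saved in an encoding Python can read; UTF-8 is the default source encoding in Python 3.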
Solution 2:
It's worth pointing out that Python 3 does support Unicode identifiers, but only allows letter-like or number-like characters (see http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers for full details). That's why Σ works (remember that it's a Greek letter, not just a math symbol), but √ doesn't.
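You can check this without triggering a SyntaxError. The sketch below uses `str.isidentifier()` and `unicodedata.category()` from the standard library; the `classify` helper is just a name I made up for the example:

```python
import unicodedata


def classify(ch):
    """Return (Unicode general category, usable as an identifier on its own?)."""
    return unicodedata.category(ch), ch.isidentifier()


# Greek letters fall in the letter categories (Lu, Ll);
# the square root and N-ary summation signs are math symbols (Sm).
for ch in "ΣΠλ√∑":
    print(ch, *classify(ch))
```

Characters in the letter categories can start an identifier, while those in the symbol category `Sm` are rejected.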
Solution 3:
(This answer is meant to be a minor addendum, not a complete answer.)
The additional gotcha with Unicode identifiers (which @mike-desimone mentions, and which I discovered quickly when I thought this was a cool thread and switched to a terminal to play with it) is that visually similar glyphs are not equivalent, and which one you get depends on the input method of each platform. For example, Σ (Greek capital letter sigma, U+03A3; I can't find a direct Mac input method for it) is fine, but unfortunately ∑ (N-ary summation, U+2211, typed with opt/alt-w on Mac OS X) is not a valid identifier.
>>> Σ = 20
>>> Σ
20
but
>>> ∑ = 20
  File "<input>", line 1
    ∑ = 20
    ^
SyntaxError: invalid character in identifier
Using Σ specifically (and probably Unicode characters in general) as an identifier might generate some very hard-to-diagnose errors if you have multiple developers on multiple platforms contributing to your code. For example, code that defines Σ but then calls ∑ is hard to debug visually.
The two glyphs are easier to differentiate on this page, but depending on the font used, this may not be the case.
Even the traceback isn't much clearer unless Σ is printed near the ∑:
  File "~/Dev/play_python33/identifiers.py", line 12
    print(∑([2, 2, 2, 2, 2]))
          ^
SyntaxError: invalid character in identifier
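When the glyphs are indistinguishable in your font, printing their code points and official names settles it. This is a rough sketch using the standard `unicodedata` module; `describe` and `suspicious` are hypothetical helper names of mine, and the scan is crude in that it also flags symbols inside string literals and comments:

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))


def suspicious(line):
    """List non-ASCII characters in a line that cannot appear in an identifier."""
    return [describe(ch) for ch in line
            if ord(ch) > 127 and not ("a" + ch).isidentifier()]


print(describe("\u03a3"))  # U+03A3 GREEK CAPITAL LETTER SIGMA
print(describe("\u2211"))  # U+2211 N-ARY SUMMATION

# The offending line from the traceback above, with the symbol escaped.
print(suspicious("print(\u2211([2, 2, 2, 2, 2]))"))
```

Running `suspicious` over the line named in the traceback points straight at the N-ary summation sign.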
Solution 4:
You can use some Unicode characters, but not all: you are restricted to characters classified as letters.
>>> α = 3
>>> Σ = sum
>>> import math
>>> √ = math.sqrt
  File "<stdin>", line 1
    √ = math.sqrt
    ^
SyntaxError: invalid character in identifier
Besides, I think it is very cool to be able to use Unicode in identifiers, and I wish I could use all of it.
I use the Neo keyboard layout, which gives me Greek and math symbols on extra layers:
αβχδεφγψιθκλνοπϕστ[&ωξυζ
∀⇐ℂΔ∃ΦΓΨ∫Λ⇔Σ∈ℚℝ∂⊂√∩Ξ
Solution 5:
Python 2.x does not support Unicode identifiers, and consequently does not support Σ as an identifier. Python 3.x does support Unicode identifiers, although many people will get cross if they have to edit source files with, for example, identifiers A and Α (Latin A and Greek capital alpha). Sigma is often readable enough, but still not as readable as the word sigma, so why bother?
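To illustrate the A versus Α problem, here is a small Python 3 sketch; the assigned values are arbitrary placeholders. The two names look identical in many fonts but are separate variables:

```python
import unicodedata

A = "latin"  # U+0041 LATIN CAPITAL LETTER A
Α = "greek"  # U+0391 GREEK CAPITAL LETTER ALPHA

# Two distinct bindings, despite looking the same on screen.
print(A, Α)
print(A == Α)

# The official names are the only reliable way to tell them apart.
for ch in ("\u0041", "\u0391"):
    print("U+{:04X}".format(ord(ch)), unicodedata.name(ch))
```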