How to find out if Python is compiled with UCS-2 or UCS-4?
When built with --enable-unicode=ucs4:
>>> import sys
>>> print sys.maxunicode
1114111
When built with --enable-unicode=ucs2:
>>> import sys
>>> print sys.maxunicode
65535
It's 0xFFFF (or 65535) for UCS-2, and 0x10FFFF (or 1114111) for UCS-4:
Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}
The maximum character in UCS-4 mode is defined by the maximum value representable in UTF-16.
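To see where 0x10FFFF comes from, here is a small sketch of the UTF-16 surrogate-pair arithmetic. The function and constant names are my own, chosen only for illustration:

```python
# A surrogate pair carries 10 bits in the high surrogate (0xD800-0xDBFF)
# and 10 bits in the low surrogate (0xDC00-0xDFFF), offset by 0x10000.
HIGH_MAX = 0xDBFF
LOW_MAX = 0xDFFF

def decode_surrogate_pair(high, low):
    """Combine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The largest pair gives the largest code point UTF-16 can express.
max_code_point = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
print(hex(max_code_point))  # 0x10ffff
```

That result is the same 1114111 that sys.maxunicode reports on a UCS-4 build.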
I had this same issue once. I documented it for myself on my wiki at
http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4
I wrote:
import sys
sys.maxunicode > 65536 and 'UCS4' or 'UCS2'
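The same check could be wrapped in a function using a conditional expression. This is only a sketch, and the name unicode_build and its optional argument are made up so the logic can be tried with either value. Note that on Python 3.3 and later sys.maxunicode is always 1114111, so the distinction only matters on older interpreters:

```python
import sys

def unicode_build(maxunicode=None):
    """Return 'UCS4' for a wide build and 'UCS2' for a narrow one."""
    if maxunicode is None:
        # Default to the value of the running interpreter.
        maxunicode = sys.maxunicode
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'

print(unicode_build())
```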
The sysconfig module reports the Unicode character size, in bytes, from Python's build configuration variables.
The build flags can be queried like this:
Python 2.7:
import sysconfig
sysconfig.get_config_var('Py_UNICODE_SIZE')
Python 2.6:
import distutils.sysconfig
distutils.sysconfig.get_config_var('Py_UNICODE_SIZE')
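get_config_var returns None when a variable is not recorded for the running build, which I believe can happen for Py_UNICODE_SIZE on some platforms and on newer Python versions. A hedged sketch that falls back to sys.maxunicode in that case (the helper name unicode_size_bytes is my own):

```python
import sys
import sysconfig

def unicode_size_bytes():
    """Best-effort size of Py_UNICODE in bytes (2 or 4)."""
    size = sysconfig.get_config_var('Py_UNICODE_SIZE')
    if size is None:
        # Variable not recorded for this build; infer from sys.maxunicode.
        size = 4 if sys.maxunicode > 0xFFFF else 2
    return int(size)

print(unicode_size_bytes())
```

This needs Python 2.7 or later, since that is where the top-level sysconfig module appeared.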
I had the same issue and found a semi-official piece of code that does exactly this, which may interest others with the same problem: https://bitbucket.org/pypa/wheel/src/cf4e2d98ecb1f168c50a6de496959b4a10c6b122/wheel/pep425tags.py?at=default&fileviewer=file-view-default#pep425tags.py-83:89.
It comes from the wheel project, which needs to check whether Python was compiled with UCS-2 or UCS-4 because that changes the name of the generated binary file.
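To illustrate how the build width ends up in a file name, here is a rough sketch of the idea. It is not the wheel project's actual code. The function name and its parameters are hypothetical, and it assumes only the convention that wide CPython builds before 3.3 carry a trailing "u" in their ABI tag (for example cp27mu versus cp27m):

```python
import sys

def unicode_abi_suffix(maxunicode=None, version_info=None):
    """Return 'u' for a wide (UCS-4) build of Python < 3.3, else ''."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if version_info is None:
        version_info = sys.version_info
    wide = maxunicode == 0x10FFFF
    # From 3.3 on, the narrow/wide distinction no longer exists.
    return 'u' if wide and tuple(version_info[:2]) < (3, 3) else ''

print('cp27m' + unicode_abi_suffix(0x10FFFF, (2, 7)))  # cp27mu
print('cp27m' + unicode_abi_suffix(0xFFFF, (2, 7)))    # cp27m
```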