Explain the effects of export LANG, LC_CTYPE, and LC_ALL [closed]
I've just installed Linux Mint 17 and ran into a problem: I couldn't use the Russian language in the terminal. (I see ? instead of letters.)
On one forum I found this solution, which is to add the following to ~/.profile:
export LANG=ru_RU.UTF-8
export LC_CTYPE=ru_RU.UTF-8
export LC_ALL=ru_RU.UTF-8
It helped, but it also changed my interface language to Russian (which I didn't want). That's not really a problem, but I would like to know how this code works, line by line.
I'll explain in detail:
export LANG=ru_RU.UTF-8
That is a shell command that will export an environment variable named LANG with the given value ru_RU.UTF-8. That instructs internationalized programs to use the Russian language (ru), variant from Russia (RU), and the UTF-8 encoding for console output.
Generally this single line is enough.
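To make the structure of that value concrete, here is a minimal sketch that splits a locale name into its three parts using only POSIX parameter expansion. The variable names (locale_name, language, territory, codeset) are made up for illustration and are not anything the system defines.

```shell
# Split a locale name of the form language_TERRITORY.codeset into its parts.
locale_name="ru_RU.UTF-8"

language=${locale_name%%_*}     # everything before the first "_"
rest=${locale_name#*_}          # everything after the first "_"
territory=${rest%%.*}           # everything before the first "."
codeset=${locale_name#*.}       # everything after the first "."

# prints: language=ru territory=RU codeset=UTF-8
printf 'language=%s territory=%s codeset=%s\n' "$language" "$territory" "$codeset"
```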
This other one:
export LC_CTYPE=ru_RU.UTF-8
Does a similar thing, but it tells the program to change only the CTYPE (character handling) category to Russian, not the language. If a program can change a text to uppercase, then it will use the Russian rules to do so, even though the text itself may be in English.
It is worth saying that mixing LANG and LC_CTYPE can give unexpected results, because few people do that, so it is quite untested, unless maybe:
export LANG=ru_RU.UTF-8
export LC_CTYPE=C
That will make the program print its messages in Russian, but use the old standard C rules for character handling (CTYPE).
The last line, LC_ALL, is a last-resort override that makes the program ignore all the other LC_* variables and use this one. I think that you should never set it in a profile, but use it to run a program in a given language. For example, if you want to write a bug report, and you don't want any kind of localized output, and you don't know which LC_* variables are set:
LC_ALL=C program
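A quick way to see that this form affects only the one command is the sketch below. It uses sh -c as a stand-in for program, and the variable names inside and after are made up for the illustration.

```shell
# Start from a known state: no LC_ALL in this shell.
unset LC_ALL

# The assignment prefix applies to this single command only.
inside=$(LC_ALL=C sh -c 'printf "%s" "$LC_ALL"')

# Back in the calling shell, LC_ALL is still unset.
after=${LC_ALL-unset}

printf 'inside the command: %s\nafter the command:  %s\n' "$inside" "$after"
```

The first line of output reports C and the second reports unset, which is why this form is safe for one-off runs.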
Whether this changes the language of all your programs or only the console depends on where you put these lines. I put mine in ~/.bashrc so they don't apply to the GUI, only to the bash consoles.
See the Environment Variables page of the UNIX Specification:
LANG
This variable determines the locale category for native language, local customs and coded character set in the absence of the LC_ALL and other LC_* (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) environment variables. This can be used by applications to determine the language to use for error messages and instructions, collating sequences, date formats, and so forth.
LC_ALL
This variable determines the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.
LC_CTYPE
This variable determines the locale category for character handling functions, such as tolower(), toupper() and isalpha(). This environment variable determines the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters), the classification of characters (for example, alpha, digit, graph) and the behaviour of character classes. Additional semantics of this variable, if any, are implementation-dependent.
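The byte-versus-character distinction can be observed with wc, which counts bytes with -c and characters with -m. This is only a sketch: the variable names are illustrative, the word is the UTF-8 encoding of "привет" written as octal escapes so the snippet does not depend on the file's own encoding, and the last command assumes the ru_RU.UTF-8 locale from the question is installed.

```shell
# "привет" encoded as UTF-8: 6 letters, 2 bytes each.
# The string is used as a printf format so the octal escapes are expanded.
word='\320\277\321\200\320\270\320\262\320\265\321\202'

# -c counts bytes, regardless of locale.
bytes=$(printf "$word" | wc -c | tr -d ' ')

# In the C locale every byte is treated as a separate character.
chars_c=$(printf "$word" | LC_ALL=C wc -m | tr -d ' ')

echo "bytes: $bytes, characters in the C locale: $chars_c"

# With a UTF-8 character type the same bytes are decoded as multi-byte
# characters. Assumes ru_RU.UTF-8 is installed; if it is not, expect the
# C-locale count again (and possibly a warning from the shell).
printf "$word" | LC_ALL=ru_RU.UTF-8 wc -m
```

LC_ALL is used here instead of LC_CTYPE only so that the result does not depend on whatever LC_ALL the calling environment already has.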
LANG, LC_CTYPE and LC_ALL are special environment variables. Once they have been exported from the shell (see help export), they are available to be read by programs that support locales (natural language formatting for C).
Each variable sets the C library's notion of natural language formatting style for particular sets of routines, for example:
LC_ALL - Set the entire locale generically.
LC_CTYPE - Set a locale for the ctype and multibyte functions. This controls recognition of upper and lower case, alphabetic or non-alphabetic characters, and so on.
and others such as LC_COLLATE (for string collation routines), LC_MESSAGES (for message catalogs), LC_MONETARY (for formatting monetary values), LC_NUMERIC (for formatting numbers), LC_TIME (for formatting dates and times).
Regarding LANG, it is used as a substitute for any unset LC_* variable (see: man locale).
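The lookup order described in this thread (LC_ALL first, then the category's own variable, then LANG) can be sketched as a small shell function. effective_locale is a hypothetical helper written for this illustration, not a standard command, and the final fallback to C is an assumption, since the default locale is left to the implementation. Empty values are treated the same as unset ones.

```shell
# Print the value a locale category would take, following the precedence
# LC_ALL > LC_<category> > LANG > default.
effective_locale() {
    category=$1                      # e.g. LC_CTYPE or LC_MESSAGES
    eval "specific=\${$category-}"   # value of the category's own variable
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"            # assumed default, implementation-defined
    fi
}

# Usage, in a subshell so the current shell's settings are left alone.
(
    unset LC_ALL LC_CTYPE LC_MESSAGES
    LANG=en_US.UTF-8
    effective_locale LC_CTYPE      # en_US.UTF-8 (falls back to LANG)
    LC_CTYPE=ru_RU.UTF-8
    effective_locale LC_CTYPE      # ru_RU.UTF-8 (category variable wins)
    effective_locale LC_MESSAGES   # en_US.UTF-8 (still from LANG)
    LC_ALL=C
    effective_locale LC_CTYPE      # C (LC_ALL overrides everything)
)
```

The third call hints at why the interface language can stay English while only the character handling is switched to Russian.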
See: man setlocale (BSD), man locale
So when a program calls setlocale, the C library reads the locale settings from the environment and the locale configuration files. Functions such as the ctype and multibyte families, catopen, printf, etc. then follow those settings to control natural language formatting as per the C programming language standards (see: ISO C99).
See also: C Library - <locale.h>.
export is confusing. It really means mark-for-export.
It implies child processes will later be created, and that's when the actual exporting will be done.
The export order of events is: 1-ASSIGN, MARK, and ... 2-FORK.
1) Create a new local shell variable, assign the value to it, and mark this variable for later export.
2) Then, if and when the current shell FORKS (i.e. creates and runs a child process), the child process starts with a COPY of this exported variable as one of its many environment variables.
N.B. (note well): The variable is not actually exported until step 2, possibly long after the export declaration was issued. So export only marks LANG; it does not export LANG.
By convention, exported variables are named in upper case.
Because the child's LANG is only a copy, if the child later modifies this variable, it modifies it only for itself. The parent doesn't see the child's modifications.
Note that there are also many other environment variables passed to child processes from parent processes. These include all of the other environment variables that the parent process itself got from its parent.
So the child inherits all of the parent's environment variables,
+ any additional ones that the parent marks for export,
- less any variables which are explicitly unset.
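The mark-then-copy behaviour can be checked with a short sketch. DEMO_LOCAL and DEMO_EXPORTED are made-up names for this illustration, and sh -c plays the role of the child process.

```shell
DEMO_LOCAL="shell only"                    # plain shell variable, not marked
export DEMO_EXPORTED="marked for export"   # assigned and marked

# The child process receives a copy of the marked variable only.
child_view=$(sh -c 'printf "%s / %s" "${DEMO_LOCAL-unset}" "${DEMO_EXPORTED-unset}"')
echo "child sees:   $child_view"

# The child changes its own copy; the parent's value is untouched.
sh -c 'DEMO_EXPORTED="changed by child"'
echo "parent keeps: $DEMO_EXPORTED"
```

The child reports DEMO_LOCAL as unset, and the parent still prints its original value after the child has exited.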
In other words, we have two processes to think about: the parent process and any future child process(es).
The process you're running, in this case profile, is what we're calling the 'parent process'.
profile can spawn one or more child processes, for example if one of the things you do in profile is to run a program. That program is then (normally) run as a child process of profile. (This is not true if the file is sourced in profile, using the . <name> or source <name> notation, where what is sourced runs in the same process as profile.)
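The difference between running a file as a child process and sourcing it can be seen with a throwaway script. This is a sketch only: DEMO_VAR and the temporary file are made up for the illustration, and mktemp is assumed to be available (it is common but not required by POSIX).

```shell
# Write a tiny script that sets a variable.
script=$(mktemp)
printf '%s\n' 'DEMO_VAR="set by script"' > "$script"
unset DEMO_VAR

# Run as a child process: the assignment happens in the child and is lost.
sh "$script"
after_child=${DEMO_VAR-unset}

# Source it: the assignment happens in the current shell and stays.
. "$script"
after_source=${DEMO_VAR-unset}

rm -f "$script"
echo "after running as a child: $after_child"
echo "after sourcing:           $after_source"
```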
export LANG=ru_RU.UTF-8
export LC_CTYPE=ru_RU.UTF-8
export LC_ALL=ru_RU.UTF-8
So now let's look at the effects of these three environment variables.
LANG is what a user normally sets to choose the language that a program runs in. If you enter env | grep LANG in a terminal, you should see that LANG is set to your <language>_<country-code>.<character-encoding>, e.g. LANG=en_US.UTF-8.
LC_CTYPE is an override to LANG, and overrides just the character handling category (character classification, case conversion, and the character encoding). All other categories, e.g. LC_TELEPHONE, are still used as set by LANG.
LC_ALL is a further override. It overrides both LC_CTYPE and every other locale category that LANG would otherwise set, forcing them all to the given language and codeset. Note that LC_ALL should never be set persistently, e.g. in a profile. It is intended only as a temporary override of the entire locale, i.e. it overrides all categories, like LC_TELEPHONE, LC_MONETARY, LC_CTYPE, etc.