Wrong behavior of sort command?
I tried to sort the contents of a file on Ubuntu Desktop 14.04 (Trusty Tahr). In my case, the expected result should be the same as the original content, but the actual result is not. Why?
# cat test.txt
a++-a
a++-b
a++-c
ab
ac
# cat test.txt | sort
a++-a
ab
a++-b
ac
a++-c
Solution 1:
You could use the LC_ALL variable: set it to LC_ALL=C before calling sort.
$ LC_ALL=C sort test.txt
a++-a
a++-b
a++-c
ab
ac
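The VAR=value prefix used above applies only to that single sort invocation. As a rough sketch of which variable ends up deciding the sort order (POSIX gives LC_ALL priority over LC_COLLATE, which in turn beats LANG), the helper below prints the effective name. The function name collation_locale is made up for illustration, and the final fallback to C is an assumption, since the real default is implementation-defined.

```shell
# Hypothetical helper: print the locale name that governs collation.
# Precedence: LC_ALL, then LC_COLLATE, then LANG (empty counts as unset).
# Falling back to "C" is an assumption; the true default is
# implementation-defined.
collation_locale() {
    echo "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

collation_locale
```

If LC_ALL is not set on your system, setting only LC_COLLATE=C should change the sort order while leaving other locale categories alone.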
Read this answer if you want to know what this magic LC_ALL=C does. Here is a short summary:
The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes, the charset is ASCII, the sorting order is based on the byte values.
Also, as @KenMollerup pointed out, here is a quote from man sort:
*** WARNING *** The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
So with LC_ALL=C, sort compares characters bytewise. Otherwise sort follows the collation rules of your locale (en_US.UTF-8, for example), which on this system ignore non-alphanumeric characters on the first comparison pass.
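To make the bytewise comparison concrete, here is a small sketch that prints the ASCII values of + and b and then sorts two of the lines from the question in the C locale. The variable names are only illustrative.

```shell
# POSIX printf: a leading quote in the argument yields the numeric
# value of the character that follows it.
plus=$(printf '%d' "'+")   # 43
b=$(printf '%d' "'b")      # 98
echo "+ = $plus, b = $b"

# '+' (43) is smaller than 'b' (98), so bytewise "a++-b" sorts before "ab".
sorted=$(printf 'ab\na++-b\n' | LC_ALL=C sort)
echo "$sorted"
```

Because every punctuation byte in the sample lines has a lower value than the letters, all the a++-… lines come first in the C locale.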
Solution 2:
In a non-C locale, sort orders lines the way a dictionary would: special characters such as + - < > are ignored on the first comparison. Note that numbers are compared numerically only with the -n option; by default they are compared character by character, so 11 sorts before 2.
So your sorted list is compared as: aa, ab, ab, ac, ac
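As a rough approximation of that first comparison pass (not what sort actually runs internally), you can strip everything except letters and digits and look at what remains. The sample lines are the ones from the question; the variable names are illustrative.

```shell
# Delete every character that is not alphanumeric or a newline.
keys=$(printf 'a++-a\na++-b\na++-c\nab\nac\n' | LC_ALL=C tr -cd '[:alnum:]\n')
echo "$keys"

# Sorting the stripped keys gives the order the locale-aware sort works from.
sorted_keys=$(printf '%s\n' "$keys" | LC_ALL=C sort)
echo "$sorted_keys"
```

The stripped keys contain two pairs of equal entries, which is why the a++-… lines end up interleaved with ab and ac in the question's output.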