Is there a Unicode-aware LC_COLLATE sort order which respects punctuation?
As far as I can tell, setting the environment variable LC_COLLATE=en_US.utf8 changes four things compared to LC_COLLATE=c, regarding how programs like ls sort files:
- Unicode characters are preserved (rather than being replaced with ?? garbage)
- Accents and diacritical marks don't affect sort order
- Case differences don't affect sort order
- Punctuation characters (such as dots) don't affect sort order
Feature 1 is a must-have in this day and age.
Features 2 and 3 are great too, since they make it more convenient to deal with real-life Unicode filenames.
Feature 4, on the other hand, is something I find really counterproductive in my day-to-day work, since it often produces counter-intuitive sort orders for Linux filenames, where dots tend to be used to separate suffixes or to mark dotfiles. I really can't imagine why anyone thought it would be a good idea to ignore dots when sorting filenames.
For example:
$ touch foo.txt foo2.txt foó3.txt foo4.txt
$ LC_COLLATE=en_US.utf8 ls
foo2.txt foó3.txt foo4.txt foo.txt
$ LC_COLLATE=c ls
foo.txt foo2.txt foo4.txt fo??3.txt
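One way to see why foo.txt lands last under en_US.utf8 is to model the first comparison pass of the locale's collation, in which accents and case are folded away and punctuation is ignored. The Python sketch below is only a rough model under those assumptions (it skips the later tie-breaking passes, and the helper names are hypothetical). Once the dot is dropped, "footxt" is compared with "foo2txt", and the digit 2 sorts before the letter t:

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits "ó" into "o" + U+0301; dropping combining marks leaves "o".
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )


def first_level_key(name: str) -> str:
    # Rough model of the first comparison pass: accents and case are folded
    # away and punctuation is dropped entirely (later tie-breaking passes
    # are not modelled).
    folded = strip_accents(name).casefold()
    return "".join(c for c in folded if c.isalnum())


files = ["foo.txt", "foo2.txt", "foó3.txt", "foo4.txt"]
print(sorted(files, key=first_level_key))
# ['foo2.txt', 'foó3.txt', 'foo4.txt', 'foo.txt']
```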
Neither is satisfactory. This is how I'd want those files to be sorted:
foo.txt foo2.txt foó3.txt foo4.txt
In other words, just like with LC_COLLATE=en_US.utf8, except that punctuation characters are treated as significant (and sorted before letters).
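Outside of any LC_COLLATE setting, the requested ordering can at least be sketched as a Python sort key. This is an illustration under stated assumptions, not a locale, and the function names are hypothetical: accents are stripped via NFD decomposition, case is removed with str.casefold(), and every non-alphanumeric character is kept but ranked before all digits and letters.

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits "ó" into "o" + U+0301; dropping combining marks leaves "o".
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    )


def punctuation_aware_key(name: str):
    folded = strip_accents(name).casefold()
    # Primary level: accent- and case-insensitive, but punctuation is kept
    # and ranked (group 0) before every digit and letter (group 1).
    primary = tuple((1 if c.isalnum() else 0, c) for c in folded)
    # Tie-breakers for names equal at the primary level: compare with case
    # kept (uppercase first, as in code point order), then the raw name
    # (so "foo" comes before "foó").
    return (primary, strip_accents(name), name)


files = ["foo4.txt", "foó3.txt", "foo.txt", "foo2.txt"]
print(sorted(files, key=punctuation_aware_key))
# ['foo.txt', 'foo2.txt', 'foó3.txt', 'foo4.txt']
```

With names like the ones in the answer (.sharp, Sharp, ßharp and so on), the same key keeps the dotfiles at the top; note that casefold() turns ß into ss, so ßharp sorts between sharp and szharp.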
Does any LC_COLLATE setting exist which does this?
If there is no punctuation-respecting one that supports all of features 1-3, is there at least one that supports feature 1 (i.e. sort like LC_COLLATE=c but don't garble Unicode characters)?
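Problem number 1 is that LC_COLLATE=c is an invalid locale. You need a capital C (LC_COLLATE=C). With the invalid name, setlocale() fails and ls keeps running in the plain C locale for every category, including character handling, which is why the ß is printed as ?? below: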
$ LC_COLLATE=c ls -1a
./
../
.sharp
.zharp
Sharp
sharp
szharp
zharp
??harp
$ LC_COLLATE=C ls -1a
./
../
.sharp
.zharp
Sharp
sharp
szharp
zharp
ßharp
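For reference, the C order shown above is plain byte order, which for UTF-8 names coincides with Unicode code point order; that is why ßharp (whose first byte is 0xC3) lands after every ASCII name. A minimal Python sketch, assuming UTF-8 filenames (the trailing slashes ls printed after . and .. are left out):

```python
# Under LC_COLLATE=C, names are compared byte by byte, like strcmp().
names = ["ßharp", "zharp", ".zharp", "Sharp", "szharp", "sharp", ".sharp", "..", "."]
c_order = sorted(names, key=lambda n: n.encode("utf-8"))
for n in c_order:
    print(n)
```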
I don't know how to get Unicode-aware sorting that still keeps filenames starting with a dot at the top, though (searching for an answer to this is how I ended up here) :-/