Sorting numerically in a comma-delimited file with Unix
I've got a comma-separated file that looks like this:
100,00869184
6492,8361
1234,31
200,04071
I want to use sort
to sort this file numerically by the first column only.
Desired Result:
100,00869184
200,04071
1234,31
6492,8361
How do I achieve this using sort? It seems like the commas are being treated like thousands separators instead of delimiters even when I call them out as such.
Both sort -t',' -n
and sort -t',' -nk1'
give me this:
1234,31
200,04071
6492,8361
100,00869184
Sorting by the default (no parameters) or using sort -t','
gives me this:
100,00869184
1234,31
200,04071
6492,8361
And sorting as a number sort -n
gives me this:
1234,31
200,04071
6492,8361
100,00869184
How can I use sort to achieve my desired result?
Edited to add: This is for a one-time operation to create a sorted list of about 7 million lines, so workarounds or other unorthodox methods are perfectly acceptable.
This is certainly a dirty workaround, but I figured out a way to do this thanks to @slhck's tip about locales. If a better answer comes along that would be more helpful to others, I'll certainly accept it since this pretty much only works for my specific problem.
I set the locale to Spanish (Bolivian) so that the commas were treated like decimal points, then standard numeric sorting did the trick.
$ export LC_NUMERIC="es_BO.utf8"
$ cat test.csv
100,00869184
6492,8361
1234,31
200,04071
$ sort -n test.csv
100,00869184
200,04071
1234,31
6492,8361
GNU's sort
does this by default:
$ cat test
100,00869184
6492,8361
1234,31
200,04071
$ gsort -nt',' < test
100,00869184
200,04071
1234,31
6492,8361
Version:
$ gsort --version
sort (GNU coreutils) 8.19
There's a caveat though: If your sorting does not work as expected, then your locale
is probably set to something different than C
. Why is this? locale
defines sorting and interpretation of letters, numbers, decimal characters et cetera.
To check this, just enter locale
in a Terminal. Is LC_NUMERIC
set to en_US.UTF-8
, maybe? This would explain the wrong sort order. Set it back to C
:
export LC_NUMERIC=C
Then, try your sort
command again. If you want to set your global locale
to C
, do this with:
export LC_ALL=C
Try adding the -g
option which is suppose to perform numeric sorting.
Try:
sort -t',' -g <whatever>