sort command: -g versus -n flag

Solution 1:

The main difference is in how numbers written in scientific notation are treated. From info sort, on the -n (numeric) sort:

 Neither a leading `+' nor exponential notation is recognized.  To
 compare such strings numerically, use the `--general-numeric-sort'
 (`-g') option.

So, for example, given

$ cat file
+1.23e-1
1.23e-2
1.23e-3
1.23e4
1.23e+5
-1.23e6

then

$ sort -n file
-1.23e6
+1.23e-1
1.23e-2
1.23e-3
1.23e4
1.23e+5

whereas

$ sort -g file
-1.23e6
1.23e-3
1.23e-2
+1.23e-1
1.23e4
1.23e+5
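
A smaller case may make the effect easier to see. The sketch below is not from the info page; it assumes GNU sort and pins LC_ALL=C so that tie-breaking does not depend on the locale. The variable names are only for illustration. Under -n, parsing of 1e3 stops at the e, so the line counts as 1 and sorts before 2; under -g it is read as 1000.

```shell
# Assumption: GNU sort (the -g flag is an extension, not POSIX).
# LC_ALL=C keeps the decimal point and tie-breaking locale-independent.
numeric_order=$(printf '%s\n' 1e3 2 | LC_ALL=C sort -n)
general_order=$(printf '%s\n' 1e3 2 | LC_ALL=C sort -g)

# -n reads "1e3" as 1, so it comes first.
printf 'sort -n: %s\n' "$(printf '%s ' $numeric_order)"
# -g reads "1e3" as 1000, so it comes last.
printf 'sort -g: %s\n' "$(printf '%s ' $general_order)"
```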

Solution 2:

From the sort info page, sort -g is described as follows:

‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’
     Sort numerically, converting a prefix of each line to a long
     double-precision floating point number.  *Note Floating point::.
     Do not report overflow, underflow, or conversion errors.  Use the
     following collating sequence:

        • Lines that do not start with numbers (all considered to be
          equal).
        • NaNs (“Not a Number” values, in IEEE floating point
          arithmetic) in a consistent but machine-dependent order.
        • Minus infinity.
        • Finite numbers in ascending numeric order (with -0 and +0
          equal).
        • Plus infinity.

     Use this option only if there is no alternative; it is much slower
     than ‘--numeric-sort’ (‘-n’) and it can lose information when
     converting to floating point.
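
The collating sequence above can be checked with a small mixed input. This is only a sketch: it assumes GNU sort and a C library whose strtold() accepts the spellings inf and nan, and the variable name is made up for the example. On such a system the lines should come out as abc, nan, -inf, 3, 1e2, inf.

```shell
# Mixed input: a non-number, a NaN, both infinities, and two finite values.
# Assumption: GNU sort; "inf" and "nan" are parsed by the C library's strtold().
collated=$(printf '%s\n' inf 3 -inf abc nan 1e2 | LC_ALL=C sort -g)

# Print one line per input, in the order sort -g chose.
printf '%s\n' "$collated"
```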

sort -n is the plain numeric sort we usually expect:

‘-n’
‘--numeric-sort’
‘--sort=numeric’
     Sort numerically.  The number begins each line and consists of
     optional blanks, an optional ‘-’ sign, and zero or more digits
     possibly separated by thousands separators, optionally followed by
     a decimal-point character and zero or more digits.  An empty number
     is treated as ‘0’.  The ‘LC_NUMERIC’ locale specifies the
     decimal-point character and thousands separator.  By default a
     blank is a space or a tab, but the ‘LC_CTYPE’ locale can change
     this.

     Comparison is exact; there is no rounding error.

     Neither a leading ‘+’ nor exponential notation is recognized.  To
     compare such strings numerically, use the ‘--general-numeric-sort’
     (‘-g’) option.

Check Steeldriver's answer for a better explanation.

Solution 3:

From the sort manual:

‘-n’
‘--numeric-sort’
‘--sort=numeric’

Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.

Comparison is exact; there is no rounding error.

Neither a leading ‘+’ nor exponential notation is recognized. To compare such strings numerically, use the --general-numeric-sort (-g) option.

And:

‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’

Sort numerically, converting a prefix of each line to a long double-precision floating point number. See Floating point. Do not report overflow, underflow, or conversion errors. Use the following collating sequence:

   • Lines that do not start with numbers (all considered to be equal).
   • NaNs (“Not a Number” values, in IEEE floating point arithmetic) in a consistent but machine-dependent order.
   • Minus infinity.
   • Finite numbers in ascending numeric order (with -0 and +0 equal).
   • Plus infinity.

Use this option only if there is no alternative; it is much slower than --numeric-sort (-n) and it can lose information when converting to floating point.

So, it would seem that using -g could result in incorrect comparisons due to loss of precision, but for whatever reason, I can't produce such a result:

$ printf "%s\n" 1 1.23 1.234 1.2345 1.23456 1.234567 1.2345678 1.23456789 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888878888888888 | sort -g
1
1.23
1.234
1.2345
1.23456
1.234567
1.2345678
1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888878888888888
1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
1.23456789

sort -g places the second long fraction before the first, which is the correct numeric order, even though the difference between the two is well beyond the precision of a double:

$ cat test.cpp  
#include<iostream>

using namespace std;

int main()
{
    cout << (1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888887888888888888888888888 < 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888) << endl;
    cout << (1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888887888888888888888888888 > 1.23456788888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888) << endl;
}
$ make test     
g++     test.cpp   -o test
$ ./test        
0
0
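
A likely explanation is sort's last-resort comparison rather than extra precision. When all keys compare equal, GNU sort falls back to comparing the entire lines byte by byte, and the --stable (-s) option turns that fallback off. The two long fractions convert to the same floating point value (sort -g uses long double rather than the double in the C++ test, but neither has anywhere near enough digits), so the fallback decides, and a 7 sorts before an 8. The sketch below tries to separate the two effects; the values a and b and the variable names are hypothetical, and GNU sort is assumed.

```shell
# Two made-up values that differ only at the 49th fractional digit,
# far beyond what a long double can distinguish.
e10=8888888888                              # ten eights
a="1.${e10}${e10}${e10}${e10}${e10}"        # the larger value
b="1.${e10}${e10}${e10}${e10}8888888878"    # the smaller value

# -g alone: the keys tie after conversion, so the whole-line byte
# comparison decides and b (with the 7) comes first.
default_order=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort -g)

# -g -s: the last-resort comparison is disabled, so the tie is visible
# and the input order (a, then b) is kept.
stable_order=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort -g -s)

# -n -s: digits are compared exactly, so b comes first without any fallback.
exact_order=$(printf '%s\n' "$a" "$b" | LC_ALL=C sort -n -s)

printf 'sort -g:\n%s\n\nsort -g -s:\n%s\n\nsort -n -s:\n%s\n' \
    "$default_order" "$stable_order" "$exact_order"
```

If the explanation holds, the first and third runs print b before a, while the -g -s run leaves a first. That would be the precision loss the manual warns about, hidden by the tie-break in the default case.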
