What precisely does the %g printf specifier mean?
The %g specifier doesn't seem to behave in the way that most sources document it as behaving.
According to most sources I've found, across multiple languages that use printf specifiers, the %g specifier is supposed to be equivalent to either %f or %e - whichever would produce shorter output for the provided value. For instance, at the time of writing this question, cplusplus.com says that the g specifier means:
Use the shortest representation: %e or %f
And the PHP manual says it means:
g - shorter of %e and %f.
And here's a Stack Overflow answer that claims that %g uses the shortest representation.
And a Quora answer that claims that:
%g prints the number in the shortest of these two representations
But this behaviour isn't what I see in reality. If I compile and run this program (as C or C++ - it's a valid program with the same behaviour in both):
#include <stdio.h>

int main(void) {
    double x = 123456.0;
    printf("%e\n", x);
    printf("%f\n", x);
    printf("%g\n", x);
    printf("\n");

    double y = 1234567.0;
    printf("%e\n", y);
    printf("%f\n", y);
    printf("%g\n", y);
    return 0;
}
... then I see this output:
1.234560e+05
123456.000000
123456
1.234567e+06
1234567.000000
1.23457e+06
Clearly, the %g output doesn't quite match either the %e or %f output for either x or y above. What's more, it doesn't look like %g is minimising the output length either; y could've been formatted more succinctly if, like x, it had not been printed in scientific notation.
Are all of the sources I've quoted above lying to me?
I see identical or similar behaviour in other languages that support these format specifiers, perhaps because under the hood they call out to the printf family of C functions. For instance, I see this output in Python:
>>> print('%g' % 123456.0)
123456
>>> print('%g' % 1234567.0)
1.23457e+06
In PHP:
php > printf('%g', 123456.0);
123456
php > printf('%g', 1234567.0);
1.23457e+6
In Ruby:
irb(main):024:0* printf("%g\n", 123456.0)
123456
=> nil
irb(main):025:0> printf("%g\n", 1234567.0)
1.23457e+06
=> nil
What's the logic that governs this output?
Solution 1:
This is the full description of the g/G specifier in the C11 standard:
A double argument representing a floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), depending on the value converted and the precision. Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X:
- if P > X ≥ −4, the conversion is with style f (or F) and precision P − (X + 1).
- otherwise, the conversion is with style e (or E) and precision P − 1.
Finally, unless the # flag is used, any trailing zeros are removed from the fractional portion of the result and the decimal-point character is removed if there is no fractional portion remaining.
A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
This behaviour is somewhat similar to simply using the shortest representation out of %f and %e, but not equivalent. There are two important differences:
- Trailing zeros (and, potentially, the decimal point) get stripped when using %g, which can cause the output of a %g specifier to not exactly match what either %f or %e would've produced.
- The decision about whether to use %f-style or %e-style formatting is made based purely upon the size of the exponent that would be needed in %e-style notation, and does not directly depend on which representation would be shorter. There are several scenarios in which this rule results in %g selecting the longer representation, like the one shown in the question where %g uses scientific notation even though this makes the output 4 characters longer than it needs to be.
In case the C standard's wording is hard to parse, the Python documentation provides another description of the same behaviour:
General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude.
The precise rules are as follows: suppose that the result formatted with presentation type 'e' and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type 'f' and precision p-1-exp. Otherwise, the number is formatted with presentation type 'e' and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it.
Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision.
A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.
The many sources on the internet that claim that %g just picks the shortest out of %e and %f are simply wrong.