How many GCC optimization levels are there?
To be pedantic, there are 8 different valid -O options you can give to gcc, though some of them mean the same thing.
The original version of this answer stated there were 7 options. GCC has since added -Og to bring the total to 8.
From the man page:
- -O (same as -O1)
- -O0 (do no optimization, the default if no optimization level is specified)
- -O1 (optimize minimally)
- -O2 (optimize more)
- -O3 (optimize even more)
- -Ofast (optimize very aggressively to the point of breaking standard compliance)
- -Og (Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.)
- -Os (Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version)
There may also be platform-specific optimizations: as @pauldoo notes, OS X has -Oz.
Let's interpret the source code of GCC 5.1
We'll try to understand what happens on -O100, since it is not clear on the man page.
We shall conclude that:
- anything above -O3 up to INT_MAX is the same as -O3, but that could easily change in the future, so don't rely on it.
- GCC 5.1 runs undefined behavior if you enter integers larger than INT_MAX.
- the argument can only have digits, or it fails gracefully. In particular, this excludes negative integers like -O-1
Focus on subprograms
First remember that GCC is just a front-end for cpp, as, cc1, collect2. A quick ./XXX --help says that only collect2 and cc1 take -O, so let's focus on them.
And:
gcc -v -O100 main.c |& grep 100
gives:
COLLECT_GCC_OPTIONS='-O100' '-v' '-mtune=generic' '-march=x86-64'
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.1.0/cc1 [[noise]] hello_world.c -O100 -o /tmp/ccetECB5.
so -O was forwarded to both cc1 and collect2.
O in common.opt
common.opt is written in a GCC-specific CLI option description format, described in the internals documentation and translated to C by opth-gen.awk and optc-gen.awk.
It contains the following interesting lines:
O
Common JoinedOrMissing Optimization
-O<number> Set optimization level to <number>
Os
Common Optimization
Optimize for space rather than speed
Ofast
Common Optimization
Optimize for speed disregarding exact standards compliance
Og
Common Optimization
Optimize for debugging experience rather than speed or size
which specify all the O options. Note how -O<n> is in a separate family from the other Os, Ofast and Og.
When we build, this generates an options.h file that contains:
OPT_O = 139, /* -O */
OPT_Ofast = 140, /* -Ofast */
OPT_Og = 141, /* -Og */
OPT_Os = 142, /* -Os */
As a bonus, while we are grepping for \bO\n inside common.opt we notice the lines:
-optimize
Common Alias(O)
which teaches us that --optimize (double dash because it starts with a dash, -optimize, in the .opt file) is an undocumented alias for -O which can be used as --optimize=3!
Where OPT_O is used
Now we grep:
git grep -E '\bOPT_O\b'
which points us to two files:
- opts.c
- lto-wrapper.c
Let's first track down opts.c
opts.c:default_options_optimization
All opts.c usages happen inside default_options_optimization.
We grep backwards to see who calls this function, and we see that the only code path is:
main.c:main
toplev.c:toplev::main
opts-global.c:decode_opts
opts.c:default_options_optimization
and main.c is the entry point of cc1. Good!
The first part of this function:
- does integral_argument, which calls atoi on the string corresponding to OPT_O to parse the input argument
- stores the value inside opts->x_optimize, where opts is a struct gcc_options.
struct gcc_options
After grepping in vain, we notice that this struct is also generated in options.h:
struct gcc_options {
int x_optimize;
[...]
}
where x_optimize comes from the lines:
Variable
int optimize
present in common.opt, and that options.c contains:
struct gcc_options global_options;
so we guess that this is what contains the entire global configuration state, and int x_optimize is the optimization value.
255 is an internal maximum
In opts.c:integral_argument, atoi is applied to the input argument, so INT_MAX is an upper bound. And if you put anything larger, it seems that GCC runs C undefined behaviour. Ouch?
integral_argument also thinly wraps atoi and rejects the argument if any character is not a digit. So negative values fail gracefully.
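The behaviour described above can be sketched roughly as follows. This is a minimal stand-in written for illustration, not the GCC source: the name parse_level_sketch is hypothetical, and returning -1 on rejection is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for the digits-only wrapper around atoi.
   Returns the parsed value, or -1 (an assumption of this sketch)
   if any character is not a decimal digit. */
int parse_level_sketch(const char *arg)
{
    assert(arg != NULL);
    for (const char *p = arg; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;  /* rejects "-1", "3x", and similar inputs */
    }
    /* atoi has undefined behaviour when the value does not fit in an int,
       which is the "larger than INT_MAX" problem mentioned above. */
    return atoi(arg);
}
```

With this sketch, "100" parses to 100, while "-1" is rejected because of the leading dash, which matches the graceful failure for negative values.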
Back to opts.c:default_options_optimization, we see the line:
if ((unsigned int) opts->x_optimize > 255)
opts->x_optimize = 255;
so the optimization level is clamped to 255. While reading opth-gen.awk I had come across:
# All of the optimization switches gathered together so they can be saved and restored.
# This will allow attribute((cold)) to turn on space optimization.
and in the generated options.h:
struct GTY(()) cl_optimization
{
unsigned char x_optimize;
which explains the truncation: the options must also be forwarded to cl_optimization, which uses an unsigned char to save space. So 255 is actually an internal maximum.
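To see why the clamp matters, here is a small sketch of the same check next to what an unclamped store into an unsigned char would keep. The function names are hypothetical, and the sketch assumes an 8-bit char.

```c
#include <assert.h>
#include <limits.h>

/* The sketch assumes unsigned char holds exactly 0..255. */
static_assert(UCHAR_MAX == 255, "sketch assumes 8-bit char");

/* Hypothetical helper mirroring the clamp quoted above. */
int clamp_level_sketch(int level)
{
    /* The unsigned cast turns negative values into huge ones,
       so they are clamped to 255 as well. */
    if ((unsigned int) level > 255)
        level = 255;
    return level;
}

/* What a store into an unsigned char field would keep without the clamp:
   the conversion reduces the value modulo 256. */
unsigned char store_without_clamp(int level)
{
    return (unsigned char) level;
}
```

Without the clamp, a level of 256 would be stored as 0, which would silently turn a very high -O value into no optimization at all. With the clamp applied first, it is stored as 255.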
opts.c:maybe_default_options
Back to opts.c:default_options_optimization, we come across maybe_default_options, which sounds interesting. We enter it, and then maybe_default_option, where we reach a big switch:
switch (default_opt->levels)
{
[...]
case OPT_LEVELS_1_PLUS:
enabled = (level >= 1);
break;
[...]
case OPT_LEVELS_3_PLUS:
enabled = (level >= 3);
break;
There are no >= 4 checks, which indicates that 3 is the largest level that has any effect.
Then we search for the definition of OPT_LEVELS_3_PLUS in common-target.h:
enum opt_levels
{
OPT_LEVELS_NONE, /* No levels (mark end of array). */
OPT_LEVELS_ALL, /* All levels (used by targets to disable options
enabled in target-independent code). */
OPT_LEVELS_0_ONLY, /* -O0 only. */
OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og. */
OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og. */
OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og. */
OPT_LEVELS_2_PLUS, /* -O2 and above, including -Os. */
OPT_LEVELS_2_PLUS_SPEED_ONLY, /* -O2 and above, but not -Os or -Og. */
OPT_LEVELS_3_PLUS, /* -O3 and above. */
OPT_LEVELS_3_PLUS_AND_SIZE, /* -O3 and above and -Os. */
OPT_LEVELS_SIZE, /* -Os only. */
OPT_LEVELS_FAST /* -Ofast only. */
};
Ha! This is a strong indicator that there are only 3 levels.
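One way to read the enum comments is as predicates over the numeric level plus two flags for -Os and -Og. The sketch below is inferred from those comments only, not copied from opts.c. The SK_ names, the function name, and the idea that the caller passes size and debug as separate booleans are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the opt_levels enum, renamed to mark it as a sketch. */
enum sketch_levels {
    SK_1_PLUS,             /* -O1 and above, including -Os and -Og */
    SK_1_PLUS_SPEED_ONLY,  /* -O1 and above, but not -Os or -Og */
    SK_2_PLUS,             /* -O2 and above, including -Os */
    SK_3_PLUS,             /* -O3 and above */
    SK_3_PLUS_AND_SIZE,    /* -O3 and above and -Os */
    SK_SIZE                /* -Os only */
};

/* level: numeric -O value; size: true for -Os; debug: true for -Og.
   The predicates are a reading of the enum comments, not GCC's code. */
bool sketch_enabled(enum sketch_levels kind, int level, bool size, bool debug)
{
    assert(level >= 0);
    switch (kind) {
    case SK_1_PLUS:            return level >= 1;
    case SK_1_PLUS_SPEED_ONLY: return level >= 1 && !size && !debug;
    case SK_2_PLUS:            return level >= 2;
    case SK_3_PLUS:            return level >= 3;
    case SK_3_PLUS_AND_SIZE:   return level >= 3 || size;
    case SK_SIZE:              return size;
    }
    return false;
}
```

Since the highest threshold is 3, any level from 3 upwards gives the same answer for every kind in this sketch, which is the point being made about -O100.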
opts.c:default_options_table
opt_levels is so interesting that we grep OPT_LEVELS_3_PLUS, and come across opts.c:default_options_table:
static const struct default_options default_options_table[] = {
/* -O1 optimizations. */
{ OPT_LEVELS_1_PLUS, OPT_fdefer_pop, NULL, 1 },
[...]
/* -O3 optimizations. */
{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
[...]
}
so this is where the mapping from -On to specific optimizations, as mentioned in the docs, is encoded. Nice!
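A table-driven lookup of this kind might look like the sketch below. It keeps only the two entries quoted above and replaces the opt_levels field with a plain minimum level, which is a simplification. The struct, table, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified table entry: a minimum -O level instead of an opt_levels kind. */
struct sketch_default {
    int min_level;
    const char *flag;
};

/* Only the two entries shown in the excerpt above. */
static const struct sketch_default sketch_table[] = {
    { 1, "-fdefer-pop" },
    { 3, "-ftree-loop-distribute-patterns" },
};

/* Returns true if the table enables the flag at the given level.
   Flags that are not in the table are reported as not enabled. */
bool sketch_flag_enabled(const char *flag, int level)
{
    assert(flag != NULL);
    size_t n = sizeof sketch_table / sizeof sketch_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(sketch_table[i].flag, flag) == 0)
            return level >= sketch_table[i].min_level;
    }
    return false;
}
```

Looking up either flag at level 3 and at level 100 gives the same result, because no entry in the table asks for more than 3.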
Ensure that there are no other uses of x_optimize
The main usage of x_optimize was to set other specific optimization options like -fdefer-pop, as documented on the man page. Are there any more?
We grep, and find a few more. The number is small, and upon manual inspection we see that every usage checks at most x_optimize >= 3, so our conclusion holds.
lto-wrapper.c
Now we go for the second occurrence of OPT_O, which was in lto-wrapper.c.
LTO means Link Time Optimization, which as the name suggests is going to need an -O option, and will be linked to collect2 (which is basically a linker).
In fact, the first line of lto-wrapper.c says:
/* Wrapper to call lto. Used by collect2 and the linker plugin.
In this file, the OPT_O occurrences seem only to normalize the value of O before passing it forward, so we should be fine.