GCC: how is march different from mtune?
I tried to scour the GCC man page for this, but I still don't really get it.
What's the difference between -march and -mtune?
When does one use just -march, vs. both? Is it ever possible to use just -mtune?
Solution 1:
If you use -march then GCC will be free to generate instructions that work on the specified CPU, but (typically) not on earlier CPUs in the architecture family.
If you just use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated, e.g. by setting loop-unrolling heuristics appropriately for that CPU.
-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.
Caveat: -march=native on a CPU that GCC doesn't specifically recognize will still enable new instruction sets that GCC can detect, but will leave -mtune=generic. Use a new enough GCC that knows about your CPU if you want it to make good code.
Solution 2:
This is what I've googled up:
The -march=X option takes a CPU name X and allows GCC to generate code that uses all features of X. The GCC manual explains exactly which CPU names mean which CPU families and features.
Because features are usually added, but not removed, a binary built with -march=X will run on CPU X, has a good chance of running on CPUs newer than X, but will almost assuredly not run on anything older than X. Certain instruction sets (3DNow!, I guess?) may be specific to a particular CPU vendor; making use of these will probably get you binaries that don't run on competing CPUs, newer or otherwise.
The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on. -march=X implies -mtune=X. -mtune=Y will not override -march=X, so, for example, it probably makes no sense to combine -march=core2 with -mtune=i686: your code will not run on anything older than core2 anyway, because of -march=core2, so why on Earth would you want to optimize for something older (less featureful) than core2? -march=core2 -mtune=haswell makes more sense: don't use any features beyond what core2 provides (which is still a lot more than what -march=i686 gives you!), but do optimize code for much newer haswell CPUs, not for core2.
There's also -mtune=generic. generic makes GCC produce code that runs best on current CPUs (the meaning of generic changes from one version of GCC to another). There are rumors on Gentoo forums that -march=X -mtune=generic produces code that runs faster on X than code produced by -march=X -mtune=X does (or just -march=X, as -mtune=X is implied). No idea if this is true or not.
Generally, unless you know exactly what you need, it seems that the best course is to specify -march=<oldest CPU you want to run on> and -mtune=generic (-mtune=generic is here to counter the implicit -mtune=<oldest CPU you want to run on>, because you probably don't want to optimize for the oldest CPU). Or just -march=native, if you are only ever going to run on the same machine you build on.