Performance difference between source-compiled and binary Linux distributions/packages
I have searched a lot on the internet and couldn't find an exact answer.
There are distros like Gentoo (or FreeBSD) which do not come with binaries, only with source code for packages (ports).
The majority of distros use binary packages (Debian, etc.).
First question: How much of a speed increase can I expect from compiling packages from source? How much can I gain with real-world packages like Apache or MySQL, i.e. in queries per second?
Second question: Does a binary package mean it does not use any CPU instructions that were introduced after the first AMD 64-bit CPU? With 32-bit packages, does that mean the package will run on a 386 and basically does not use most modern CPU instructions?
Additional info:
- I am not talking about desktops, but about server environments.
- I don't care about compile time.
- I have many servers, so a speed increase of more than 15% would make source-based packages worth it.
- Please no flamewars.
Solution 1:
The performance difference will, in almost all cases, be minimal and not worthwhile. Good reasons to use source distributions (while rolling your own binary packages, as Gentoo's bindist system allows) include:
- Deploying your own custom patches
- Customizing your kernel easily
- Packaging your own updates
If you're not doing any of these things, you don't need a source distribution. For personal use they're very convenient because they allow you to upgrade things incrementally at will without worrying too much about binary compatibility, which is not a concern I see often in an enterprise setting.
It's worth noting that you can do these things with a binary distribution as well, by making your own RPM packages or whatever. The management overhead is similar.
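For a sense of what that looks like in practice, here is a minimal sketch of Gentoo's build-once, install-many workflow (the package name and binhost URL are placeholders):

```
# On the build host, have Portage save a binary package for everything it builds
# /etc/portage/make.conf
FEATURES="buildpkg"

# Build (and package) as usual
emerge --ask www-servers/apache

# Serve /var/cache/binpkgs over HTTP, then on the other servers point at it:
# /etc/portage/make.conf
PORTAGE_BINHOST="http://buildhost.example.com/binpkgs"

# Install from the pre-built packages instead of compiling again
emerge --ask --getbinpkg www-servers/apache
```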
You will basically not see a 15% speed increase by compiling from source. I'd be loath to estimate it at even as high as 5% in any reasonable case. Compiling from source gets you a couple of things:
- You get to use your preferred compiler version
- You can direct the compiler to generate instructions from ISA extensions not used in binary distributions' packages, such as AES-NI and AVX
However, the compiler very rarely generates these on its own anyway, and the overall savings from using them are generally minuscule when the application's performance is taken as a whole. Things like RAM access latency and disk and device latency are much bigger factors, and you should really start there.
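If you want to check that claim for a specific binary, a rough test is to disassemble it and look for the relevant mnemonics (the mysqld path here is just an example; a hit only proves the instructions are present, not that they sit in hot code paths):

```
# Count AES-NI instructions in the binary
objdump -d /usr/sbin/mysqld | grep -c aesenc

# Count a few common AVX instructions (AVX mnemonics start with 'v')
objdump -d /usr/sbin/mysqld | grep -cE 'vmovap[sd]|vaddp[sd]|vmulp[sd]'
```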
Applications which might benefit from a custom compilation that will only run on a relatively recent Intel Core i7 or i5 include ones that do a lot of vector math, ones which do a lot of AES encryption and decryption, or ones that require a lot of random numbers. If you want to use the Intel DRBG you would currently need to do this as well.
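For the AES case specifically, OpenSSL gives a quick feel for what the instruction set extension is worth in isolation: the plain cipher name benchmarks the generic C implementation, while the -evp form goes through the code path that uses AES-NI when the CPU supports it.

```
# Generic software AES
openssl speed aes-128-cbc

# EVP code path, which uses AES-NI if available
openssl speed -evp aes-128-cbc
```

Keep in mind this measures the cipher alone; in a whole application the difference is diluted by everything else the program does.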
If none of these apply to you, you'll be quite happy with any of the Debian- or Red Hat-based distributions out there, and will have a lot less maintenance overhead.
Solution 2:
Short answer... Many large-scale and speed/latency-sensitive applications run on standard Linux distributions: Red Hat, CentOS, Debian, Ubuntu... They all work well in the majority of cases. Most gains come from application tuning, standard kernel and OS optimizations, and infrastructure.
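As a rough illustration, "standard kernel and OS optimizations" mostly means knobs like these rather than rebuilt binaries (the values are placeholders; the right numbers depend entirely on your workload):

```
# Example sysctl tuning for a busy network server (illustrative values only)
sysctl -w net.core.somaxconn=4096           # larger listen backlog
sysctl -w net.ipv4.tcp_max_syn_backlog=8192 # absorb connection bursts
sysctl -w vm.swappiness=10                  # keep application memory resident
# Persist by putting the same keys in /etc/sysctl.d/99-tuning.conf
```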
Gentoo may offer some optimizations, but it opens the door to more management woes, reduced mindshare, diminished vendor and driver support, stability issues, ridicule and potential security concerns.
I've managed Gentoo-based servers in a high-frequency financial trading environment. Even though there were some slight performance benefits under Gentoo, I still moved to Red Hat and CentOS. Gentoo's advantages on paper were easily overcome by smarter hardware selection, better server manufacturer/hardware integration support, smarter patching by Red Hat engineers and more esoteric technologies like kernel bypass...
If you are at a point where the efficiency of popular application stacks (LAMP) is an issue, please be sure to have optimized your server hardware (CPU type, RAM layout), networking infrastructure, monitoring system and be able to identify system bottlenecks before going down this path.
Are you hitting a performance limitation now?
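If you're not sure, a quick baseline with the standard tools will tell you more than any compiler flag (iostat and sar come from the sysstat package on most distributions):

```
vmstat 1 10        # CPU, run queue and swapping at a glance
iostat -x 1 10     # per-device utilization and wait times
sar -n DEV 1 10    # network throughput per interface
perf top           # which functions are actually burning CPU
```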
Solution 3:
All the points made are of course correct. I would just like to take some issue with the idea that a 5%-15% performance increase is unachievable, especially with modern versions of GCC; it really depends on the CPU architecture and how close it is to the baseline used as the target for the binary distributions. GCC's -march=native will, in addition to using the ISA extensions, also optimize for L1 and L2 cache/line sizes. Correctly aligned code (for your CPU) can be much faster, especially when -flto is also used so the compiler can take everything it needs into account. (Some packages are currently broken with LTO, unfortunately.)
Additionally, compiling select packages with -Ofast, in addition to -march=native and LTO, can make a significant difference.
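On Gentoo, that combination would look roughly like this in /etc/portage/make.conf (a sketch only; -Ofast relaxes floating-point semantics and LTO still breaks some packages, so it is safer to apply those two per package than globally):

```
# /etc/portage/make.conf (illustrative)
COMMON_FLAGS="-O2 -march=native -flto -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"

# Show what -march=native actually resolves to on this machine,
# including the ISA extensions and L1/L2 cache and line-size parameters
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
```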
In the future, if GCC's Graphite infrastructure ever stabilises, it will have the potential for even greater gains.
Solution 4:
It depends on what you want in your system, and really there are three schools of thought here (and this is true for both hardware and software).
Firstly, the mainstream, as far as most folks on SF go - you want something you know will work, you want support, and you want it now. In this case, go with Red Hat-based systems (RHEL gives you excellent support, and CentOS is a community rebuild of the well-tested RHEL distribution). You will not, however, get the latest and greatest. In many cases this is also true of hardware.
The second is the 'middle of the road' point of view - going with something like Ubuntu. You want newer packages (at the slight expense of rock-solid stability), you want an installer, and you want nice things.
In some cases people do run into trouble, but you get newer packages and things are reasonably tested. While there's a lot of hatred for Ubuntu here, it's a good compromise between ease of setup and reasonably new packages. Debian is probably a slightly more conservative choice. These days, you can even set up Ubuntu with a low-latency kernel out of the box. I tend to feel Ubuntu and Debian work for me, but YMMV. A lot of places that deploy a lot of servers, like Facebook and Google, go for this option.
Finally, there are source-based distributions. Initial setup in most cases is an utter pain in the rear. You make a mistake setting up your kernel? Oops, spend a few hours recompiling. You don't get an installer either - that's for n00bs. You often get bleeding-edge applications, the option to compile them as you need them (which includes being able to pick optimisations for speed or memory use, for example), and a rolling release. If you have very specific, esoteric needs, Gentoo's great. If you need to roll out a few dozen systems and want to automate it... good luck. Source-based distributions simply don't scale as well. You get a lot of flexibility and *some* extra speed, but not maintainability at the same level as a package-based distribution, IMO. You're not likely to get 15% extra speed, and you'll likely end up wasting time trying to tune compilation flags for your hardware and, if you mess something up, working out what exactly failed.
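For what it's worth, the per-package flexibility mentioned above looks something like this on Gentoo (file and package names here are arbitrary examples):

```
# /etc/portage/env/size.conf - optimise for size instead of speed
CFLAGS="-Os -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/package.env - apply it only to selected packages
www-servers/nginx size.conf
```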
The BSDs are a separate family of OSes. Some folks swear by them (at least one comms room regular is a FreeBSD user), and different BSDs have different focuses - for example, OpenBSD is security-obsessed, and FreeBSD is the 'mainstream' one. They may not, in some cases, have the same kind of hardware support Linux does, but that depends on quite a few factors.