Why is it so difficult to upgrade between major versions of Red Hat and CentOS?

(Author's Note: This answer refers to RHEL 6 and prior versions. RHEL 7 now has a fully supported upgrade path from RHEL 6, the details of which are at the end.)

To start, I should note that there are two ways to do the in-place upgrade:

1. Drop in the installation DVD (or use the DVD image via iLO/iDRAC), boot from it and choose Upgrade, e.g. by entering linux upgradeany at the boot prompt.
2. Update the redhat-release RPM manually, run yum distro-sync (this is oversimplified a bit) and reboot.

Method 1 is merely unsupported. Method 2 is for Real Cowboys. In addition to the recommended fresh installs, I have done both of these...

Do I need support?

Support has two complementary meanings in our world. The first is that a product has a given feature (e.g. "Postfix supports SMTP"). The second is that the vendor will talk to you about it. Which definition is meant is not always clear from context.

To accomplish a task, you obviously need support in the first sense. Vendor support comes in to assist you in resolving issues and to give you a channel for telling the vendor which features need to exist or be improved. Many sites pay a fortune for vendor support when they have the in-house expertise to resolve any issues that may arise, faster and even cheaper than the vendor could. Whether to buy vendor support is ultimately a business decision you will have to make (or advise management on).

Why not do an in-place upgrade?

This is what Red Hat says about it:

Red Hat does not support in-place upgrades between any major versions of Red Hat Enterprise Linux. A major version is denoted by a whole number version change. For example, Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6 are both major versions of Red Hat Enterprise Linux.

In-place upgrades across major releases do not preserve all system settings, services or custom configurations. Consequently, Red Hat strongly recommends fresh installations when upgrading from one major version to another.

They further warn:

However, note the following limitations before you choose to upgrade your system:

Individual package configuration files may or may not work after performing an upgrade due to changes in various configuration file formats or layouts.

If you have one of Red Hat's layered products (such as the Cluster Suite) installed, it may need to be manually upgraded after the Red Hat Enterprise Linux upgrade has been completed.

Third party or ISV applications may not work correctly following the upgrade.

Of course, they then describe how to do an in-place upgrade via method 1, just in case you really want to do it. The feature exists and Red Hat puts development time into it, so it is supported in that the feature exists. But if something goes wrong, Red Hat will tell you to install fresh; they will not provide vendor support for things that break as a result of the upgrade.

For the record, I've never actually had a problem with an in-place upgrade of a RHEL/CentOS or Fedora system that I couldn't resolve myself. The typical problems come from renamed packages, third party repositories and the occasional version mismatch between the i386 and x86_64 architectures of a package. The installer is a bit better at handling these than yum, I think.
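If you want to look for the i386/x86_64 version mismatches before you start, something like the following rough sketch may help. It assumes you have captured the output of rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n' from the machine; the function name find_multilib_mismatches is my own invention, not part of any tool.

```python
from collections import defaultdict


def find_multilib_mismatches(rpm_listing):
    """Find packages installed for several arches at differing versions.

    rpm_listing is assumed to hold lines of "name version-release arch".
    Returns {name: {arch: [version-release, ...]}} for the mismatches only.
    """
    versions = defaultdict(lambda: defaultdict(set))
    for line in rpm_listing.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, version_release, arch = parts
        if arch == "noarch":
            continue  # noarch packages cannot be multilib
        versions[name][arch].add(version_release)

    mismatches = {}
    for name, by_arch in versions.items():
        if len(by_arch) < 2:
            continue  # only installed for one arch
        # Compare the set of installed versions across arches.
        if len({frozenset(v) for v in by_arch.values()}) > 1:
            mismatches[name] = {a: sorted(v) for a, v in by_arch.items()}
    return mismatches
```

Anything this reports is worth bringing to the same version on both architectures (or removing the 32-bit copy if nothing needs it) before attempting the upgrade.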

How should I upgrade?

I generally warn people that they should plan on a maintenance window every 3-4 years to update RHEL systems from one major version to the next. While upgrades generally go smoothly, the unexpected can always happen.

For both of your environments, I expect an in-place upgrade would work, though I strongly recommend testing it thoroughly first. P2V a representative sample of the servers and run through the in-place upgrade on the virtual systems to see what problems you're going to run into. You can then plan the actual production upgrade based on better knowledge of what will happen.
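One way to pick that representative sample, sketched under the assumption that you already have an inventory listing each server's role, release and architecture (the field names and the function name here are hypothetical), is to take one host from each distinct combination:

```python
def representative_sample(hosts, per_group=1):
    """Pick up to per_group hosts from each (role, release, arch) group.

    hosts is assumed to be a list of dicts with the hypothetical keys
    "name", "role", "release" and "arch".
    """
    groups = {}
    # Sort by name so the selection is deterministic between runs.
    for host in sorted(hosts, key=lambda h: h["name"]):
        key = (host["role"], host["release"], host["arch"])
        groups.setdefault(key, []).append(host["name"])
    return {key: names[:per_group] for key, names in groups.items()}
```

Grouping on other attributes (third party repositories in use, layered products installed) may matter more in your environment; the point is only to make sure every distinct kind of server gets a test upgrade.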

For a large deployment such as you have here, consider using Limoncelli's "one-some-many" approach. Upgrade one machine, see what problems occur, and solve them. Then apply the lessons learned while upgrading a small batch of machines, and again fix whatever turns up. Once you believe you have all the kinks worked out, upgrade the rest in large batches.
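As a minimal sketch of how the one-some-many schedule might be laid out (the function name and the batch sizes are my own assumptions; pick sizes that suit your environment):

```python
def one_some_many(hosts, some=5, many=50):
    """Split hosts into upgrade waves: one host, then a small batch,
    then the remainder in large batches."""
    if some < 1 or many < 1:
        raise ValueError("batch sizes must be at least 1")
    hosts = list(hosts)
    waves = []
    if hosts:
        waves.append(hosts[:1])           # the "one"
    if len(hosts) > 1:
        waves.append(hosts[1:1 + some])   # the "some"
    # Everything left over goes out in batches of `many`.
    for start in range(1 + some, len(hosts), many):
        waves.append(hosts[start:start + many])
    return waves
```

For example, one_some_many(hosts, some=3, many=4) on twelve hosts yields waves of 1, 3, 4 and 4 machines. Stop after each wave and fix what broke before moving on to the next.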

At a time like this, I also recommend taking a long hard look at your application deployment process. If it isn't sufficiently automated that you can kick it off with a single command and be reasonably sure that the app will be deployed correctly, then perhaps the developers need to get to work on that. Having such a deployment process would make it much easier to do a fresh installation of the newer version of EL and then deploy onto it.

Will switching distributions help?

Debian-based distributions do have a supported in-place upgrade method, and it mostly works, but it is not immune from problems. Lots of things broke for people upgrading from Ubuntu 10.04 LTS to 12.04 LTS via the supported method, for instance. It's not clear that Debian or Canonical are putting a sufficient amount of development time into "supporting" this feature, i.e., making sure it works. And you still actually have to buy vendor support for this distribution if you want someone to hold your hand. So I doubt you will gain much from switching to such a distribution.

You may gain by switching to a rolling-release distribution such as Gentoo or Arch. However, this also doesn't make you immune to problems; it just means you have to deal with the upgrade problems continuously over the life of the server (e.g. whenever you or the developers decide to update something on the system), rather than all at once at a well-planned distribution upgrade time. You also have no vendor to provide support.

What does the future hold?

The Fedora Project has been working on tooling to improve in-place upgrades. They had a tool called preupgrade, which was abandoned and replaced, beginning with Fedora 18, by a new tool called fedup. A tool based on fedup (redhat-upgrade-tool) was added for RHEL 7, and in-place upgrades now have full support, at least from RHEL 6 to RHEL 7. From my own experience I can say that while fedup still has some kinks, it is shaping up to be a very useful tool.

CentOS is also experimenting with a rolling-release type of repository, but it only applies between minor versions (e.g. from 6.3 to 6.4).

My take on your last paragraph:

I suppose there's the configuration management angle, but most Puppet installations I see do not translate well into environments with highly-customized application servers (Environment B could have a single server whose ifconfig output looks like this). I'd be interested in hearing suggestions on how configuration management can be used to help organizations get across the RHEL major version bump, though.

I think the real value of configuration management systems, especially in the context of Environment B, is that they provide the tools to construct a service independently of the servers which run it. If a CMS wasn't used to create the existing services, then it probably won't help very much in recreating the services.

I know this doesn't solve your immediate problem, but to me it stems from the organisation thinking in terms of servers rather than services. In service-focused thinking, the personality of individual servers need not be maintained as long as the service continues to function. If a CMS is used in a disciplined manner to build the entire service, then moving that service to another system should be relatively straightforward, because all of the machine's personality will be built by the CMS.

P.S. I'm not exactly sure what's significant about the ifconfig output in this context - it's produced by a configuration file and some scripts (otherwise it wouldn't be there on boot), and those can be managed by a CMS, if needed.