Release Management for Infrastructure

Does anyone use release management principles for system administration of infrastructure in the same way they are used for software development?

I have been in the system administration field for more than 10 years and I have yet to be exposed to a company that uses release management principles for managing server infrastructure and application configuration in the way it is done for software development: things like externalizing configuration, checking configurations in and out of a versioned repository, automated deployment of configs to systems, promotion through proper non-prod environments, automated unit testing of components, and so on.

I'm curious what applications and processes people use to manage these configurations and deployments. Also, does anyone write release notes for a config deployment?

Additional comment: I agree that blindly subscribing to a methodological framework doesn't make you a better organization, and that's not what I'm asking. I am trying to ascertain whether there are concepts that apply to system administration in the same way they apply to software development. For example, if I want to make a configuration change to a system in prod, how do I know that what I tested in dev is what really got moved to prod? I would say that if you had a system where that config was checked into a repository, versioned, and then deployed to prod automatically, that would go a long way toward ensuring that things work correctly once they are deployed to production.
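To make that concern concrete, here is a minimal sketch (not tied to any particular tool) of one way to verify that the config running in prod is byte-for-byte the revision that was tested in dev. The file paths and the idea of a checked-out "released" copy are purely illustrative:

```python
# Minimal sketch: confirm that the config deployed to a prod host matches
# the exact revision that was tested in dev, by comparing content hashes.
# Paths below are assumptions for illustration only.
import hashlib
from pathlib import Path

def sha256_of(path: str) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Digest of the copy that was tagged in the repository and promoted out of dev.
expected = sha256_of("repo/configs/httpd.conf")
# Digest of what the prod host is actually running.
actual = sha256_of("/etc/httpd/conf/httpd.conf")

if actual != expected:
    raise SystemExit("Prod config does not match the tested revision")
print("Prod config matches the revision tested in dev")
```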


Solution 1:

I actually spend quite a bit of time thinking about this issue. At my large internet company, my job is internal release management of the software that runs on our many servers. We've actually done quite a lot of work to try to apply release management principles to infrastructure and system administration. While our software packaging system is available to the outside world, the general principles should be the same.

Here's an example: it used to be that when web servers were set up, an admin had to remember to set the VIP address as an alias on the loopback interface to bring the machine into rotation. We continually battled with machines being swapped out and this important step being missed. The result would be a server sitting there ready to go, but unable to serve traffic because the VIP had it marked as down.

The solution we used was a software package that we integrated into our general releases. We have a templating system that generates farm-specific settings for each of about 600 server farms. Those settings are then applied by the packaging system when the matching software package is installed.
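As a rough illustration of the templating idea (the farm names, values, and file layout below are invented for the example, not our actual data), the generation step amounts to expanding a shared template with per-farm values:

```python
# Illustrative sketch: render farm-specific settings files from one template.
# Farm data, template contents, and output filenames are made up here.
from string import Template

FARM_SETTINGS = {
    "farm-017": {"vip_address": "10.1.17.5", "datacenter": "east"},
    "farm-018": {"vip_address": "10.1.18.5", "datacenter": "west"},
}

SETTINGS_TEMPLATE = Template(
    "vip_address=$vip_address\n"
    "datacenter=$datacenter\n"
)

for farm, values in FARM_SETTINGS.items():
    rendered = SETTINGS_TEMPLATE.substitute(values)
    with open(f"{farm}.settings", "w") as fh:
        fh.write(rendered)
```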

So this relatively simple package simply takes the per-farm setting and applies it to the system loopback. That completely eliminated the issue of systems being accidentally marked as down by the VIP.
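In spirit, the package's install step does something like the following sketch. The settings path, file format, and error handling here are assumptions for illustration, not our actual tooling:

```python
# Sketch of a post-install step: read the farm's vip setting and add it as
# an alias on the loopback interface. Path and format are illustrative.
import subprocess

def read_setting(path: str, key: str) -> str:
    """Parse a simple key=value settings file and return one value."""
    with open(path) as fh:
        for line in fh:
            name, _, value = line.strip().partition("=")
            if name == key:
                return value
    raise KeyError(key)

vip = read_setting("/etc/farm/farm.settings", "vip_address")

# Adding an address that already exists makes `ip` exit non-zero, so treat
# that case as success instead of failing the install.
result = subprocess.run(
    ["ip", "addr", "add", f"{vip}/32", "dev", "lo"],
    capture_output=True, text=True,
)
if result.returncode != 0 and "File exists" not in result.stderr:
    raise SystemExit(f"Failed to set loopback alias: {result.stderr.strip()}")
```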

We've applied this methodology to other parts of the system as well. The result has been that we have gradually moved much of the system configuration into our software release system. We build and distribute software releases which contain all the necessary software packages. Those packages in turn pick up the per-farm settings and apply them to fix up things like the loopback address.

This remains a fairly high-level mechanism. There are other systems which ensure that the base OS is loaded on a server and that sysadmin user accounts are installed. However, once you get beyond that level, we try very hard to move all possible system configuration into settings which are then read by packages. We've been very happy with this approach for managing approximately 10,000 servers.

Solution 2:

This is kind of a loaded question for a number of reasons.

First, there's no one way of developing software. On the one hand, you have traditional, waterfall-like models where requirements are gathered up front and the software follows a very rigid, unchanging lifecycle through to completion of a major release. On the other hand, you have Agile models where there might be a new release every week or two. In my experience, the former tends to show up in the enterprise software world (ERP systems and the like), while the latter tends to show up in smaller, less complex systems (LAMP stacks and so forth).

Second, just because you can subscribe to a methodological framework doesn't mean that you should -- look at enterprise disasters like ITIL and COBIT (at least when companies rush naively into whole-hog implementation without considering what they're actually doing and why). The right way to approach an IT problem is to figure out what the return on investment of any potential process improvement is actually going to be, and then decide whether to implement it. If you're blind to the requirements of your business and the workflows of the people supporting its technology, you'll accomplish nothing besides wasting a ton of time and money on something you heard on some guy's blog was a "best practice" for someone in their particular situation at some point in time. If you're administering servers for a company that sells a service running on a large web farm of identically configured servers, configuration-as-code offers far bigger repeatability benefits than it does for a shop with 100 heterogeneous departmental servers and reliable backups of a working system state.

That said, there are plenty of shops out there that subscribe to this mentality in at least some form. It's the entire reason that projects like Puppet, Chef, and CFEngine exist. As to whether they do everything you're asking about, it's a matter of degree -- as it should be.

Solution 3:

We use Puppet to manage all of our configs. In addition to the history Puppet keeps, we also check our configs into SVN.