Managing an application across multiple servers, or PXE vs cfEngine/Chef/Puppet

We have an application running on a few boxes (5 or so, and growing). The hardware is identical on all the machines, and ideally the software would be as well. I have been managing them by hand up until now and don't want to anymore (static IP addresses, disabling unnecessary services, installing required packages...). Can anyone weigh the pros and cons of the following options, or suggest something more intelligent?

1: Individually install CentOS on all the boxes and manage the configs with Chef/cfengine/Puppet. This would be good, as I have wanted an excuse to learn one of these applications, but I don't know if it's actually the best solution.

2: Make one box perfect and image it. Serve the image over PXE, and whenever I want to make modifications I can just reboot the boxes into a new image. How do cluster people normally handle things like having MAC addresses in the /etc/sysconfig/network-scripts/ifcfg* files? We use InfiniBand as well, and it also refuses to start if the hwaddr is wrong. Can these be generated correctly at boot?

I'm leaning towards the PXE solution, but I think monitoring with munin or nagios will be a little more complicated with this. Anyone have experience with this type of problem?

All the servers have SSDs in them and are fast and powerful.

Thanks, matt.


Your cluster sounds more like an HPC cluster than an OLTP one like mine, but I think the setup I'm using would work for you too. I call it the "mpdehaan trifecta" because that's the IRC nick of the guy who wrote or manages the three tools involved.

1.) Cobbler for base-build provisioning. Cobbler is a project that aims to be the intersection of your Kickstart, PXE, yum repo, DHCP, DNS, etc. systems. It's by far the easiest way to get a Kickstart setup up and running, and you can grow into the other features as needed.
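As a sketch of what that looks like in practice (the distro, profile, and system names here are made up, and option names vary between Cobbler versions):

```shell
# Register a distro, attach a kickstart via a profile, then map systems to it.
cobbler distro add --name=CentOS-x86_64 --kernel=/path/to/vmlinuz --initrd=/path/to/initrd.img
cobbler profile add --name=appnode --distro=CentOS-x86_64 \
    --kickstart=/var/lib/cobbler/kickstarts/appnode.ks
cobbler system add --name=node01 --profile=appnode --mac=00:11:22:33:44:55
cobbler sync   # regenerate the PXE/DHCP configuration
```

After that, a PXE-booted box with that MAC gets kickstarted as node01 without any per-machine hand work.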

2.) Puppet for configuration management. Ideally your Cobbler-built hosts are very barebones configs that know just enough to phone home to your Puppet server on startup. Puppet then applies your configuration settings and keeps them consistent across your environment in perpetuity.
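A minimal example of the kind of thing Puppet keeps consistent (the class, file, and service names are illustrative, not from the post):

```puppet
# Ensure ntp is installed, configured identically everywhere, and running.
class ntp {
  package { 'ntp': ensure => installed }

  file { '/etc/ntp.conf':
    source  => 'puppet:///modules/ntp/ntp.conf',
    require => Package['ntp'],
    notify  => Service['ntpd'],
  }

  service { 'ntpd':
    ensure => running,
    enable => true,
  }
}
```

If someone hand-edits ntp.conf on one box, the next Puppet run puts it back.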

3.) Func for running ad-hoc commands on multiple machines in parallel, for instance "deploy a new svn checkout of the code and restart Apache". It's pretty easy to use Func to call the same bash command on a group of servers, much like cluster-ssh. If you really want to get into it, you can write your own modules for it with some really simple Python.
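For example, something along these lines (the host glob and paths are hypothetical):

```shell
# Run the same command on every matching minion in parallel.
func "node*" call command run "svn update /srv/myapp && /etc/init.d/httpd restart"
```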

All three of these tools have good wikis and active IRC channels for help on Freenode.


Overview

In some ways, you have two questions here:

  • How do I build and maintain standard servers?
  • How do I maintain standard configuration and make changes later?

I've split my answer below to address those two things separately, although they are very closely related. I am addressing the technology solutions here, not related best practices such as change control.

If this does not cover the scope of your question, please clarify and I will be happy to elaborate. This foundation is critical for a well-run technology infrastructure.

Building Servers

I don't like images in the UNIX world; that is more of a Windows-style approach. Even some Windows people seem to be refocusing on scripts for standard builds now.

Satellite seems to be getting somewhat popular in the RHEL world, and Spacewalk is its open-source counterpart. You definitely have to buy into the RHEL approach entirely to use it. It serves as both server building and configuration management.

Ideally, you would want to establish local mirrors and repositories on a fileserver for all necessary software.
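For example (the hostnames and paths are made up): generate repository metadata on the fileserver with createrepo, then point each box at it with a .repo file:

```
# On the fileserver, after dropping RPMs into the directory:
#   createrepo /srv/mirror/custom

# /etc/yum.repos.d/local-custom.repo on each box:
[local-custom]
name=Local package mirror
baseurl=http://fileserver.example.com/mirror/custom
enabled=1
gpgcheck=0
```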

First, take advantage of your distribution's build automation, such as Kickstart in RHEL/CentOS. The Kickstart would be a baseline with variations, depending on your needs. Kickstart builds can be initiated from a PXE server.
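A baseline Kickstart fragment might look like this (the URL, partitioning, and package set are placeholders, not a recommendation):

```
install
url --url http://fileserver.example.com/mirror/centos/os/x86_64
lang en_US.UTF-8
keyboard us
timezone --utc America/New_York
clearpart --all --initlabel
autopart
reboot

%packages
@core
ntp

%post
# minimal bootstrap: just enough to phone home to configuration management
```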

For the more advanced parts of the build, and anything not suitable for a Kickstart file, you can write your own custom scripts. However, you may find Puppet or cfengine works well for you instead of custom scripts. I have found custom scripts to be the most flexible, since they are not limited to any single approach.

If you choose to write your own scripts, I recommend a core script for universal configuration: security configuration, hardening, and anything that applies to all builds. Then a final script finalizes the server role, for example a web server or a database server.
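As a sketch of that layout (the script and function names are hypothetical, and the hardening body is left empty):

```shell
#!/bin/sh
# core.sh -- universal baseline applied to every build.

# Security baseline shared by all roles: disable unneeded services,
# firewall rules, and so on would go here.
harden() {
    :
}

# Map a hostname to the role script that finishes the build.
role_for_host() {
    case "$1" in
        web*) echo "web" ;;
        db*)  echo "database" ;;
        *)    echo "generic" ;;
    esac
}

# Intended usage: run the baseline, then hand off to the role script.
#   harden
#   sh "role-$(role_for_host "$(hostname)").sh"
```

The point of the split is that core.sh never changes per machine; only the small role scripts differ.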



Maintaining Standards

What you describe also falls under maintaining configurations. Build standards, software updates, and the like are related to builds but in many ways separate.

If you choose to rely on system packages rather than creating your own source-based builds for your most important server roles, a lot of this can be maintained with native system utilities. This can be as simple as a script that loops over your server list and runs a yum -y update package on each host.
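That loop could look something like this (the server-list file and the DRYRUN switch are my own additions for illustration; set DRYRUN=1 to print the commands instead of running them):

```shell
#!/bin/sh
# update-all.sh -- run a yum update for one package across a list of hosts.
# Usage: update_all <package> [server-list-file]

update_all() {
    pkg=$1
    list=${2:-servers.txt}
    while read -r host; do
        # ${DRYRUN:+echo} expands to "echo" when DRYRUN is set,
        # turning the ssh invocation into a dry run.
        ${DRYRUN:+echo} ssh "$host" "yum -y update $pkg"
    done < "$list"
}
```

In a real environment you would want SSH keys in place so the loop runs unattended.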

For configuration management, this is where puppet, cfengine, and other configuration management utilities come into play. These are very useful utilities and provide the necessary foundation without writing your own scripts from scratch.

When you update your configuration standards for your servers, it is important to backfill this into your standard server builds.