Test-driven development for infrastructure deployments?

I've been using Puppet to deploy infrastructure, and most of the work I do is with Web 2.0 companies that are heavily into test-driven development for their web applications. Does anyone here use a test-driven approach to developing their server configurations? What tools do you use to do this? How deep does your testing go?


Solution 1:

I don't think you can do full test-driven development here, but you can certainly apply unit testing to new servers.

Basically, you would deploy the servers, start the services in a test mode, run tests against those services from another server (or a series of servers), and only then put them into production.

A good start would be Python scripts that connect to your databases, web pages, and SSH services and return a PASS/FAIL for each.
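
A minimal sketch of such a script, assuming a hypothetical host with SSH on port 22, a PostgreSQL-style database on port 5432, and a web front page on port 80 (substitute your own hosts, ports, and URLs):

    #!/usr/bin/env python3
    """Minimal post-deploy smoke test: connect to each service and print PASS/FAIL."""
    import socket
    import sys
    import urllib.request

    HOST = "newserver.example.com"              # hypothetical host under test
    TCP_CHECKS = {"ssh": 22, "postgres": 5432}  # assumed service ports
    HTTP_URL = "http://newserver.example.com/"  # assumed front page

    def tcp_check(host, port, timeout=5):
        """True if a TCP connection to host:port succeeds (enough to prove it listens)."""
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def http_check(url, timeout=5):
        """True if the URL answers with HTTP 200."""
        try:
            return urllib.request.urlopen(url, timeout=timeout).status == 200
        except OSError:
            return False

    failed = False
    for name, port in TCP_CHECKS.items():
        ok = tcp_check(HOST, port)
        print("%-10s %s" % (name, "PASS" if ok else "FAIL"))
        failed = failed or not ok

    ok = http_check(HTTP_URL)
    print("%-10s %s" % ("http", "PASS" if ok else "FAIL"))
    failed = failed or not ok

    sys.exit(1 if failed else 0)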

Or you could just roll this up into a monitoring solution like Zenoss, Nagios, or Munin: the same checks then test the servers during deployment and monitor them in production.
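
If you go the monitoring route, it helps to write the checks to the Nagios plugin convention (exit 0 for OK, 1 for WARNING, 2 for CRITICAL, plus a one-line status message), so one script serves as both a deployment test and a production check. A hedged sketch, with a placeholder URL:

    #!/usr/bin/env python3
    """check_frontpage.py: Nagios-style HTTP check (sketch, placeholder URL)."""
    import sys
    import urllib.error
    import urllib.request

    URL = "http://www.example.com/"   # placeholder; point this at your real front page
    OK, WARNING, CRITICAL = 0, 1, 2   # Nagios plugin exit codes

    try:
        status = urllib.request.urlopen(URL, timeout=10).status
    except urllib.error.HTTPError as err:
        status = err.code             # server answered, but with an error code
    except OSError as err:
        print("CRITICAL: %s unreachable (%s)" % (URL, err))
        sys.exit(CRITICAL)

    if status != 200:
        print("WARNING: %s returned HTTP %d" % (URL, status))
        sys.exit(WARNING)

    print("OK: %s returned HTTP 200" % URL)
    sys.exit(OK)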

Solution 2:

I think Joseph Kern is on the right track with the monitoring tools. The typical TDD cycle is: write a new test that fails, then update the system so that all existing tests pass. This would be easy to adapt to Nagios: add the failing check, configure the server, re-run all checks. Come to think of it, I've done exactly this sometimes.
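
As a sketch of that cycle driven from a script: the check command and the agent invocation below are assumptions, so substitute whatever checks and configuration tool you actually use.

    #!/usr/bin/env python3
    """Red/green cycle for a configuration change (sketch)."""
    import subprocess
    import sys

    # Hypothetical commands: a Nagios-style check and a Puppet agent run.
    NEW_CHECK = ["./check_frontpage.py"]
    APPLY_CONFIG = ["puppet", "agent", "--test"]

    # 1. Red: the new check must fail before the server is configured.
    if subprocess.call(NEW_CHECK) == 0:
        sys.exit("New check already passes: write a check that fails first.")

    # 2. Apply the configuration change.
    subprocess.call(APPLY_CONFIG)

    # 3. Green: re-run the check (in practice, re-run the whole suite).
    sys.exit(subprocess.call(NEW_CHECK))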

If you want to get really hard-core, write scripts that check every relevant aspect of the server configuration. A monitoring system like Nagios might not be the right place for some of them (e.g., you probably wouldn't "monitor" your OS version), but there's no reason you couldn't mix and match as appropriate.
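
For the aspects you would not normally monitor, a one-shot script run at deployment time can assert them directly. A sketch, assuming a Debian/Ubuntu host and made-up baseline values:

    #!/usr/bin/env python3
    """Assert configuration facts you would not usually monitor (sketch)."""
    import pathlib
    import subprocess
    import sys

    EXPECTED_OS_VERSION = "22.04"                  # assumed baseline
    REQUIRED_PACKAGES = ["openssh-server", "ntp"]  # assumed package list

    failures = []

    # OS release, read from /etc/os-release (Linux-specific).
    os_release = dict(
        line.split("=", 1)
        for line in pathlib.Path("/etc/os-release").read_text().splitlines()
        if "=" in line
    )
    version = os_release.get("VERSION_ID", "").strip('"')
    if version != EXPECTED_OS_VERSION:
        failures.append("OS version is %s, expected %s" % (version, EXPECTED_OS_VERSION))

    # Required packages; dpkg-query exits non-zero if a package is unknown (Debian/Ubuntu).
    for pkg in REQUIRED_PACKAGES:
        if subprocess.call(["dpkg-query", "-W", pkg],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) != 0:
            failures.append("package %s is not installed" % pkg)

    if failures:
        print("FAIL:")
        for failure in failures:
            print("  - " + failure)
        sys.exit(1)
    print("PASS")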

Solution 3:

While I haven't been able to do TDD with Puppet manifests yet, we do have a pretty good cycle for keeping untested changes out of production. We have two puppetmasters set up: one is our production puppetmaster, the other our development puppetmaster. We use Puppet's "environments" to set up the following (sketched below):

  • development environments (one for each person working on Puppet manifests)
  • testing environment
  • production environment
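
For reference, with the older config-file style of environments this maps onto sections in the puppetmaster's puppet.conf roughly as follows; the paths are assumptions, and newer Puppet releases achieve the same layout with directory environments:

    # puppet.conf on the puppetmaster (sketch; paths are assumptions)
    [testing]
        manifest   = /etc/puppet/environments/testing/manifests/site.pp
        modulepath = /etc/puppet/environments/testing/modules

    [production]
        manifest   = /etc/puppet/environments/production/manifests/site.pp
        modulepath = /etc/puppet/environments/production/modules

    # Each client selects its environment in its own puppet.conf:
    # [agent]
    #     environment = testing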

Our application developers do their work on virtual machines that get their Puppet configuration from the development puppetmaster's "testing" environment. When we are developing Puppet manifests, we usually set up a VM to serve as a test client and point it at our personal development environment. Once we are happy with our manifests, we push them to the testing environment, where the application developers pick up the changes on their VMs - they usually complain loudly when something breaks :-)

On a representative subset of our production machines, there is a second puppetd running in noop mode and pointed at the testing environment. We use this to catch potential problems with the manifests before they get pushed to production.
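
A small sketch of how that noop output can be checked automatically; the log path and the "(noop)" marker are assumptions that depend on your Puppet version and logging setup:

    #!/usr/bin/env python3
    """Report pending changes from a noop puppetd/agent log (sketch)."""
    import sys

    LOGFILE = "/var/log/puppet/agent-noop.log"   # hypothetical log path

    # In noop mode Puppet logs what it *would* change; the exact wording varies
    # by version, so adjust the marker below to match your logs.
    pending = [line.rstrip() for line in open(LOGFILE) if "(noop)" in line]

    if pending:
        print("Manifests in 'testing' would change this production host:")
        for line in pending:
            print("  " + line)
        sys.exit(1)

    print("No pending changes; manifests look safe to promote.")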

Once the changes have passed, i.e. they don't break the application developers' machines and they don't produce undesirable output in the logs of the production machines' "noop" puppetd processes, we push the new manifests into production. We have a rollback mechanism in place so we can revert to an earlier version.