How to record server changes?

So we've all probably had this situation: you debug some problem, only to realize it was caused by a config change you made six months ago, and you can't remember why you did it. So you undo it and fix the problem, and now some other problem comes back. Oh yeah, NOW I remember! Then you fix it properly.

It's because you didn't take proper notes, you fool! But what's a good way to do this?

In engineering we have loads of software meant to help us detect and track changes. Source control, code reviews, and so on. Every change is tracked, every change requires a comment as to what it is. And typical engineering departments require good comments so that in six months when you're figuring out why you broke it like that, you can use a historical 'blame' feature or binary search builds to pinpoint the problem. These tools are very effective communication tools and historical records.

But in serverland, we have 500 different services, all with different ways of configuring them. And they don't always have a text format (consider setting permissions on a folder or altering the pagefile location) though they may have a textual representation.

In our environment, we check whatever config files we can into Perforce, but there are very few of those. We can't exactly check in the Active Directory DB... though perhaps we could check in a text dump that could be diff'd...
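As a rough illustration of the "dump that could be diff'd" idea, here is a minimal Python sketch that compares two textual dumps of some configuration and prints a unified diff. The function name `diff_dumps` and the key=value sample data are made up for the example; how you produce the dump in the first place depends entirely on the service.

```python
import difflib


def diff_dumps(old_text, new_text, old_label="before", new_label="after"):
    """Return a unified diff (as one string) between two textual config dumps."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )


if __name__ == "__main__":
    # Hypothetical dumps taken on two different days.
    before = "pagefile=C:/pagefile.sys\nshare_data=ReadOnly\n"
    after = "pagefile=D:/pagefile.sys\nshare_data=ReadOnly\n"
    print(diff_dumps(before, after), end="")
```

If the dumps are written to files on a schedule and checked into source control, the diff and the commit comment together give you the "what changed and why" record.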

In the past I have tried keeping a manual change log in our wiki, but it's super hard to maintain the discipline to do this (I know, not a good excuse, but it really is tough).

MY QUESTION: What strategies and tools do you use to cope with this problem of tracking configuration changes to your servers?

-- Update --

Note: I'm not looking for shared note-taking tools (I'm familiar with OneNote, etc.) so much as automated tools specifically meant to help with tracking server changes. There's no comprehensive tool for tracking server config changes, but perhaps there are some for specific applications like GPOs.

Also I am very interested in specific strategies that you've found useful. "We share notes in Sharepoint" is pretty vague. How do you maintain the discipline? What format do you use to track your changes? How do you organize your change data? I'd really like examples as well as ideas.


Solution 1:

In Linux land, people are pursuing a couple of different strategies:

  • Configuration constraint systems, like cfengine or puppet or chef. These are similar to Windows GPOs. The point is that all the server configuration is intentionally documented in a single place and you know at what granularity (server room, group, specific server) the policy is enacted. This won't quite save you from "what the hell was different six months ago?" but it does let you just nuke a server config and rebuild from scratch. You might put the cfengine and puppet policies under revision control to answer that question.
  • Revision controlling /etc. Generally, Linux programs store their configuration in one place, /etc. The daring are beginning to write scripts to put /etc into revision control. One such program I know of is etckeeper:
Description: store /etc in git, mercurial, bzr or darcs
 The etckeeper program is a tool to let /etc be stored in a git, mercurial,
 bzr or darcs repository. It hooks into APT to automatically commit changes
 made to /etc during package upgrades. It tracks file metadata that version
 control systems do not normally support, but that is important for /etc, such
 as the permissions of /etc/shadow. It's quite modular and configurable, while
 also being simple to use if you understand the basics of working with version
 control.
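The description above mentions tracking file metadata that version control does not normally keep, such as permissions. This is not how etckeeper itself is implemented, just a small Python sketch of the idea: record the permission bits of every file under a directory in a manifest, then compare two manifests. The names `snapshot_metadata` and `changed_modes` are invented for the example.

```python
import os
import stat


def snapshot_metadata(root):
    """Map each file under root (by relative path) to its permission bits, e.g. '0o640'."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so that symlinks are recorded as themselves, not their targets.
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            manifest[os.path.relpath(full, root)] = oct(mode)
    return manifest


def changed_modes(old, new):
    """Return {path: (old_mode, new_mode)} for every path whose mode differs or that appeared/disappeared."""
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(set(old) | set(new))
        if old.get(path) != new.get(path)
    }
```

A manifest like this could be written to a text file and committed alongside the configuration files, so that a permission change shows up in the history like any other edit.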

Solution 2:

One of the problems in this situation is that, really, it's a combination business process/technological problem. And it is definitely bigger than just tracking what changes an admin made. You also need to keep an eye out for unexpected changes, and you need good coordination between admins or units so that a change on an AD controller doesn't break a database permissions setting on some departmental server. I.e., your question is a giant can of worms :)

In my organization, we are about a year into rolling out processes and systems to address this. For the business process side we formed a Change Management team. According to SOP, all changes to production environments are coordinated through them. They compile all the changes, along with scope, systems affected, services affected, etc. They enforce good documentation on the changes, as well as both roll-out and roll-back plans. They also host weekly (open) meetings to go over upcoming environment changes, then send emails out detailing all of these changes. The end goal of this process is that, effectively, everybody in IT knows everything else that is going on. This helps stop the problem of, for example, a SysAdmin installing a kernel patch and rebooting a system, which takes down the timeclock database.

As for the technological side, I can only speak for the Unix/Linux guys since I don't deal with Windows. They have been rolling out Puppet, by Reductive Labs, for configuration management of all of those systems. Simply put, it is a client/server system where one defines a machine configuration on the server, and the client pulls those changes every so often (30 minutes by default). Additionally, if any changes are made to managed files locally, they are reverted at that time as well. We use it for managing running services, firewall configurations, user authorization, etc.
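To make the "pull and revert" behaviour concrete, here is a minimal Python sketch of the general idea. It is not Puppet's API or manifest language; the `converge` function and the `desired` mapping are hypothetical stand-ins for what the real agent does with a configuration fetched from the server.

```python
from pathlib import Path


def converge(desired, root):
    """Make files under root match the desired contents.

    desired maps a relative path to the text the file should contain.
    Returns the list of relative paths that had drifted and were rewritten.
    """
    corrected = []
    for rel_path, content in desired.items():
        target = Path(root) / rel_path
        current = target.read_text() if target.exists() else None
        if current != content:
            # Missing file or local edit: put the managed version back.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
            corrected.append(rel_path)
    return corrected
```

Running something like this on a timer is what gives you the self-healing effect: the definition on the server is the record of intent, and the list of corrected paths tells you where somebody made an untracked local change.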

I would also recommend looking into something like TippingPoint. It is a client service that watches system configuration, and sends alerts on changes. It makes us security folks most happy. It is largely used for tracking malicious or unpublished changes.
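For a feel of what "watch the configuration and alert on changes" involves, here is a small Python sketch that fingerprints a set of files with SHA-256 and reports anything added, removed, or modified since a stored baseline. This is only an illustration of the technique, not a description of how any particular product works; `fingerprint` and `detect_changes` are names made up for the example.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return {path: sha256 hex digest} for each path that is an existing file."""
    return {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
        for p in paths
        if Path(p).is_file()
    }


def detect_changes(baseline, current):
    """Compare two fingerprints and return a sorted list of alert strings."""
    alerts = []
    for path in sorted(set(baseline) | set(current)):
        if path not in current:
            alerts.append(f"REMOVED {path}")
        elif path not in baseline:
            alerts.append(f"ADDED {path}")
        elif baseline[path] != current[path]:
            alerts.append(f"MODIFIED {path}")
    return alerts
```

In practice the baseline would be stored somewhere the monitored machine cannot quietly rewrite, and the alerts would be mailed or logged rather than just returned.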

Solution 3:

I have been at 4 or 5 companies now; I don't really remember.

We all had this problem. None of us have solved it 100 percent, but at the company I am at now we have what I think is the best strategy to date.

Sharepoint/Wiki/Evernote/PINs

  • Sharepoint
    • moan all you want...it has some very nice list features.
    • IP address lists
    • inventory
    • service accounts and use
    • change notification logs
  • Wiki
    • How-to's
    • long range task lists
  • Evernote
    • my partner and I use this to put everything we don't want in Wiki
    • more how-to's that are technical in nature
    • scratch notes we both need to see
    • task accounting for the week
    • contractor task lists
    • evernote clipper makes it easy to screen shot AD/rights settings
    • available everywhere
  • PINs
    • Password repository

Solution 4:

There are probably better tools for some of these, but this is what we use:

  • Track configuration changes and upgrades/patches on a per-server basis in a private wiki
  • Also keep howtos and a record of problems/solutions in the wiki
  • Use Sharepoint or Google Docs to keep authoritative copies of things like static IP lists
  • Use Subversion to track changes to configuration files