As an admin, what tools do you use to log what you do to your boxes? [closed]

I am more of a Linux application developer than an admin. Over time, I've built and maintained servers, sometimes to offer services, but mostly just to develop the applications I work on.

Way back when, I would create a file in my account to keep notes on what I did on each machine, so that I could replicate it when I migrated to other machines.

Nowadays, I set up something like a private Trac installation, install its blog plugin, and then use that to note everything I install and most commands that I run, along with their output. This gives me a combination wiki and blog that I find very useful as a "captain's log". I do this mostly so that when I migrate to a clean new machine, I have a much easier time bringing it up.

And yet, I am always amazed when I see others just install this, delete that, run this, set up that config, ... without seeming to note what they are doing in any way.

What do YOU do, and what tools are available?

I am especially interested in the transition between maintaining a few machines for a few people and maintaining several to dozens of machines providing a real service.

What are the best practices, and where can I find good resources?

Thanks!


The answer to this is definitely formal configuration management. The three big contenders in that space these days are Chef, Puppet, and CFEngine. Basically, you need to apply the development approach to system administration: write a 'program' that defines the machine state, and apply it (obviously, that's a gross simplification).
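To make the "program that defines the machine state" idea concrete, here is a minimal sketch in Python. It is not how Chef, Puppet, or CFEngine are implemented; `ensure_file` and `apply_state` are hypothetical names invented for illustration. It only shows the core property those tools share: you declare the desired state, and applying it changes only what has drifted.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def ensure_file(path: Path, content: str) -> bool:
    """Make `path` contain exactly `content`; return True if a change was made."""
    if path.exists() and path.read_text() == content:
        return False  # already in the desired state, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def apply_state(root: Path, desired: Dict[str, str]) -> List[str]:
    """Converge files under `root` to their desired content; return the paths that changed."""
    return [rel for rel, content in desired.items() if ensure_file(root / rel, content)]


# Hypothetical desired state, applied to a scratch directory rather than a real /etc.
root = Path(tempfile.mkdtemp())
desired = {"etc/motd": "Managed by config management\n"}
first_run = apply_state(root, desired)   # writes the file
second_run = apply_state(root, desired)  # no changes: the state already matches
```

Running the same definition twice is safe, which is what lets you apply it to one machine or to thousands without tracking what was done by hand on each.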

The truth is that a lot of us crusty, neckbeard-type Linux admins 'magically' configure machines because we've completely internalized the knowledge over many years of repetition. I know roughly which config files I need to touch on any given machine, and I can hand-edit those from memory. That's actually a terrible way to do things, particularly if you have more than one server or more than one sysadmin. Hand-editing a config file is always a mark of bad planning and bad management.

I'm a big fan of using Clonezilla plus PXE boot to bring up a system automatically, and of putting just enough detail into a Kickstart config that the machine ends up on the network and running the configuration management (CM) tool of your choice. Everything beyond the most basic 'bring the system up, put it on the network' logic should go in your CM tool, not in your initial system image or Kickstart.

As a point of reference, I currently administer about 10,000 Unix servers.


Your question approaches this from the few-machine perspective. Scaling up is not about recording your commands; if that is all you want, the script command should do what you're asking. The real solution to scaling is being able to rapidly reproduce configurations and manage changes within those configurations. Tools like Puppet and Chef let you do that. The recipes are normally kept in a revision control system, so you have full visibility into every change made to the configuration and can roll back to previous configurations if required.
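For the narrower goal of keeping a record of what was run, script handles interactive sessions. If you would rather log individual commands from your own tooling, a rough sketch could look like the following. The `run_logged` helper and the log format are assumptions made up for this example, not part of any existing tool.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_logged(argv, log_path):
    """Run a command, append a timestamped record of it and its output to log_path,
    and return the command's exit code."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    output = result.stdout
    if output and not output.endswith("\n"):
        output += "\n"  # keep the exit marker on its own line
    with Path(log_path).open("a") as log:
        log.write(f"[{stamp}] $ {shlex.join(argv)}\n")
        log.write(output)
        log.write(f"[exit {result.returncode}]\n\n")
    return result.returncode
```

A call such as `run_logged(["uname", "-a"], "captains.log")` appends the command, its combined output, and its exit status to the log file. This is fine for a handful of machines, but it records what happened rather than defining what should be true, which is why it does not replace configuration management.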

When these tools are combined with PXE boot and preseed or Kickstart, you can rapidly provision machines for various roles.

It's not a bad idea to use these sorts of tools even when you manage just a few machines, for the change management and tracking that they provide.