Is there any any merit to routinely restore a linux system, even if unnecessary?

I do fieldwork with a number of computers running ubuntu performing critical tasks doing fieldwork. The computers are similarly configured with slight variations.

Since we've had some configuration issues in the past, my boss is pressing for us to take an image of the installation on each computer, and restore each computer to that image before they are to go into the field.

My preferred solution would be to write a common script that checks to ensure that the configuration of the system is correct and that the system is operational. If the computer has been verified, isn't restoring it to that configuration redundant? And are there any inherent problems with doing so?

My reluctance stems from the fact that our software and configuration is subject to change in the field, but these changes must be made across all the computers. That means that when a change is made, all the restoration images have to be updated as well. The differences in the configuration of each of the computers live in /etc. In the event that restoration is required, I would prefer to keep a single image containing everything that is common to all machines, and have a snapshot of each computer's /etc directory to be used for restoring the state of that particular machine.

What's the better approach?


Solution 1:

It sounds like what you really need is a configuration management system like puppet. With a proper setup, it should verify that everything is the way it is supposed to be setup on every puppet run.

Periodically re-installing systems just to get them to a known state should not really be necessary, and sure seems like it will create a lot of work, that you shouldn't ever need to do. When you have a lot of Linux systems it is much better to have a really basic automated install, and then have a system to pull in all the proper configurations for a given node after defining a unique system name/id.

Solution 2:

If you could virtualize these servers, then you can periodically return to a base snapshot which was taken after everything is verified. This has the advantage of being guaranteed known state, because configuration management works but does not guarantee returning to a known state of the entire server.