Dead man's switch for remote networking interventions on Linux

Since I'm about to change the network configuration of a remote server, I've been thinking about safety mechanisms to protect me from accidentally losing control of the server.

The level-0 protection I'm using is a scheduled system reboot:

# at now+x minutes
> reboot
> ctrl+D

where x is the delay before reboot.
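For reference, the same thing can be queued non-interactively. This is only a minimal sketch: the function names `valid_minutes` and `schedule_reboot` are made up for illustration, and it assumes `at` is installed, `atd` is running, and the caller is root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any standard tool.

# Succeeds only if the argument is a non-empty string of digits.
valid_minutes() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Queue a reboot N minutes from now via at(1).
schedule_reboot() {
    valid_minutes "$1" || { echo "usage: schedule_reboot MINUTES" >&2; return 1; }
    echo reboot | at now + "$1" minutes
}
```

After `schedule_reboot 10`, `atq` lists the pending job and `atrm <job number>` cancels it once the change is known to be safe.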

While this works relatively well for very simple tasks like playing with iptables, this method has at least two drawbacks:

  • It's not very reactive, i.e. a connectivity problem should be detected automatically, for example when an automated remote ssh command has stopped working for x seconds.
  • It obviously cannot work if one needs to modify some configuration files and then reboot to test the changes.

Are you guys using some tool for the second point? I would love to have something able to revert the system configuration to a previously known stable state if I can't reach the server X minutes after reboot.

Thanks!

Edit:

  • The server is a remote Linux server, with either a Debian-like or a RHEL-like distribution.

  • I only have access to this specific server, behind a firewall. All ports are filtered, except port 22 (ssh). So no KVM switch, no iDRAC, etc.

  • I can have local support on this machine in case of a critical failure, but this requires far too much time: it takes three hours to get there by car. And I prefer spending this time on serverfault or developing my own tools to avoid going there.

  • My current plan: develop some ugly tool based on mercurial or git that calls "hg revert; reboot" from cron. I just wondered if some well-tested tools already exist.


Short of an alternative method of connecting, such as that suggested by ewwhite, I think your method is fine. It is simple, and you can give yourself the amount of time you feel necessary.

Note - I don't think you should need to reboot a server to verify your changes - restart appropriate services instead if absolutely necessary. A reboot isn't necessary to "lock in" changes - it is just one option that might achieve this.

I would add that you probably shouldn't be experimenting with changes directly on a production system. Use your scheduled reboot as a precaution, but only when applying changes you are certain will work. Cancel the scheduled reboot once your changes have worked.


This is a case for out-of-band management in the form of an iLO or DRAC card or a remote IP KVM. Is that an option in your scenario?


There's always homemade out-of-band management. Get a second system and connect it to the server via a serial cable. Run a getty on ttyS0 or whichever serial port; this lets you log in via the serial port. If you make the second system accessible via the Internet, you then have another path into the server if you lock yourself out of it.


When out of band management isn't available, I roll my own script which is highly dependent on the server and what I've adjusted.

The most common case is changing a remote router's firewall. I launch a screen session and then run:

./iptables.sh ;echo Rules applied;echo sleeping until flush...;sleep 5 && echo Sleeping 20 more seconds - rules worked if you\'re reading this press ctrl-c to cancel the flush && sleep 20 && ./iptables-flush.sh || echo Flush cancelled

So iptables.sh has my new rules in, while iptables-flush.sh has a basic set of rules, which will allow me to reconnect remotely if I screw up. I hit ctrl-c to cancel the flush, which I can only do if the rules didn't disconnect me.
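The same idea can be turned around so that doing nothing triggers the rollback. The following is a rough sketch, not the one-liner above: the function name `apply_with_rollback` and the confirmation-file convention are my own assumptions. It applies the change, then waits for a marker file to appear; if the file does not show up in time, it runs the rollback command.

```shell
#!/bin/sh
# Hypothetical helper:
#   apply_with_rollback APPLY_CMD ROLLBACK_CMD CONFIRM_FILE [TIMEOUT_SECONDS]
apply_with_rollback() {
    apply_cmd=$1
    rollback_cmd=$2
    confirm_file=$3
    timeout=${4:-60}

    rm -f "$confirm_file"                 # ignore stale confirmations
    $apply_cmd || { $rollback_cmd; return 1; }

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -e "$confirm_file" ]; then
            rm -f "$confirm_file"
            echo "Confirmed, keeping the change."
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "No confirmation after ${timeout}s, rolling back."
    $rollback_cmd
    return 2
}
```

Inside screen, something like `apply_with_rollback ./iptables.sh ./iptables-flush.sh /tmp/keep-rules 60` would do it. Creating `/tmp/keep-rules` (an arbitrary path) with `touch` from a freshly opened ssh session confirms the change, which also proves that new connections still get through, not just the one that was already established. On Debian-like systems the iptables package ships an `iptables-apply` helper built around the same confirm-or-revert idea, which may be worth checking before rolling your own.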

So you'd just need a more detailed script. For example, if you're testing changes to your network interfaces you'd write a script and put it in rc.local. It would try to ping a few different hosts and if any of them fails it should copy the old network interface file and reboot.
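A boot-time check along those lines could look roughly like this. Everything here is an assumption to adapt: the Debian-style paths, the `.known-good` backup name, the function names, and the `PING_CMD` and `SETTLE_SECONDS` variables (the former only exists so the logic can be dry-run). `ping -W` is the iputils reply timeout in seconds.

```shell
#!/bin/sh
# Assumed paths (Debian-style); RHEL-like systems keep interface
# configuration elsewhere.
LIVE_CONF=${LIVE_CONF:-/etc/network/interfaces}
GOOD_CONF=${GOOD_CONF:-/etc/network/interfaces.known-good}
PING_CMD=${PING_CMD:-ping}    # overridable for dry runs

# Succeeds only if every host given as an argument answers one ping.
all_reachable() {
    for host in "$@"; do
        $PING_CMD -c 1 -W 2 "$host" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Copy the saved known-good file over the live one.
revert_config() {
    cp -p "$GOOD_CONF" "$LIVE_CONF"
}

# Give the network time to come up, then revert and reboot on failure.
check_or_revert() {
    sleep "${SETTLE_SECONDS:-30}"
    all_reachable "$@" && return 0
    # Already on the known-good config: rebooting again would only loop.
    cmp -s "$GOOD_CONF" "$LIVE_CONF" && return 1
    revert_config && reboot
}
```

Before touching anything you'd save the working file with `cp -p /etc/network/interfaces /etc/network/interfaces.known-good`, then call `check_or_revert <gateway> <some outside host>` from rc.local. The `cmp` guard is there so that an unrelated outage can't put the box into a reboot loop.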

Or perhaps the script could check the ssh logs - if it doesn't see you log in within 90 seconds, restore the config files and reboot.
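A sketch of that login check follows. Note that it swaps in a different technique: instead of parsing the sshd log (whose location differs, e.g. /var/log/auth.log on Debian versus /var/log/secure on RHEL), it polls `who` for any logged-in session, so a console login would count as well. The function name `wait_for_login` and the `WHO_CMD` variable are hypothetical.

```shell
#!/bin/sh
WHO_CMD=${WHO_CMD:-who}    # overridable for dry runs

# wait_for_login [TIMEOUT_SECONDS]
# Succeeds as soon as somebody is logged in, fails after the timeout.
wait_for_login() {
    timeout=${1:-90}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -n "$($WHO_CMD)" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

At boot you'd run something like `wait_for_login 90 || { restore the config files; reboot; }`, with the restore step being whatever matches the files you changed.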

So the short answer is, increase your bash-fu :-)

And figure out a way to get the out of band management working. That really is the proper answer, which I'd always want as a fall back. For example, since you have ssh access (hopefully to more than the machine you're working on), can you use ssh port forwarding to get around the firewall?