Are there any well-known anti-patterns in the field of system administration?
Solution 1:
Putting off automating tasks until doing them manually eats so much time that there is none left to automate them.
Conversely, premature automation. There's absolutely no need to spend 3N hours automating a one-shot task that takes N hours to do manually (even if automating is more fun than slogging through things by hand).
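To put rough numbers on that trade-off, here is a minimal break-even sketch. The function names and the optional per-run maintenance cost are my own assumptions for illustration, not a standard formula; the only input taken from the answer is the "3N hours to automate an N-hour task" example.

```python
def automation_break_even(manual_hours_per_run: float,
                          automation_hours: float,
                          maintenance_hours_per_run: float = 0.0) -> float:
    """Number of runs after which the automation has paid for itself.

    maintenance_hours_per_run is a hypothetical upkeep cost of the
    automated version (babysitting the script, fixing it when it breaks).
    """
    saved_per_run = manual_hours_per_run - maintenance_hours_per_run
    if saved_per_run <= 0:
        # The automated version saves nothing per run, so it never pays off.
        return float("inf")
    return automation_hours / saved_per_run


def worth_automating(manual_hours_per_run: float,
                     automation_hours: float,
                     expected_runs: int,
                     maintenance_hours_per_run: float = 0.0) -> bool:
    """True if the task is expected to run more often than the break-even point."""
    break_even = automation_break_even(
        manual_hours_per_run, automation_hours, maintenance_hours_per_run
    )
    return expected_runs > break_even


# The one-shot case from above with N = 2: 6 hours to automate a 2-hour task.
print(automation_break_even(2, 6))   # 3.0 runs to break even
print(worth_automating(2, 6, 1))     # False: a one-shot task never gets there
print(worth_automating(2, 6, 10))    # True: a recurring task does
```

The hard part in practice is guessing `expected_runs` honestly; the first anti-pattern is what happens when that guess is always "not enough to bother yet".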
Solution 2:
A. Not testing restores - a backup can be verified and look fine, but do you know how to restore it?
How long does it take, and what does it take? You have to know how to do it in a stressful situation...
B. No configuration management, no uniformity - just a change here and there, and "I think I tuned something on this one..."
Who knows how to replicate a well-built server if its quirks are not written down and no two configurations in the shop are identical? What if you manage to restore the data, but not the configuration or the apps?
C. No monitoring - having no idea what the boxes are doing or how they are doing.
This is twofold: a) you have to monitor for alarms so you can react in time, before you run out of some resource or when strange behavior shows up, and b) you have to monitor long-term trends to manage capacity (disk, CPU, RAM, network, ...).
D. No redundancy in your configuration - what happens when XX dies?
This means planning ahead what you want of your sysadmin.
For me these are the most important.
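As a rough illustration of the two sides of monitoring in point C, here is a minimal sketch: a threshold alarm for the "react in time" part and a straight-line fit over daily disk usage for the capacity part. The function names, the 90% threshold and the sample figures are assumptions made up for the example; a real monitoring system does this for you, and linear growth is only a crude first guess.

```python
from typing import Optional, Sequence


def usage_alarm(used_fraction: float, threshold: float = 0.9) -> bool:
    """Short-term check: True when usage is at or above the alarm threshold."""
    return used_fraction >= threshold


def days_until_full(daily_used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Long-term check: fit a least-squares line through one sample per day
    and estimate how many days remain until capacity is reached.

    Returns None when there are too few samples or usage is not growing.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2                      # mean of day indices 0..n-1
    mean_y = sum(daily_used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_used_gb))
    slope = sxy / sxx                         # growth in GB per day
    if slope <= 0:
        return None
    # Project from the latest sample rather than the fitted line's endpoint.
    remaining = max(capacity_gb - daily_used_gb[-1], 0.0)
    return remaining / slope


samples = [100, 110, 120, 130]                # made-up daily usage in GB
print(usage_alarm(samples[-1] / 200))         # False: 65% used, no alarm yet
print(days_until_full(samples, 200))          # 7.0 days left at this rate
```

The point of keeping both checks is that the alarm alone only tells you when it is already urgent, while the trend tells you when to order disks.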
Solution 3:
The deadliest pattern is when the system administration department (or the whole of IT) becomes a passive participant in the company. That is, it is treated as a self-service counter where everyone arrives with already formed ideas of how things should be done, ideas that take only user needs into consideration and not the needs of the IT ecosystem as a whole.
The second deadliest pattern is when the system administration department turns into a bunch of button pushers, i.e. all software and tools are bought or developed and installed by a third party, and the system administrators get official training and a manual, then only follow the operating manuals and escalate to the vendor everything that is not explicitly covered. This situation may be very comfortable for some (if not most) system administrators, but it is a disaster waiting to happen: sooner or later the fact that no one really knows how the whole system actually works will bring it to the ground (think subtle interactions between components and the blame game between vendors).
Solution 4:
1) Over-promising and under-delivering (i.e. failing to keep user expectations realistic)
2) Not verifying backups until they are needed.
edit: I intended number 2 to include the restoration of files / data
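A minimal sketch of what "verifying by restoring" can look like: extract the archive into a scratch directory, compare checksums against a manifest recorded at backup time, and time the whole thing. Everything here is an assumption for illustration (plain tar archives, SHA-256 manifests, the helper names); real backup tools have their own restore and verify commands, and a proper drill also covers configuration and applications, not just files.

```python
import hashlib
import tarfile
import tempfile
import time
from pathlib import Path
from typing import Dict, Tuple


def file_digests(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def make_archive(source_dir: Path, archive_path: Path) -> Dict[str, str]:
    """Write a gzip'd tar of source_dir and return the manifest to keep with it."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so they match the manifest.
                archive.add(str(path),
                            arcname=path.relative_to(source_dir).as_posix())
    return file_digests(source_dir)


def restore_drill(archive_path: Path,
                  expected: Dict[str, str]) -> Tuple[bool, float]:
    """Restore into a scratch directory; return (contents match, seconds taken)."""
    # Use the safer "data" extraction filter where this Python provides it.
    extract_args = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as scratch:
        started = time.monotonic()
        with tarfile.open(archive_path) as archive:
            archive.extractall(scratch, **extract_args)
        elapsed = time.monotonic() - started
        restored = file_digests(Path(scratch))
    return restored == expected, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        source = Path(work) / "data"
        source.mkdir()
        (source / "notes.txt").write_text("restore me")
        backup = Path(work) / "backup.tar.gz"
        manifest = make_archive(source, backup)
        ok, seconds = restore_drill(backup, manifest)
        print("restore ok:", ok, "in", round(seconds, 3), "s")
```

Running something like this on a schedule answers both questions from the thread: whether the backup actually restores, and how long a restore takes before you are doing it under pressure.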