Sysadmin performance metrics?

If it's a unique problem, how can you measure whether the difficulty lies with the person or with the problem itself?

You should be documenting everything that would be required to get your department running if half your people are killed/fired/etc. If you needed to rebuild the department with new admins, they should be able to get things running again at a new location with your documentation.

In practice...hee! Yeah, right. In most places you're lucky if the docs are kept up to date, if they're even created at all.

If you're managing the monster tasks, perhaps you just need to meet up with your admins and ask how things are going and what's been tried. If in these three weeks he's been tasked with just this problem and it's not getting solved, is it because he's not working on it? What has he tried to rectify the issue?

You can't micromanage the issue or he'll probably start fighting you on it. The sysadmin needs enough freedom to work without feeling like he's being scrutinized at every step. But if the project or task is really far behind, then you have a legitimate concern. Find out from him if there's something he needs in order to get the job done, or what the problem is that he is having difficulty overcoming.

Good book: Managing Humans by Michael Lopp.

Performance should be based on how well IT issues are addressed to meet the needs of users, along with maintenance of the servers and infrastructure issues. You can't possibly reduce the issue down to "solving X issues a day" or "writing X lines of code" to measure each employee.

Maybe you can get input from others on the team about how each of them is doing and what the major needs are. Good techies want to work with good techies. They don't want to work with people who are "happy and nice" but incompetent. They'll work with a grumpy curmudgeon who hates being in the room with them if it means that everything works well and the curmudgeon knows his stuff.


Old Stuff (Legacy) Can be Hard:
If I read correctly, you have old builds of software and are trying to get them running on recent OS builds. Red Hat 8 is 7 years old now, so I would say the application should be updated too (maybe these modules haven't been updated since then). So it sounds like a difficult mess, as you say.

Documenting and Expectations:
It depends, but you really should lay out what you expect in general. Make everything you want very clear. Then you should be able to trust the admin to follow through with that and update you if they can't for some reason. You can check in with them, and make sure they are doing this stuff. System administration is odd in that it varies greatly from position to position, so it might take some time to get them to understand what you expect from them.

My Recommendation, Communicate!:
I think we can't tell you if these are hard problems or not. Developers should not be that far off from system administrators, so if you are having issues, get a developer you trust to sit down with the admin and help him solve these problems. That developer should be able to give some feedback.

Regarding updating Everything:
Some thoughts that may or may not be useful:

  • How heavily is this used? Maybe it would be better just to virtualize it and forget about it :-P
  • How complicated is the application? Might it be cheaper and take less time just to rebuild it? This goes back to updating the application too: if these modules are outdated, maybe those parts should be taken out and recoded. It also goes back to communication: team up system administrators and developers to come to the best solution if you can.

I'd say that if your sysadmin can't get a custom OS installation completed after 3 weeks, either he/she is incompetent or else you're somehow confusing him/her, thus resulting in endless delays. In the scenario that you described, a basic/foundation workflow should be: management and/or deployment team comes up with a list of requirements and dependencies. The requirements would include timeframe, scalability, fault tolerance, robustness, availability thresholds, etc. Dependencies would cover what applications need to run on the server and, optionally, what software is required to support those applications. The sysadmin could possibly handle the latter unless you had very specific, known needs regarding software and software versions. Either way, it should all be documented, with approval processes in place so that the "guy down the hall" can't make changes behind people's backs and end up messing with the sysadmin's workflow and expectations. Once all the information is given to the sysadmin, he/she should be able to provide a more or less solid time estimate.

From what you've said, it sounds like this person isn't even testing the builds to see if everything works. In an ideal environment, a set of test scripts would be in place so that a build can be verified as correct or not by running said scripts. They would verify not only functionality but also whether or not the right software versions have been included (this includes system and application libraries). In larger environments, it's not uncommon to have an entire team devoted to performance testing, as well, so that once a server and its installed apps have been deployed, you can be sure that it will function and scale as well as, if not better than, in a lab or staging environment. That's another thing: a staging environment is key. You could have policies in place that require that servers transition from a lab environment to a staging environment and finally to a production environment.
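As a rough illustration of the kind of test script I mean, here is a minimal Python sketch that compares the versions found on a build against a list of required minimums. The package names and version numbers are made up for the example, and how you collect the installed versions (rpm, dpkg, or whatever your platform uses) is left out; treat it as a starting point under those assumptions, not a finished tool.

```python
import re


def parse_version(text):
    """Turn a version string such as '2.6.18-8' into (2, 6, 18, 8) for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def check_build(required, installed):
    """Return a list of problems; an empty list means the build passes.

    required  -- dict of package name -> minimum version string
    installed -- dict of package name -> version string found on the server
    """
    problems = []
    for name, minimum in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: missing (need >= {minimum})")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name}: found {found}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    # Hypothetical requirements and findings, for illustration only.
    required = {"openssl": "1.0.2", "libxml2": "2.9.1", "httpd": "2.4.6"}
    installed = {"openssl": "1.0.1", "libxml2": "2.9.4"}
    for line in check_build(required, installed):
        print(line)
```

Comparing versions as tuples of integers avoids the classic string-comparison trap where "1.10" sorts before "1.2". It ignores non-numeric suffixes, so a real script would need something more careful for versions like "1.0.2k".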

I don't mind if a sysadmin takes time to carefully study things so that when a server is put into production, it works perfectly. I used to know a guy who did that. It wasn't that he was incompetent; rather, he was aware of the seriousness of failed deployments, and so he took a little extra time to make 100% sure that everything was kosher. His reputation so far is nearly impeccable, and I'd recommend him to any system administration team. However, repeated slip-ups on trivial tasks should raise orange (not yet red) flags. A basic sysadmin should know his/her operating systems and commonly used application libraries, so that when it comes time to build a system, there are very few questions in his/her mind about which OS to use and which libraries and applications to deploy. As for a custom server build for a set of custom applications, it would take me about 1-2 days to get the base installation and configuration (plus performance tweaks, hardening, etc.) completed. After that, it would depend on what needs to get installed. The greater the number of software requirements, the more time it's going to take to build, install and test, and maybe that's what's holding up your sysadmin. I can't say that for sure, though, since you didn't provide enough information.

I hope that helps.

Michael