What is best practice for managing login / sudo accounts on multiple systems, with multiple users, on a ship?

I want to preface this by saying I'm not a sysadmin by trade; I've fallen into the jack-of-all-IT-trades role in a team of instrumentation engineers.

My organisation has traditionally used the same passwords on all systems (one for root, one for user), and that has worked fine for them because we're small and disconnected from the internet most of the time. We've just got ourselves a new facility, and they've decided to up the security game, which as far as they're concerned means no DHCP assignment for unknown MACs and unique passwords for each system (stored in a password manager).

This sounds like a rubbish solution to me. The DHCP restriction is bypassed by assigning a static IP, and the password database just means we have to jump through more hoops to do anything, but are still exposed by a single password.

The environment we're working in has ~10 CentOS servers and a smattering of OEM workstations, mostly Windows I think. We're pretty well isolated by VLANs and a very marginal internet connection. Techs on the vessel occasionally need sudo access to any of the systems.

Is this a solved problem? I don't particularly know where to start. I'm going to start looking into SSH keys as per "What are best practices for managing SSH keys in a team?", but if anyone can point me to best practice, I would be eternally grateful.


Solution 1:

It is less of a solved problem than an ongoing process towards better practices. Many organizations have found good-enough solutions for their specific requirements, and some of the more internet-connected ones have even done so with ready-made commercial offerings. All of them still have unresolved pain in this area.

Most of the time, improvements that make the process easier for everyone involved have also yielded the most reliable security gains (easier management -> less stale/incorrect configuration -> less surface for malicious use).

Since your question is not overly specific about operational constraints and measurable goals, I'll start with some very broad suggestions:

  • put much effort into ensuring that changes to the "who has access" list can be acted on properly - securing things now matters little if you do not fundamentally improve how you deal with compromised credentials
  • try to do away with most or all password auth on any multi-user systems, and instead have all authentication be done from personal tablets/laptops via non-replayable auth such as ssh keys
  • carefully investigate whether any of your next steps make it harder to do things in a secure manner and thus nudge people into adopting unsafe procedures
  • carefully investigate whether your changes not only touch day-to-day operations, but also complicate emergency recovery tasks
  • consider more granular access (does everyone really need access to all systems, or could they be subdivided into layers or groups?)
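
To make the first and last points a little more concrete, here is a minimal Python sketch of keeping the "who has access" list as a single piece of data and generating each host's authorized_keys content from it. Everything in it is an assumption for illustration: the user names, host names, group names and the truncated keys are placeholders, and how the generated files reach the servers (scp, configuration management, USB stick) is left out entirely. It also only covers login; sudo rights would still need to be handled separately.

```python
from typing import Dict, List

# Hypothetical inventory: all names and keys below are placeholders.
USER_KEYS: Dict[str, str] = {
    "alice": "ssh-ed25519 AAAA...alice alice@laptop",
    "bob": "ssh-ed25519 AAAA...bob bob@tablet",
}

# Hosts subdivided into groups, so access can be granted per group.
HOST_GROUPS: Dict[str, List[str]] = {
    "acquisition": ["acq01", "acq02"],
    "navigation": ["nav01"],
}

# The single "who has access" list: user -> groups they may log in to.
ACCESS: Dict[str, List[str]] = {
    "alice": ["acquisition", "navigation"],
    "bob": ["acquisition"],
}


def authorized_keys_by_host(
    user_keys: Dict[str, str],
    host_groups: Dict[str, List[str]],
    access: Dict[str, List[str]],
) -> Dict[str, str]:
    """Return the full authorized_keys text for every known host."""
    # Start every host with an empty set, so a host nobody may access
    # gets an empty file instead of keeping stale keys.
    per_host = {host: set() for hosts in host_groups.values() for host in hosts}
    for user, groups in access.items():
        for group in groups:
            for host in host_groups[group]:
                per_host[host].add(user_keys[user])
    # Sort for stable output, one key per line.
    return {
        host: "".join(key + "\n" for key in sorted(keys))
        for host, keys in per_host.items()
    }


if __name__ == "__main__":
    for host, content in sorted(
        authorized_keys_by_host(USER_KEYS, HOST_GROUPS, ACCESS).items()
    ):
        print(f"# {host}\n{content}")
```

The design choice here is that each file is regenerated in full rather than edited in place: revoking a lost laptop means deleting one entry from the list and redistributing, and nothing stale is left behind on a host that was forgotten.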