SSH access gateway for many servers

We manage more than 90 servers with three DevOps engineers using Ansible. Everything works well, but there is a serious security problem: each engineer uses their own local SSH key to access the servers directly. Each engineer works from a laptop, and any of those laptops could be compromised, which would open the entire network of prod servers to attack.

I am looking for a way to manage access centrally, so that access for any given key can be blocked, much like how keys are managed in Bitbucket or GitHub.

Off the top of my head, I assume the solution would be a tunnel through one machine, the gateway, to the desired prod server. While passing through the gateway, the request would pick up a new key and use it to gain access to the prod server. As a result, we could kill access for any engineer within seconds just by denying them access to the gateway.

Is this sound logic? Has anyone seen an existing solution to this problem?


Solution 1:

That's too complicated (checking whether a key has access to a specific prod server). Use the gateway server as a jump host that accepts every valid key; removing a key there removes that key's access to all servers at once. Then add only the allowed keys to each respective server. Finally, make sure the SSH port of every server is reachable only via the jump host.

This is the standard approach.
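
As a rough illustration of the client side of this setup, the sketch below renders an OpenSSH client config fragment that routes every prod server through a single jump host using the `ProxyJump` option (available in OpenSSH 7.3 and later). The hostnames, the `deploy` user, and the `render_ssh_config` helper are hypothetical and only serve the example; the firewall rule that limits the SSH port to the jump host still has to be set up separately on the servers.

```python
from __future__ import annotations


def render_ssh_config(jump_host: str, targets: list[str], user: str = "deploy") -> str:
    """Render an ssh_config fragment that sends every target through one jump host.

    All names passed in are placeholders chosen by the caller; nothing here
    contacts a server, the function only builds text.
    """
    # The jump host itself is reached directly.
    stanzas = [f"Host {jump_host}\n    User {user}\n"]
    for target in targets:
        # ProxyJump tells the OpenSSH client to connect via the jump host first.
        stanzas.append(
            f"Host {target}\n"
            f"    User {user}\n"
            f"    ProxyJump {jump_host}\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(render_ssh_config(
        "gateway.example.com",
        ["prod-01.example.com", "prod-02.example.com"],
    ))
```

Ansible's default SSH connection uses the OpenSSH client, so a fragment like this in the engineer's SSH config should normally apply to playbook runs as well, although that is worth verifying in your own environment.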

Solution 2:

Engineers should not run Ansible directly from their laptops, unless this is a dev/test environment.

Instead, have a central server that pulls the runbooks from git. This allows for additional controls (four eyes, code review).

Combine this with a bastion or jump-host to restrict access further.

Solution 3:

Netflix implemented a setup like yours and released free software to help with it.

See this video https://www.oreilly.com/learning/how-netflix-gives-all-its-engineers-ssh-access or this presentation at https://speakerdeck.com/rlewis/how-netflix-gives-all-its-engineers-ssh-access-to-instances-running-in-production with the core point:

We’ll review our SSH bastion architecture, which at its core uses SSO to authenticate engineers, and then issues per user credentials with short lived certificates for SSH authentication of the bastion to an instance. These short lived credentials reduce the risk associated with them being lost. We’ll cover how this approach allows us to audit and automatically alert after the fact, instead of slowing down engineers before granting access.

Their software is available here: https://github.com/Netflix/bless

Some interesting takeaways, even if you do not implement their whole solution:

  • they use SSH certificates instead of plain keys; a certificate can carry far more metadata, which allows many per-requirement constraints and simpler audits
  • the certificates have a very short validity (around 5 minutes); SSH sessions stay open even after the certificate expires
  • they use 2FA, which also makes scripting difficult and forces developers to find other solutions
  • a dedicated submodule, running outside their infrastructure and secured through the mechanisms offered by the cloud where it runs, generates certificates dynamically so that each developer can access any host
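
To make the short-lived certificate idea concrete, here is a minimal sketch that builds (but does not run) a plain `ssh-keygen` signing command with a 5-minute validity. This is not how BLESS itself is invoked; it only shows the underlying OpenSSH mechanism using the standard `-s`, `-I`, `-n`, and `-V` options. The file paths, key ID, principals, and the `build_sign_command` helper are all hypothetical.

```python
from __future__ import annotations

import shlex


def build_sign_command(
    ca_key: str,
    user_pubkey: str,
    key_id: str,
    principals: list[str],
    validity: str = "+5m",
) -> list[str]:
    """Build an ssh-keygen argv that signs a user public key with a CA key.

    -s  path to the CA private key
    -I  key identity, which ends up in the server logs and helps auditing
    -n  comma-separated principals (login names) the certificate is valid for
    -V  validity interval; "+5m" means valid from now until 5 minutes from now
    """
    if not principals:
        raise ValueError("at least one principal is required")
    return [
        "ssh-keygen",
        "-s", ca_key,
        "-I", key_id,
        "-n", ",".join(principals),
        "-V", validity,
        user_pubkey,
    ]


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    cmd = build_sign_command(
        "/etc/ssh/user_ca", "id_ed25519.pub", "alice@laptop", ["deploy"]
    )
    print(shlex.join(cmd))
```

On the server side, OpenSSH trusts certificates signed by that CA when its public key is listed via the `TrustedUserCAKeys` option in `sshd_config`, so no per-user key needs to be distributed to the individual hosts.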

Solution 4:

OneIdentity (ex-Balabit) SPS is exactly what you need in this scenario. With this appliance you can manage user identities on basically any machine, track user behavior, monitor and alert, and index whatever the users are doing for later review.