Configuration management for 'single server multiple admins'

We've set up a server that's running the infrastructure for a small association. So far, we've tried to manage the configuration with Ansible, but that has not been a great success. Perhaps we're doing it wrong.

In principle, the idea is that this server will be left alone most of the time, with people adding or changing things once in a blue moon. This makes it crucial that whatever is configured and running on the server is clear and well documented, as people who do not admin the system frequently are bound to lose track of the overall picture (let alone remember the details). Additionally, the group of people who admin this server will change over time (as people leave and join the 'committee').

We started out with a clean installation, adding roles in Ansible whenever we wanted to set something up (nginx, phpfpm, postfix, firewall, sftp, munin, ...). Perhaps due to our inexperience, we are never able to write a set of Ansible tasks exactly the way we need them in one go, partly because configuration is a bit of a trial-and-error process. In practice, that means we typically first configure whatever service we want to run by hand on the server, and then translate the result into Ansible tasks. You can see where this is going: people forget to test the tasks afterwards, or are afraid to do so at the risk of breaking things, or worse, we forget or neglect to add things to Ansible at all.

Today, we have very little confidence that the Ansible configuration actually reflects what is configured on the server.

Currently I see three main problems:

1. It is hard to (read: we don't have a good way to) test Ansible tasks without risking breaking things.
2. It adds extra work to first figure out the desired configuration, and then figure out how to translate this to Ansible tasks.
3. (Ideally,) we do not use it frequently enough to build up familiarity and routine.

An important consideration here is that for whatever we end up doing, it should be easy for newcomers to learn the ropes without a ton of practice.

Is there a viable alternative that still provides some guarantees and checks (comparable to merging Ansible files to some master) that "configure things and write down what you did" fails to provide?

EDIT: We've considered committing /etc to git. Is there a reasonable way to protect secrets (private keys, etc) that way, but still have the configuration repository available outside the server somehow?

Solution 1:

Just spin up a test/staging VM that you can use to validate your changes. Your current method of performing changes manually first is hopelessly broken and doomed to failure. You and your team need to commit to using configuration management properly, and part of that is having a test system available. Even just a local Vagrant VM would be sufficient.

Not only will this help with testing new changes, but it will also serve as a test bed for new employees (or older employees who haven't used the system in a while) to familiarize themselves with your Ansible setup.
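As a rough illustration of how validating changes against a staging machine could be scripted, the sketch below builds a dry-run invocation of `ansible-playbook`. It is a minimal example under stated assumptions, not part of the setup described in the question: the function names, the playbook name `site.yml`, and the inventory path `inventories/staging` are hypothetical. It relies only on the `-i`, `--check`, `--diff`, and `--limit` options of `ansible-playbook`.

```python
import subprocess
from typing import List, Optional


def build_dry_run_command(
    playbook: str, inventory: str, limit: Optional[str] = None
) -> List[str]:
    """Build an ansible-playbook command that reports changes without applying them."""
    # --check asks modules to predict changes instead of making them;
    # --diff shows what would change in templated or edited files.
    cmd = ["ansible-playbook", "-i", inventory, "--check", "--diff"]
    if limit:
        # --limit restricts the run to the given host or group pattern.
        cmd += ["--limit", limit]
    cmd.append(playbook)
    return cmd


def dry_run(playbook: str, inventory: str, limit: Optional[str] = None) -> int:
    """Run the dry-run command and return the process exit code."""
    # Requires ansible-playbook to be installed and on PATH.
    return subprocess.run(build_dry_run_command(playbook, inventory, limit)).returncode
```

Calling `dry_run("site.yml", "inventories/staging", limit="staging")` would run the playbook in check mode against the hypothetical staging group. Check mode is not a substitute for the staging VM itself: tasks that run arbitrary commands are typically skipped, so a clean dry run does not prove the playbook is correct.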

Regarding keeping /etc/ in git: no, don't do this. That directory is only a tiny portion of what Ansible is changing, and having git in place there will only encourage people to make local changes.

Keep your Ansible playbooks in git. Consider restricting permissions such that only you can apply Ansible changes to the live server. Others can submit pull requests with their changes, which you can review and merge into master if appropriate.

Solution 2:

"Perhaps due to our inexperience, we are never able to write a set of Ansible tasks exactly the way we need them in one go, partly because configuration is a bit of a trial-and-error process. In practice, that means we typically first configure whatever service we want to run by hand on the server, and then translate the result into Ansible tasks."

While there are other issues (like not having a testing environment), you can get a big improvement simply by not doing this.

One of Ansible's core design goals is to be idempotent, which means that running your playbook multiple times shouldn't change anything (unless you've changed the plays). Thus, when I'm configuring a new piece of software, my steps are:

1. Make changes to the Ansible tasks.
2. Run the playbook.
3. Examine the system, and if it's not correct, return to step 1.
4. Commit my changes.

If you don't think you'll write the correct thing the first time in Ansible, write it anyway and iterate on it until it's right, just like any other code. This greatly reduces the chance of forgetting to Ansiblize some change you made, since every change you made was already in Ansible at some point during your development process.
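To make the idea of idempotence concrete, here is a minimal Python sketch of the pattern. It is not Ansible's implementation, and `ensure_line` is a hypothetical helper: it describes a desired state ("this line is present in this file"), changes the file only when the state is not already met, and reports whether it changed anything.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file; return True only if the file was changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Desired state already holds, so do nothing and report "unchanged".
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

The first call on a file that lacks the line returns `True` and appends it; every later call with the same arguments returns `False` and leaves the file untouched. That is why the edit, run, examine loop above is safe to repeat as often as needed.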
