Configuration Management tools failover behaviour

Solution 1:

I will comment only on the ones I have experience with, namely Puppet and Ansible, and I'm omitting some details.

Both can be set up to run agentless or local-only if needed. To run them local-only, you need some way to transfer the required manifests / playbooks to the target machines and run them there.
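As a rough sketch of the local-only mode, the helpers below only build the command lines one might run on a target machine once the manifests or playbooks have been copied there. The file paths are hypothetical, and the function names are made up for illustration; `puppet apply` and `ansible-playbook` with a local connection are the usual entry points, but check the flags against your installed versions.

```python
from typing import List


def puppet_local_cmd(manifest: str) -> List[str]:
    """Command to apply a manifest locally, without contacting a master."""
    return ["puppet", "apply", manifest]


def ansible_local_cmd(playbook: str) -> List[str]:
    """Command to run a playbook against the local machine only."""
    # "localhost," (with the trailing comma) is an inline one-host inventory.
    return ["ansible-playbook", "--connection=local", "-i", "localhost,", playbook]


if __name__ == "__main__":
    # Hypothetical paths; hand the lists to subprocess.run(..., check=True)
    # if you actually want to execute them.
    print(" ".join(puppet_local_cmd("/opt/cm/site.pp")))
    print(" ".join(ansible_local_cmd("/opt/cm/site.yml")))
```

How the files get onto the machine (rsync, a package, a git checkout) is left open here on purpose.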

With Puppet masters, you can get redundancy by placing the actual masters behind a load balancer.

Ansible, by contrast, has no master concept: any machine that can reach the managed machines over SSH / PowerShell can run the playbooks, provided it has access to them. Maybe you meant Ansible Tower, which uses a DB for its operation and can be clustered if needed.

This brings us to the real redundancy in both cases, which is the actual scripts. In nearly all cases I have seen, those live in a git repository, so they are inherently redundant: just clone it and you can have as many "master" copies as you wish.
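To illustrate the "as many copies as you wish" point, here is a minimal sketch that picks the first reachable clone out of a list of mirrors. The mirror URLs and the function names are hypothetical; the reachability check shells out to `git ls-remote`, and the probe can be swapped for anything else.

```python
import subprocess
from typing import Callable, Iterable, Optional


def git_reachable(url: str) -> bool:
    """True if `git ls-remote` can read HEAD from the repository at `url`."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--exit-code", url, "HEAD"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0


def pick_mirror(
    urls: Iterable[str],
    probe: Callable[[str], bool] = git_reachable,
) -> Optional[str]:
    """Return the first URL the probe accepts, or None if every mirror is down."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Whichever mirror comes back can then be cloned or pulled from on the machine that is about to run the manifests or playbooks.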

Hope this helps.

Solution 2:

If you look at Salt, the only information that makes up a working connection between master and minions is:

  • the fact that the minion can resolve the master ip somehow
  • the minions' public keys in the /etc/salt/pki/master directories
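Those two prerequisites can be checked with a few lines of Python. This is only a sketch: the function names are made up, and it assumes the default layout in which accepted minion keys sit as one file per minion ID in a `minions` subdirectory of the master's pki directory.

```python
import os
import socket
from typing import List


def master_resolves(hostname: str) -> bool:
    """True if the given master hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


def accepted_minions(pki_dir: str = "/etc/salt/pki/master") -> List[str]:
    """List minion IDs whose public keys the master has accepted."""
    # Assumption: accepted keys are stored as <pki_dir>/minions/<minion-id>.
    minions_dir = os.path.join(pki_dir, "minions")
    if not os.path.isdir(minions_dir):
        return []
    return sorted(os.listdir(minions_dir))
```

Run `master_resolves` on a minion with whatever name its config points at, and `accepted_minions` on the master itself.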

If your Salt master dies, the managed systems keep running unaffected. But you are right, you cannot roll out any configuration changes while the master is gone. So the question is: how fast can you get it back?

You can simply reinstall the master and start it up. You then accept your minions' keys again (or restore a backup, if you have one) and you are back where you left off with your old master. If you cannot reuse the same machine, then you need to point the minions to the new master somehow.
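The "restore a backup" route could look roughly like the sketch below, which packs the master's key directory into a tarball and unpacks it on the rebuilt machine. The function names and the archive location are hypothetical, and the archive contains the master's private key, so it needs to be stored accordingly.

```python
import os
import tarfile


def backup_pki(pki_dir: str, archive_path: str) -> None:
    """Pack the master's key directory into a gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store everything under "master/" so the restore target can be chosen freely.
        tar.add(pki_dir, arcname="master")


def restore_pki(archive_path: str, parent_dir: str) -> str:
    """Unpack the tarball under parent_dir and return the restored directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only unpack archives you created yourself.
        tar.extractall(parent_dir)
    return os.path.join(parent_dir, "master")
```

For example, `backup_pki("/etc/salt/pki/master", "/root/salt-pki.tar.gz")` on the old master and `restore_pki("/root/salt-pki.tar.gz", "/etc/salt/pki")` on the new one, before starting the master service. Without a backup, the minions' keys have to be accepted again instead.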

There is no state data in a database that might get corrupted or lost. That, for me, is the beauty of it. It's an overlay; it does not squeeze in. Contrast that with Juju, for example, where once the database is gone your systems act like they are beheaded and you have to reinstall.

There are also Multimaster and Syndic in Salt. High availability is a long-standing topic in its development.
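For the Multimaster case, the minion side boils down to listing more than one master in its config. The sketch below renders such a fragment as text; the hostnames are hypothetical, the helper name is made up, and the `master_type` / `master_alive_interval` settings are what I would expect for failover mode, so verify them against the Salt documentation for your version.

```python
from typing import List


def render_minion_masters(masters: List[str], failover: bool = False) -> str:
    """Render the master-related part of a minion config as YAML text."""
    lines = ["master:"]
    lines += [f"  - {name}" for name in masters]
    if failover:
        # Assumption: use one master at a time and switch when it stops responding.
        lines.append("master_type: failover")
        lines.append("master_alive_interval: 30")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames.
    print(render_minion_masters(["salt1.example.com", "salt2.example.com"], failover=True))
```

The output would go into the minion's config file; on the master side, the masters also need to agree on keys and file roots, which this sketch does not cover.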