What are the pros/cons of running Chef/Puppet at regular intervals?

Everywhere I have worked before, Puppet ran at regular intervals, so distributing changes was easy and happened on the fly. My new team frowns upon running the Chef agent at regular intervals. They use it only to bootstrap the OS and then kill it. I don't understand why anyone would use a config management tool like Chef without running it regularly. The bootstrapping we do could be handled by basic shell scripts: install xyz software, copy a config file, restart the service.

They say that it's too dangerous to run it at regular intervals in production because they aren't sure the code is idempotent.

My queries are:

  • How many of you use the Orchestration tools just for bootstrapping? Isn't it like driving a Bugatti at 20mph in alleys?
  • Are there any problems you see in running this at regular intervals when you scale up? How would you handle them? (One way I know is to run the agents in solo mode and let them download the cookbooks from some repository/Artifactory that can handle many simultaneous downloads, rather than overwhelming the Puppet/Chef server.)
  • How can I encourage the team to fix the code so that it is idempotent and run the agent at regular intervals? Or should we move away from Chef to something as simple as bash to reduce the overhead of writing and maintaining the code?
  • Am I right in saying that we are not using the tools the way they are supposed to be used?
  • Am I missing/overlooking anything here?

Solution 1:

orchestration for bootstrapping

There are tools like Terraform that are focused on this part of the process. I also use Ansible for some ad-hoc tasks that don't need to be rerun often.

Generally, though, it is a best practice to run your config management at least every hour. Granting or removing access often happens via these mechanisms, and delaying updates can cause compliance or usability issues. At one large shop we split Puppet into two so the app-specific stuff could be paused without breaking the "shadow puppet", which handled access control updates and "couldn't" be cut off.

problems from running regularly

If you write bad recipes, you can destroy all of production very quickly. Have a process where roles are released into QA and validated, then go to staging and are re-validated, before they reach prod. Chef has built-in testing mechanisms, and similar techniques can be used with the other tools.
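
One cheap check to run before a role is promoted is to converge twice and confirm that the second run changes nothing. Here is a minimal sketch of that idea, written in Python rather than in Chef's own DSL. `ensure_line`, `converge_twice`, and the config file name are hypothetical and not part of any tool:

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to `path` only if it is missing. Returns True if the file changed."""
    content = ""
    if os.path.exists(path):
        with open(path) as fh:
            content = fh.read()
    if line in content.splitlines():
        return False  # already in the desired state, nothing to do
    with open(path, "a") as fh:
        # Avoid gluing the new line onto an unterminated last line.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True


def converge_twice(path, line):
    """Apply the same change twice and report whether each run modified the file."""
    return ensure_line(path, line), ensure_line(path, line)


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "app.conf")  # hypothetical config file
    first, second = converge_twice(cfg, "max_connections = 100")
    print(first, second)  # True False
```

If the second value is ever True, the change is not idempotent, and it is not safe to run on a schedule yet.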

how to encourage running it regularly

I would first focus on the problems that are being brushed under the carpet. If you don't run your recipes often, you won't notice when they stop working because of changes to the OS or your apps.

Then I would mention that changes can be made everywhere pretty quickly when needed. Your interval between Chef runs should be the maximum amount of time you're willing to wait for a change to propagate throughout your environment.
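
To put rough numbers on that: if every node runs on a fixed interval plus a random splay (chef-client has `interval` and `splay` settings for this, and the splay also keeps nodes from hitting the server all at once), the worst-case propagation time is roughly interval plus splay plus the length of one run. This is a back-of-the-envelope sketch, and the function names and figures are only illustrative assumptions:

```python
import random


def next_run_delay(interval_s, splay_s, rng=random):
    """Seconds to wait before the next run: the fixed interval plus a random splay."""
    return interval_s + rng.uniform(0, splay_s)


def worst_case_propagation(interval_s, splay_s, run_duration_s):
    """Rough upper bound on how long a change can take to land on every node."""
    return interval_s + splay_s + run_duration_s


# Illustrative numbers: 30 minute interval, 5 minute splay, 2 minute run.
print(worst_case_propagation(1800, 300, 120))  # 2220 seconds, i.e. 37 minutes
```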

are you right?

Mostly. If it works well enough for them they may not see any need to change anything. You might need to come up with a demo to show the value and make it real for people. Or you might need to wait for your organization to mature to the point that it can handle what you're teaching.

what are you missing?

The main thing you don't seem to be considering is the possible performance impact. If the app is really sensitive to things running in the background, you can see lower throughput or higher latency while Chef runs. If this is the case, you'll need to adjust your recipes or only let it run at off-peak times.
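
If you go the off-peak route, the gate can be as simple as a time-window check in whatever wrapper launches the run. This is a small sketch with a hypothetical `in_off_peak` helper and made-up window times. It is not a Chef feature:

```python
from datetime import time


def in_off_peak(now, start=time(1, 0), end=time(5, 0)):
    """True if `now` (a datetime.time) falls inside the off-peak window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 04:00.
    return now >= start or now < end


print(in_off_peak(time(2, 30)))                             # True
print(in_off_peak(time(14, 0)))                             # False
print(in_off_peak(time(23, 15), time(22, 0), time(4, 0)))   # True
```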

Another thing I've seen happen is memory exhaustion. The app gradually chews up memory until Chef can no longer function. Hopefully you have monitoring of memory levels, and of whether Chef is working or not, to catch this sort of thing.
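
For the "is it still working" half, one simple signal is the age of the last successful run. This sketch assumes your monitoring can already get that timestamp from somewhere. `is_stale` and the two-interval tolerance are hypothetical choices:

```python
import time


def is_stale(last_success_epoch, interval_s, now=None, tolerance=2):
    """True if the last successful run is older than `tolerance` intervals."""
    now = time.time() if now is None else now
    return (now - last_success_epoch) > tolerance * interval_s


# With a 30 minute interval, alert once the last success is over an hour old.
print(is_stale(last_success_epoch=1000, interval_s=1800, now=1000 + 4000))  # True
print(is_stale(last_success_epoch=1000, interval_s=1800, now=1000 + 900))   # False
```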

Beyond performance and memory, I'd suggest reading a book like Release It!, which explains a lot about how to build reliable production systems.