Why do people use Puppet/Chef with Amazon CloudFormation instead of just using CloudInit?
We're planning to use EC2 instances launched from AMIs that are not "pre-baked", i.e. when they are spun up, they are bare installs of Amazon Linux. Our bootstrap process will pull in the various packages we need, e.g. Python and Tomcat. We'll have a minimum of 3 instances and a maximum of 8.
Given these requirements, would Puppet/Chef be more useful than Amazon CloudFormation (CloudInit) on its own?
As best I can see, if we used Puppet we'd have declarative configuration, which is easier to audit than a script when you want to see what's happening. Also, CloudInit has a 16 KB script size limit, which we may or may not run into.
Has anyone moved from CloudInit to Puppet or Chef for a specific reason that they can share here?
Solution 1:
Is there an advantage over CloudInit? Yes, absolutely, many of them!
Sure, you can write top-to-bottom, run-once CloudInit scripts to provision a server. But what happens when you need to change a configuration file, add a user, update a package, or install a new package? You will end up logging into servers or writing more scripts to do so, and your servers will inevitably drift into inconsistent states.
CloudInit is not configuration management. If you opt to begin using configuration management software, use CloudInit for just one task: to bootstrap the Puppet/Chef/other agent.
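To illustrate the "just one task" idea, here is a minimal sketch in Python that builds such a bootstrap script and checks it against the 16 KB limit mentioned in the question. The package name, service name, and server hostname are assumptions made up for illustration, and the exact install steps will differ by distribution and Puppet version.

```python
# Sketch: keep the user-data script down to one job, bootstrapping the agent.
# Package name, service name, and hostname are assumptions for illustration.

USER_DATA_LIMIT_BYTES = 16 * 1024  # the 16 KB limit mentioned in the question


def build_bootstrap_user_data(puppet_server: str) -> str:
    """Return a short shell script that installs and starts a Puppet agent."""
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "yum install -y puppet  # assumed package name",
        f"puppet config set server {puppet_server} --section agent",
        "service puppet start  # assumed service name",
    ]
    return "\n".join(lines) + "\n"


def fits_in_user_data(script: str) -> bool:
    """Check the raw (pre-base64) size of the script against the limit."""
    return len(script.encode("utf-8")) <= USER_DATA_LIMIT_BYTES


# Hypothetical hostname; everything else lives in Puppet manifests.
user_data = build_bootstrap_user_data("puppet.example.internal")
```

Because the script only hands over to the agent, it stays a few hundred bytes long no matter how much the actual configuration grows.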
Puppet doesn't just help you automate installing packages, setting up SSH keys, or tuning your Tomcat heap. It enforces the state of things. When a developer is troubleshooting a Java app at 3am and changes your Tomcat config, Puppet will change it back. You can rapidly change the version of Python for all nodes or for groups of nodes, and if someone installs a different version, Puppet will change it back.
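The difference from a run-once script can be sketched with a toy model in Python. This is not how Puppet is implemented; the setting names and values are hypothetical, and the only point is that a convergence run compares actual state with desired state and corrects whatever has drifted.

```python
# Toy model of desired-state convergence; not Puppet's actual implementation.
# Setting names and values are hypothetical.

def converge(actual: dict, desired: dict) -> list:
    """Bring `actual` in line with `desired`; return the keys that were corrected."""
    corrected = []
    for key, want in desired.items():
        if actual.get(key) != want:
            actual[key] = want
            corrected.append(key)
    return corrected


desired = {"tomcat_heap": "2g", "python_version": "2.7"}
node = dict(desired)  # a node that starts out compliant

node["tomcat_heap"] = "8g"  # the 3am manual change

first_run = converge(node, desired)   # reverts the drifted setting
second_run = converge(node, desired)  # finds nothing to change
```

Running the same convergence twice is harmless, which is what lets an agent apply it on a schedule instead of once at boot.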
When your application stack changes and you start using, say, RabbitMQ, Jetty, or a new RDBMS, you can easily test and deploy the changes across tens or thousands of servers.
There are many other reasons to use configuration management software, such as back-end reporting, auditing, and security compliance.
Solution 2:
The entire point of configuration management is to spin up machines predictably and consistently. CloudFormation and cloudinit are great when you're limited purely to AWS (although debugging CloudFormation templates is a miserable experience), but what about applications that use both datacenter resources and AWS, or local testing environments, or development machines?
If you exist purely in AWS, I suppose you could get away with cloudinit and nothing else, but I'm not convinced it's realistic for applications of any scale (Netflix, for example, pre-bakes their AMIs using OSS technologies they've written and released to the world; consider this video for details). Highly available applications are trans-regional, often based in VPCs, and tend to back up to datacenters across VPNs, and this doesn't even touch demo, staging, testing, or development environments. As someone who's charged with provisioning machines, the last things I want to do are repeat work or get stuck debugging multiple provisioning methods.
Hence Chef or Puppet. They work just as well for AWS as they do for my datacenter, and just as well for my development machine running Vagrant as they do for the demo environments I occasionally need on the fly. I'd much rather launch Chef or Puppet from cloudinit than maintain both cloudinit and Chef or Puppet.