Puppet: managing (lots of) Apache VirtualHosts

I'm learning my way through configuration management in general, and Puppet in particular as the tool to implement it. I have already done some generic research (also on SF), and right now I'm looking at Apache VirtualHosts.

We host a lot of LAMP websites (currently in the hundreds) on two systems: an Apache2/mod_php server and a MySQL server. This is basically the opposite of another question already on SF, where the asker manages lots of servers with few vhosts each (possibly just one). I have not yet put together a working config in Puppet, but that shouldn't be a problem; there are many examples and recipes out there.

Besides the obvious Apache configuration file(s) (no problem there, I guess), every vhost needs some directories created and their permissions checked on the webserver (e.g. a root dir for each vhost containing a DocumentRoot, a dedicated tmp dir, a dedicated PHP session files dir, possibly SSL certificates, and so on). It also needs a user and one or more databases on the MySQL server.
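The per-vhost webserver resources described above map naturally onto a Puppet defined type, which is declared once per site. Below is a minimal sketch. The type name `hosting::vhost`, the `/srv/www` base directory, the Debian-style Apache path, the template `hosting/vhost.conf.erb` and the directory modes are all hypothetical assumptions. It also assumes `Service['apache2']` and the site's owner account are declared elsewhere.

```puppet
# Hypothetical defined type: one instance per hosted site.
# Names, paths and modes are assumptions; adapt them to your own layout.
define hosting::vhost (
  $owner,                    # required: system user that owns the site
  $group   = 'www-data',
  $basedir = '/srv/www',
) {
  $root = "${basedir}/${title}"

  # Per-site directories: site root, DocumentRoot, tmp and PHP sessions.
  # Puppet autorequires parent directories, so $root is created first.
  file { [$root, "${root}/htdocs", "${root}/tmp", "${root}/sessions"]:
    ensure => directory,
    owner  => $owner,
    group  => $group,
    mode   => '0750',
  }

  # Apache config for the site, rendered from a hypothetical template.
  # Assumes Service['apache2'] is declared elsewhere (e.g. an apache class).
  file { "/etc/apache2/sites-enabled/${title}.conf":
    ensure  => file,
    content => template('hosting/vhost.conf.erb'),
    notify  => Service['apache2'],
  }
}

# Usage: one declaration per site.
hosting::vhost { 'example.com':
  owner => 'example',
}
```

With this layout, each site adds five `file` resources that Puppet checks on every agent run, so 1000 sites means roughly 5000 file checks per run on the webserver. The MySQL user and databases would be declared on the database node instead, for example with the `mysql::db` defined type from the puppetlabs-mysql module. That part is not shown here.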

Adding a new vhost would require Puppet to create all of those. Removing one would require Puppet to run a script that backs up user data and then removes the live data from both servers. On top of that, every single Puppet agent run would check the existence of the directories, the databases, the permissions, etc.
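The "back up, then remove" step can be expressed with an `exec` that the directory removal depends on. If the backup command fails, Puppet skips the dependent removal. This is a sketch only. The type name `hosting::vhost_absent`, the script `/usr/local/sbin/archive-site`, the paths, and the presence of a declared `Service['apache2']` are all hypothetical assumptions.

```puppet
# Hypothetical removal counterpart: archive the site, then delete it.
# /usr/local/sbin/archive-site is an assumed local script, not part of Puppet.
define hosting::vhost_absent (
  $basedir = '/srv/www',
) {
  $root = "${basedir}/${title}"

  # Run the backup only while the site directory still exists,
  # so later agent runs do nothing once the site is gone.
  exec { "archive-${title}":
    command => "/usr/local/sbin/archive-site ${title}",
    onlyif  => "/usr/bin/test -d ${root}",
  }

  # Drop the Apache config; assumes Service['apache2'] is declared elsewhere.
  file { "/etc/apache2/sites-enabled/${title}.conf":
    ensure => absent,
    notify => Service['apache2'],
  }

  # force => true is required to remove a directory with ensure => absent.
  # require makes this depend on the backup; a failed exec skips the removal.
  file { $root:
    ensure  => absent,
    force   => true,
    require => Exec["archive-${title}"],
  }
}

hosting::vhost_absent { 'old-site.example.com': }
```

A single defined type with an `$ensure` parameter would also work. Keeping removal in a separate type simply makes the destructive path explicit.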

Am I asking for trouble when going up to hundreds of virtual hosts with all those checks running on every Puppet run, especially the filesystem ones on the webserver, and especially once the systems become more heavily loaded in the future? (Let's say we target 1000~2000 websites as a reasonable per-server maximum.)

Does anyone out there have experience doing this? I googled but found nothing, partly because searches for "puppet" and "apache" have a low signal-to-noise ratio...


Solution 1:

I suspect that managing a lot of Apache virtual hosts won't be a problem, but I can't say for sure. What counts as acceptable performance is defined by your business needs; only you can decide whether it's fast enough. Here is a decent thread about reducing CPU load: https://groups.google.com/forum/?fromgroups#!topic/puppet-users/sxtMvCnKnys[1-25]

To summarize the thread:

  • Increase the delay between Puppet agent runs.
  • Don't schedule Puppet runs at all; only trigger them with puppet kick or MCollective.
  • Schedule the Apache changes so they only happen at certain times.
  • Use two different environments (maintenance and production). Keep production lightweight and use maintenance to make changes.
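For the third point, Puppet's built-in `schedule` resource type and `schedule` metaparameter restrict when a resource may be applied. Outside the window, Puppet skips the resource. The sketch below is illustrative only. The schedule name, the 02:00 to 04:00 window and the file path/source are hypothetical assumptions.

```puppet
# Allow matching resources to be applied only between 02:00 and 04:00,
# at most once per day. Name and time window are assumptions.
schedule { 'apache-maintenance':
  range  => '2 - 4',
  period => daily,
  repeat => 1,
}

# Attach the schedule to resources that should only change in that window,
# here a hypothetical vhost config file served from a module.
file { '/etc/apache2/sites-enabled/example.com.conf':
  ensure   => file,
  source   => 'puppet:///modules/hosting/example.com.conf',
  schedule => 'apache-maintenance',
}
```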

Here is an example of managing an apache Virtual Host from the PuppetLabs web site: http://docs.puppetlabs.com/learning/definedtypes.html#an-example-apache-vhosts

Setting up and removing the configuration shouldn't be a problem. The biggest problem would be removing data files for the web applications/sites. For that, I would recommend shared storage, like NFS/AFS. If you're not using shared storage, then make sure the user-generated data is left intact, backed up, or migrated to the new server.
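If you go the shared-storage route, the mount itself can be managed by Puppet with the built-in `mount` type, so site data lives outside any single webserver. This is a sketch under assumptions. The NFS server `nfs.example.com`, the export `/export/www` and the mount point `/srv/www` are hypothetical.

```puppet
# The mount point must exist before it can be mounted.
file { '/srv/www':
  ensure => directory,
}

# Mount shared site storage from a hypothetical NFS server and record it in fstab.
mount { '/srv/www':
  ensure  => mounted,
  device  => 'nfs.example.com:/export/www',
  fstype  => 'nfs',
  options => 'defaults',
  atboot  => true,
  require => File['/srv/www'],
}
```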

I suspect you're in a mass-hosting situation, like a web hosting company, so I recommend not encoding individual site names in your Puppet manifests. For this, I recommend Hiera (http://puppetlabs.com/blog/first-look-installing-and-using-hiera/), which lets you store the mapping of virtual hosts to servers separately from your manifests. Hiera can use flat files or a database as its backend. Sadly, I don't know Hiera well enough to guide you through setting up the multi-level Hiera data structure you might need, but I can at least point you in its general direction.
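The sketch below shows one common way to keep the site list in Hiera and turn it into resources. It uses the Puppet 3 era functions `hiera_hash` and `create_resources`. The Hiera key `hosting::vhosts`, the class `hosting::sites` and the defined type `hosting::vhost` (assumed to take an `owner` parameter) are hypothetical names. It does not attempt the multi-level hierarchy mentioned above.

```puppet
# Hypothetical Hiera data, e.g. in a YAML data file for the webserver node:
#
#   hosting::vhosts:
#     example.com:
#       owner: example
#     example.org:
#       owner: exampleorg
#
# The manifest only knows the key name; the list of sites lives in Hiera.
class hosting::sites {
  # Merge the hash across the hierarchy; default to no sites.
  $vhosts = hiera_hash('hosting::vhosts', {})

  # Declare one hosting::vhost per hash key, using the nested hash as parameters.
  create_resources('hosting::vhost', $vhosts)
}
```

Adding or removing a site then becomes a data change in Hiera rather than a manifest edit.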