High Availability Cron Jobs

Information

We are currently in the process of creating a high availability cluster for NGINX (on CentOS 7) running PHP. Most of the configuration has been mapped out and it should work nicely in a clustered environment.

Unfortunately, the one thing we cannot get to play nicely with clustering is cron jobs (the cron jobs will execute PHP code). As far as I'm aware, cron jobs are executed on each host individually. This means that we have three choices:

  1. Accept that we don't have a fully high availability environment, one where another server takes over upon a single server failure and everything still works just as it did before (albeit slower).
  2. Run each cron job on every server and save a result in a database to determine whether or not it has already been run. This is not a viable solution, as some of our cron jobs can take hours to run, and they need to finish before the next working day.
  3. Find some sort of solution that allows cron jobs to be executed in a highly available way.

Research

Since solution 3 would let us maintain a high availability environment, it is the preferred method. Unfortunately, we are not very familiar with the tools listed below, so I seek your expertise in helping us find an appropriate solution for our needs. We are not very familiar with Linux machines (the entire environment is Windows apart from the NGINX servers) and know little about working with them (although we have been able to figure things out so far).

Options

  1. Dkron
    • This solution seems to offer simple setup and appears to be a decent product
  2. Chronos
    • This uses multiple other utilities to operate including an actual database (not ideal, but could work)
  3. Rundeck
    • Seems to offer a lot of functionality and potentially the best product on this list
  4. Rcron
    • I don't really know much about this except that it is Golang based.
  5. Custom script: How to make cronjobs high available?
    • This is an "if all else fails" approach if nothing else works...
  6. Other options??? - Please provide other options if you find some and I will include them here

Questions

  1. What are your expert opinions or recommendations for the different options?
  2. What are some of your experiences using the different options (pros/cons)?
  3. Which options would you consider we use with our infrastructure? (if additional information about our infrastructure is needed, please let me know)

Notes

Any help regarding this is greatly appreciated.

I realize this question has been asked before, although it seems quite outdated (2011) and many new solutions have since been created.


crond on RHEL/CentOS 7 includes clustering support. It is actually cronie, a fork of the venerable vixie-cron. Here are details from the man page:

CLUSTERING SUPPORT

In this version of Cron it is possible to use a network-mounted shared /var/spool/cron across a cluster of hosts and specify that only one of the hosts should run the crontab jobs in this directory at any one time. This is done by starting Cron with the -c option, and have the /var/spool/cron/.cron.hostname file contain just one line, which represents the hostname of whichever host in the cluster should run the jobs. If this file does not exist, or the hostname in it does not match that returned by gethostname(2), then all crontab files in this directory are ignored. This has no effect on cron jobs specified in the /etc/crontab file or on files in the /etc/cron.d directory. These files are always run and considered host-specific.

Rather than editing /var/spool/cron/.cron.hostname directly, use the -n option of crontab(1) to specify the host.

You should ensure that all hosts in a cluster, and the file server from which they mount the shared crontab directory, have closely synchronised clocks, e.g., using ntpd(8), otherwise the results will be very unpredictable.

In practice, this approach requires:

  • A shared filesystem mounted at /var/spool/cron for all clustered systems;
  • All clustered systems to begin crond with the -c flag (put CRONDARGS=-c in /etc/sysconfig/crond); and
  • Some kind of trigger so that, when the system responsible for cron jobs fails, another system will execute crontab -n to take over.

Keep in mind the warning: this solution only clusters cron jobs in /var/spool/cron (i.e., set with crontab -e). Every node will still run its own jobs in /etc/crontab or /etc/cron.d.
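The third requirement, a trigger for taking over, is the part cronie does not provide. A minimal sketch of one way a standby node could decide to claim the shared crontabs is shown here. It assumes the script runs as root on every node at a regular interval (for example from /etc/cron.d, which is always host-specific), and that a single answered ping is an acceptable health check. The function names and the ping-based check are illustrative assumptions, not part of cronie; only the .cron.hostname file and crontab -n come from the man page.

```python
import socket
import subprocess
from pathlib import Path

# Marker file described in the man page excerpt; it holds one hostname.
HOSTNAME_FILE = Path("/var/spool/cron/.cron.hostname")


def read_active_host(path=HOSTNAME_FILE):
    """Return the host currently selected to run the shared crontabs, or None."""
    try:
        lines = path.read_text().splitlines()
    except FileNotFoundError:
        return None
    return (lines[0].strip() or None) if lines else None


def host_is_alive(host):
    """Assumption: one answered ping means the host is up.

    A real deployment would likely want a stricter check
    (several probes, or a check that crond itself is running).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_take_over(active_host, local_host, is_alive=host_is_alive):
    """Decide whether this node should claim the shared crontabs."""
    if active_host == local_host:
        return False  # this node is already the active one
    if active_host is None:
        return True  # nobody is selected, so the shared jobs are not running
    return not is_alive(active_host)


def main():
    local_host = socket.gethostname()
    if should_take_over(read_active_host(), local_host):
        # crontab -n selects which host runs the jobs in /var/spool/cron.
        subprocess.run(["crontab", "-n", local_host], check=True)
```

Calling main() periodically on each node is enough for a basic failover. Because the shared file names only one host at a time, two standby nodes claiming it at once still leave a single active node, although a job that was already running on the old host is not stopped by the switch.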


Why not use your option (2), but have each job create a flag while it is executing? The cron jobs will start on all the machines, and slight local timing variations mean that one of them creates the flag first; the others then see that the flag is set and bail out, while the first runs to completion.

You'll need to pay attention to the atomicity of setting and checking the flag (a lock file on NFS is also an option here). To reduce the chance of two servers racing for the flag, there may also be some value in either

  • putting a small random sleep at the start of each cron job to splay them a bit, or
  • varying the start time of any given job by at least 1 minute between the servers, i.e. server 1 starts the job at 7:02 and server 2 at 7:03; usually server 1 will do the whole job, but if it's down, server 2 will see no flag when it starts at 7:03.
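The flag approach can be sketched as a small wrapper around each job. This is only an illustration under stated assumptions: the flag lives on a filesystem shared by all servers, the filesystem honours exclusive creation (O_EXCL), and one flag per job per day is the right granularity. The names flag_path, try_acquire_flag and run_once are hypothetical.

```python
import datetime
import os
import random
import socket
import time


def flag_path(directory, job_name, day):
    """Build a per-job, per-day flag name so yesterday's flag never blocks today."""
    return os.path.join(directory, f"{job_name}.{day.isoformat()}.flag")


def try_acquire_flag(path):
    """Atomically create the flag file; return True only for the creator."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking and
        # setting the flag happen in a single step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        # Record who holds the flag, which helps when debugging.
        handle.write(f"{socket.gethostname()} {os.getpid()}\n")
    return True


def run_once(path, job, max_splay_seconds=0.0):
    """Sleep a random splay, then run job() only if this host wins the flag."""
    time.sleep(random.uniform(0, max_splay_seconds))
    if not try_acquire_flag(path):
        return False  # another server got there first; bail out
    job()
    return True
```

The flag is deliberately left in place after the job finishes, so that a server starting a minute later still sees it and bails out. The sketch does not clean up old flags, and it does not retry if the winning server dies halfway through the job; both would need handling in a real setup.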

I use Jenkins to manage around 140 scheduled scripts.

Jenkins was not made to serve as a replacement for cron; it is built for continuous integration, but you can manage almost everything with it.

Here are some people who have had success (just like me) moving their jobs from cron to Jenkins.

Here is a good comparison between Jenkins and cron.