How can this backup strategy work?
I would write a script that checks whether a backup is more than 1, 7, or 30 days old and acts accordingly. You have not said so, but I assume you are using Linux (I added the linux tag to your question) and that you are backing up to a remote server. The first step is to write a little script that runs your rsync command and also creates a file on the remote server when the backup is finished. This file will be used both to tell whether a backup is currently running and to check the backup's age (I assume you are keeping the original timestamps when you back up files, so you can't get the date from the files themselves):
Rsync script (this assumes you have password-less access to the remote server):
#!/usr/bin/env bash
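## Remove the marker so the remote side knows a backup is in progress
ssh user@remote rm -f /path/to/daily/backup_finished.txt
## -a recurses and preserves timestamps; the marker is only recreated if rsync succeeds
rsync -a /path/to/source/ user@remote:/path/to/daily/ &&
ssh user@remote touch /path/to/daily/backup_finished.txt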
On the local machine, set up a cron job that does daily backups:
@daily rsync_script.sh
On the remote machine, you need to run the script I give below every hour:
@hourly check_backup.sh
The check_backup.sh script:
#!/usr/bin/env bash
daily=/path/to/daily;
weekly=/path/to/weekly;
monthly=/path/to/monthly;
## The dates will be measured in seconds since the UNIX epoch,
## so we need to translate weeks and months (31 days) to seconds.
week=$((60*60*24*7));
month=$((60*60*24*31));
## Make sure no backup is currently running
if [ ! -e "$daily/backup_finished.txt" ]; then
    echo "A backup seems to be running, exiting." && exit;
fi
## Get the necessary dates
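## (a missing marker counts as date 0, so the first run always copies)
weekly_backup_date=$(stat -c %Y "$weekly/backup_finished.txt" 2>/dev/null || echo 0)
monthly_backup_date=$(stat -c %Y "$monthly/backup_finished.txt" 2>/dev/null || echo 0)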
now=$(date +%s)
monthly_backup_age=$((now - monthly_backup_date))
weekly_backup_age=$((now - weekly_backup_date))
## If the monthly backup is more than a month old, replace it with
## the current daily backup, unless $daily is identical to $weekly.
if [[ "$monthly_backup_age" -gt "$month" ]]; then
    ## diff -rq compares the directories recursively and only reports whether they differ
    if ! diff -rq "$daily" "$weekly" > /dev/null 2>&1; then
        ## Delete the previous backup and copy the new one over
        rm -rf "$monthly" && cp -rp "$daily" "$monthly"
    fi
fi
## Replace the weekly backup if it is older than a week, but only
## if $daily is not identical to $monthly. The -r flag makes cp
## recursive and the -p flag makes it preserve dates and permissions.
if [[ "$weekly_backup_age" -gt "$week" ]]; then
    if ! diff -rq "$daily" "$monthly" > /dev/null 2>&1; then
        rm -rf "$weekly" && cp -rp "$daily" "$weekly"
    fi
fi
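The age checks above could also be factored into a small helper. This is only a sketch: `is_older_than` is a hypothetical name, and it assumes GNU `stat` (for `-c %Y`), as the script above already does.

```shell
# Sketch: succeed if a marker file is older than a given number of seconds.
# is_older_than is a hypothetical helper, not part of the original script.
is_older_than() {
    local marker=$1 max_age=$2 mtime now
    # A missing marker counts as "too old", so a first backup is always made.
    [ -e "$marker" ] || return 0
    mtime=$(stat -c %Y "$marker") || return 0   # GNU stat: mtime in epoch seconds
    now=$(date +%s)
    [ $((now - mtime)) -gt "$max_age" ]
}

# Possible use inside check_backup.sh:
#   if is_older_than "$weekly/backup_finished.txt" "$week"; then ...; fi
```

Treating a missing marker as stale is a design choice; you may prefer to abort instead if a missing marker would indicate a problem on your server.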
So, this script (check_backup.sh) will be run every hour on your backup server. Since it does nothing unless a backup is old enough, it's no problem to have it run so often. Every time the monthly backup is more than 31 days old, the contents of the monthly directory will be deleted and the current daily backup copied there. Similarly for weekly when its backup is more than 7 days old.
I am using diff to compare the backups. This means that we will copy daily to weekly if the current weekly is more than a week old, but only if the backup that would be copied (the current daily) is not the same as the existing monthly, and vice versa for monthly. For example, if the monthly backup is due but the current daily is the same as the existing weekly, the script will not overwrite the existing monthly. However, once the two differ again on a later run, it will copy over the monthly one.
The net result of this is that at any time you should have a minimum of two different backups, and usually you will have three. The worst-case scenario is that something fails and you don't have a week-old backup, just a month-old one or, vice versa, you don't have a month-old one but you do have last week's.
This is more of a long comment, adding to what others have already pointed out.
First, use hardlinks and incremental backups with rsync to greatly reduce the amount of disk space used: each extra backup will only take up the size of the files that differ. If you are backing up large VM images, then I'd suggest not backing up the image files themselves but their filesystem contents (as @Michael already commented). A tool like rsnapshot should work fine, although (from experience) it is easy enough to roll a script of your own.
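As a rough illustration of the hardlink approach, here is a minimal sketch built on rsync's `--link-dest` option. The function name `snapshot_backup` and the `latest` symlink are my own assumptions for the example, not something rsnapshot or any particular tool prescribes.

```shell
# Sketch: create a named snapshot whose unchanged files are hardlinks
# into the previous snapshot. snapshot_backup and "latest" are illustrative names.
snapshot_backup() {
    local src=$1 dest_root=$2 name=$3
    mkdir -p "$dest_root" || return 1
    # Use an absolute path: a relative --link-dest is resolved against the destination.
    dest_root=$(cd "$dest_root" && pwd) || return 1
    if [ -d "$dest_root/latest" ]; then
        # Files identical to the previous snapshot are hardlinked, not copied.
        rsync -a --link-dest="$dest_root/latest" "$src/" "$dest_root/$name/"
    else
        # First run: nothing to link against yet.
        rsync -a "$src/" "$dest_root/$name/"
    fi &&
    # Point "latest" at the snapshot just made, for the next run.
    ln -sfn "$name" "$dest_root/latest"
}

# Possible use:
#   snapshot_backup /path/to/source /path/to/backups "$(date +%F)"
```

Because every snapshot is a complete directory tree, deleting an old one with `rm -rf` does not affect the others; the data for a file is only freed once the last snapshot linking to it is gone.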
Then remove old backups, keeping older ones at increasingly longer intervals. I once wrote a program precisely to make this configurable; it can be found here (called bu-rmselect).
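To make the idea of thinning out old backups concrete, here is one possible rule as a sketch. It is not how bu-rmselect works; `should_keep` and the 7-day and 28-day tiers are assumptions chosen for illustration. Days are counted since the UNIX epoch, and the rule depends on the day a snapshot was taken (not just its age) so that a snapshot kept today is not dropped tomorrow.

```shell
# Sketch: decide whether a snapshot taken on a given day should be kept.
# should_keep and the tier sizes are illustrative, not bu-rmselect's rules.
should_keep() {
    local snap_day=$1 today=$2
    local age=$((today - snap_day))
    if [ "$age" -lt 7 ]; then
        return 0                        # keep every snapshot from the last week
    elif [ "$age" -lt 31 ]; then
        [ $((snap_day % 7)) -eq 0 ]     # then one snapshot per week
    else
        [ $((snap_day % 28)) -eq 0 ]    # then one snapshot per four weeks
    fi
}

# Possible use, with day numbers derived from epoch seconds:
#   today=$(( $(date +%s) / 86400 ))
#   should_keep "$snap_day" "$today" || rm -rf "/path/to/backups/$snap_name"
```

Since 28 is a multiple of 7, every snapshot that survives the last tier also survived the weekly tier, so the tiers never contradict each other.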