Simple Backup Solution

I'm looking for a very basic backup script/package for a directory on my Ubuntu server. Currently I'm using a cronjob like this:

0 5 * * 1 sudo tar -Pzcf /var/backups/home.tgz /home/

But I want a solution which adds a timestamp to the filename and does not overwrite old backups. Of course this will slowly flood my drive, so old backups (e.g. older than 2 months) need to be deleted automatically.

Cheers, Dennis


UPDATE: I've decided to give the bounty to the logrotate solution because of its simplicity. But big thanks to all other answerers, too!


Solution 1:

Simple solution using logrotate

If you want to keep it simple and without scripting, just stay with your current cronjob and in addition configure a logrotate rule for it.

To do that, place the following in a file named /etc/logrotate.d/backup-home:

/var/backups/home.tgz {
    weekly
    rotate 8
    nocompress
    dateext
}

From now on, each time logrotate runs (and it will normally do so every day at ~6:25am), it will check if it's suitable for rotation and, if so, rename your home.tgz to another file with a timestamp added. It will keep 8 copies of it, so you have roughly two months of history.

You can customize the timestamp using the dateformat option, see logrotate(8).
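For example, to get a plain date suffix you could extend the rule like this (just a sketch; check logrotate(8) for the format specifiers your logrotate version accepts):

/var/backups/home.tgz {
    weekly
    rotate 8
    nocompress
    dateext
    dateformat -%Y-%m-%d
}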

Because your backup job runs at 5am and logrotate runs at 6:25am you should make sure your tar backup runs well under 1h and 25m (I guess it will be much faster anyway).
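If you want to see what logrotate would do without actually rotating anything, you can dry-run the rule (assuming you named the file as above):

sudo logrotate -d /etc/logrotate.d/backup-home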

Solution 2:

This is (a variant of) the script I use (/home/pduck/bup.sh):

#!/usr/bin/env bash

src_dir=/home/pduck
tgt_dir=/tmp/my-backups
mkdir -p "$tgt_dir"

# current backup directory, e.g. "2017-04-29T13:04:50"
now=$(date +%FT%H:%M:%S)

# previous backup directory (latest entry matching the timestamp pattern)
prev=$(ls "$tgt_dir" | grep -e '^....-..-..T..:..:..$' | tail -1)

if [ -z "$prev" ]; then
    # initial backup
    rsync -av --delete "$src_dir" "$tgt_dir/$now/"
else
    # incremental backup, hardlinking unchanged files to the previous snapshot
    rsync -av --delete --link-dest="$tgt_dir/$prev/" "$src_dir" "$tgt_dir/$now/"
fi

exit 0

It uses rsync to locally copy the files from my home directory to a backup location, /tmp/my-backups in my case. Below that target directory a directory with the current timestamp is created, e.g. /tmp/my-backups/2018-04-29T12:49:42 and below that directory the backup of that day is placed.

When the script is run once again, then it notices that there is already a directory /tmp/my-backups/2018-04-29T12:49:42 (it picks the "latest" directory that matches the timestamp pattern). It then executes the rsync command but this time with the --link-dest=/tmp/my-backups/2018-04-29T12:49:42/ switch to point to the previous backup.

This is the actual point of making incremental backups:

With --link-dest=… rsync does not copy files that were unchanged compared to the files in the link-dest directory. Instead it just creates hardlinks between the current and the previous files.

When you run this script 10 times, you get 10 directories with the various timestamps and each holds a snapshot of the files at that time. You can browse the directories and restore the files you want.
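Restoring is then just a matter of copying out of the snapshot you want, for example (the paths are only illustrative):

cp -a /tmp/my-backups/2018-04-29T12:49:42/pduck/.bashrc /home/pduck/.bashrc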

Housekeeping is also very easy: Just rm -rf the timestamp directory you don't want to keep. This will not remove older or newer or unchanged files, just remove (decrement) the hardlinks. For example, if you have three generations:

  • /tmp/my-backups/2018-04-29T...
  • /tmp/my-backups/2018-04-30T...
  • /tmp/my-backups/2018-05-01T...

and delete the 2nd directory, then you just lose the snapshot of that day, but the files are still in either the 1st or the 3rd directory (or both).
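If you also want the "delete backups older than 2 months" part automated, a minimal sketch (assuming the timestamped snapshot directories sit directly under /tmp/my-backups) would be:

find /tmp/my-backups -mindepth 1 -maxdepth 1 -type d -mtime +60 -exec rm -rf {} +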

I've put a cronjob in /etc/cron.daily that reads:

#!/bin/sh
/usr/bin/systemd-cat -t backupscript -p info /home/pduck/bup.sh

Name that file backup or something, chmod +x it, but omit the .sh suffix (run-parts, which processes /etc/cron.daily, skips files whose names contain a dot, so the script wouldn't be run). Due to /usr/bin/systemd-cat -t backupscript -p info you can watch the progress via journalctl -t backupscript.

Note that this rsync solution requires the target directory to be on a filesystem that supports hardlinks (ext4, for example), because that is what --link-dest relies on.
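You can verify that unchanged files really are shared between snapshots by comparing their inode numbers and link counts (the paths are only examples):

stat -c '%i links=%h %n' /tmp/my-backups/*/pduck/.bashrc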

Solution 3:

With a little edit to your cron command you can add a timestamp to the filename (note that % is special in crontab entries, so it has to be escaped with a backslash):

0 5 * * 1 sudo tar -Pzcf /var/backups/home_$(date "+\%Y-\%m-\%d_\%H-\%M-\%S").tgz /home/

As for the cleaning, I found an awesome one-liner here that I adapted to your case (it uses bash because the "<" string comparison in [ ] is a bashism):

find . -type f -name 'home_*.tgz' -exec bash -c 'bcp="${1%_*}"; bcp="${bcp#*_}"; [ "$bcp" "<" "$(date +%F -d "60 days ago")" ] && rm "$1"' 0 {} \;

You can add the above command to another cron job (run it from /var/backups, or replace . with /var/backups) and it will remove backups older than 60 days. HTH
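Put together, the two jobs could look like this in the crontab (the times, and the simpler -mtime test that goes by file modification time instead of the name, are my assumptions):

0 5 * * 1 sudo tar -Pzcf /var/backups/home_$(date "+\%Y-\%m-\%d_\%H-\%M-\%S").tgz /home/
30 6 * * 1 find /var/backups -maxdepth 1 -type f -name 'home_*.tgz' -mtime +60 -delete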

Solution 4:

Here is part of a solution from my daily backup script, which is called by cron: Backup Linux configuration, scripts and documents to Gmail. The full script is not appropriate here because:

  • it includes targeted /home/me/* files but skips about 1 GB of /home/ files used by Firefox, Chrome and other apps which I have no interest in backing up (but which might be important to you).
  • it includes files important to me but probably unimportant to you in /etc/cron*, /etc/system*, /lib/systemd/system-sleep, /etc/rc.local, /boot/grub, /usr/share/plymouth, /etc/apt/trusted.gpg, etc.
  • it emails the backup every morning to my gmail.com account for off-site backups. Your backups are not only on-site but also on the same machine.

Here is the relevant script, parts of which you might adapt:

#!/bin/sh
#
# NAME: daily-backup
# DESC: A .tar backup file is created, emailed and removed.
# DATE: Nov 25, 2017.
# CALL: WSL or Ubuntu calls from /etc/cron.daily/daily-backup
# PARM: No parameters but /etc/ssmtp/ssmtp.conf must be setup

# NOTE: Backup file name contains machine name + Distro
#       Same script for user with multiple dual boot laptops
#       Single machine should remove $HOSTNAME from name
#       Single distribution should remove $Distro

sleep 30 # Wait 30 seconds after boot

# Running under WSL (Windows Subsystem for Linux)?
if grep -q Microsoft /proc/version; then
    Distro="WSL"
else
    Distro="Ubuntu"
fi

today=$( date +%Y-%m-%d-%A )
/mnt/e/bin/daily-backup.sh Daily-$(hostname)-$Distro-backup-$today

My gmail.com account is only 35% full (out of 15 GB), so my daily backups can run for a while longer before I have to delete files. But rather than an "everything older than xxx" philosophy, I'll use a grandfather-father-son strategy as outlined here: Is it necessary to keep records of my backups?. In summary (a rough shell sketch follows the list):

  • Monday to Sunday (Daily backups) that get purged after 14 days
  • Sunday backups (Weekly backups) purged after 8 weeks
  • Last day of month backups (Monthly backups) purged after 18 months
  • Last day of year backups (Yearly backups) kept forever
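Purely as a local-filesystem sketch of those retention rules (the file names, paths and exact cut-offs are assumptions), the decision logic could look like this in bash:

#!/bin/bash
# Sketch: grandfather-father-son purge for files named like
# Daily-HOSTNAME-Distro-backup-YYYY-MM-DD-Weekday (adjust to your naming)
backup_dir=/var/backups

for f in "$backup_dir"/Daily-*-backup-*; do
    [ -e "$f" ] || continue
    # extract the YYYY-MM-DD part of the file name
    d=$(echo "$f" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
    [ -n "$d" ] || continue
    age_days=$(( ( $(date +%s) - $(date -d "$d" +%s) ) / 86400 ))

    keep=no
    [ "$age_days" -le 14 ] && keep=yes                                            # daily: 14 days
    [ "$(date -d "$d" +%u)" = 7 ] && [ "$age_days" -le 56 ] && keep=yes           # Sunday: 8 weeks
    [ "$(date -d "$d + 1 day" +%d)" = 01 ] && [ "$age_days" -le 548 ] && keep=yes # last day of month: ~18 months
    [ "$(date -d "$d" +%m-%d)" = 12-31 ] && keep=yes                              # last day of year: forever

    [ "$keep" = no ] && rm -f "$f"
done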

My purging process will be complicated by the fact that I'll have to learn Python and install a Python library to manage Gmail folders.

If you don't want generational backups and want to purge files older than 2 months this answer will help: Find not removing files in folders through bash script.

In summary:

DAYS_TO_KEEP=60
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -mtime +"$DAYS_TO_KEEP" -exec rm -rf {} \;