Automating duplicity backups using cron

Background

Original Reference: http://peterpetrakis.blogspot.com/2013/06/automating-and-encrypting-duplicity.html

Having suffered data loss in the past, and having hacked on storage, I am convinced that regular backups are a good idea. I wanted redundancy in case my local server failed, and I wanted to encrypt my backups using a password-protected GPG key.

The current solution uses a passphrase kept in plain text outside of the backup path. I plan to investigate moving the GPG key to a smartcard and using a PIN to unlock it instead. If you have an alternative solution, please describe it in detail.

Persisting the required environment variables

Anything run from cron is detached from your current environment: you lose the variables that describe things like your ssh-agent and gpg-agent, which you need in order to communicate with the remote server.

I took a simple approach: in my ~/.bashrc I added the following.

cat > ~/.backenvrc << EOF
# used by crontab backup script
export SSH_AGENT_PID=$SSH_AGENT_PID
export SSH_AUTH_SOCK=$SSH_AUTH_SOCK
export GPG_AGENT_INFO=$GPG_AGENT_INFO
export GPGKEY=XXX-insert-your-gpg-key-here-XXX
EOF

I then source this file from the backup script referenced in my crontab. Because the heredoc delimiter is unquoted, the variables are expanded when ~/.bashrc runs, so I need only log in once to populate the file.
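
The saved values go stale if the agent is restarted after the file was written. As a rough guard, the backup script could check that the recorded socket still exists before doing any work. This is only a sketch: check_backup_env is a made-up helper name, and it checks just SSH_AUTH_SOCK, on the assumption that a missing socket is the most common failure.

```shell
# Sketch only: check_backup_env is a hypothetical helper, not part of
# duplicity or horcrux. It checks that the env file exists and that the
# SSH agent socket recorded in it is still present.
check_backup_env() {
  local envfile="$1"

  if [ ! -r "$envfile" ]; then
    echo "missing env file: $envfile" >&2
    return 1
  fi

  # Load the saved variables in a subshell so the caller's environment
  # is left untouched; the subshell's exit status becomes the result.
  (
    . "$envfile"
    if [ -z "$SSH_AUTH_SOCK" ] || [ ! -S "$SSH_AUTH_SOCK" ]; then
      echo "stale or missing SSH_AUTH_SOCK: ${SSH_AUTH_SOCK:-unset}" >&2
      exit 1
    fi
  )
}
```

In the driver script this could be called as `check_backup_env "$HOME/.backenvrc" || exit 1` before sourcing the file, so a stale environment shows up in the cron mail rather than as a hung or failed transfer.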

Setting up the Crontab

# crontab -l
# m h  dom mon dow   command
MAILTO=ppetraki@localhost
BACKUP=/home/ppetraki/Documents/System/Backup
#
0 0  * * *      /usr/bin/crontab -l  > $BACKUP/crontab-backup
0 0  * * *      /usr/bin/dpkg --get-selections > $BACKUP/installed-software
0 0  * * *      /usr/local/bin/ppetraki-backup.sh inc
0 0  * * Fri    /usr/local/bin/ppetraki-backup.sh full

Note that I am also backing up my crontab and my list of installed software. Eventually I will move this into another script that also does things like:

1) back up my bookmarks from Chrome and Firefox

2) back up mail in a non-binary format

The current crontab performs an incremental backup every night and a full backup every Friday.
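
One thing to be aware of: the nightly `inc` entry also matches on Fridays, so on that day both runs start at midnight. If that is not what you want, one option is a single crontab entry whose script picks the action from the day of the week. The helper below is a sketch of that idea; backup_action_for_day is a hypothetical name, and it assumes the ISO weekday numbering printed by `date +%u` (1 = Monday, 7 = Sunday).

```shell
# Sketch only: backup_action_for_day is a hypothetical helper.
# It maps an ISO weekday number (1 = Monday ... 7 = Sunday, as printed
# by `date +%u`) to the backup action to run on that day.
backup_action_for_day() {
  case "$1" in
    5)     echo "full" ;;   # Friday: full backup
    [1-7]) echo "inc"  ;;   # any other day: incremental
    *)     echo "invalid weekday: $1" >&2; return 1 ;;
  esac
}
```

The driver script could then default its action with `action=${1:-$(backup_action_for_day "$(date +%u)")}`, which keeps the `date` call out of the crontab itself, where `%` has a special meaning.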

Driver script

This wraps the invocation of duplicity and acquires the necessary environment variables. Duplicity itself can be hairy with all its command-line switches, and even more of a burden if you have multiple targets. I have redundant backups: first to a local server, then to a remote service provided by rsync.net (great customer support!). I found horcrux to be a wonderful, lightweight duplicity wrapper that suits my needs.

The driver script, which lives outside my backup path, also contains the GPG passphrase used to encrypt my backups. Eventually I wish to move to a smartcard-driven system, as [illustrated here](http://blog.josefsson.org/2011/10/11/unattended-ssh-with-smartcard/).

[/usr/local/bin/ppetraki-backup.sh]

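#!/bin/bash
# Driver for the cron backups: loads the saved agent environment, then
# runs horcrux against each backup profile.

export PATH=$PATH:/usr/local/bin
action=$1

# set explicitly rather than relying on cron's environment
export USER=XXX
export HOME=/home/$USER

# agent variables written by ~/.bashrc at login
source "$HOME/.backenvrc"

echo "verifying environment"
echo "gpg-agent: ${GPG_AGENT_INFO}"
echo "gpg-key:   ${GPGKEY}"
echo "ssh-agent-pid:   ${SSH_AGENT_PID}"
echo "ssh-auth-sock:   ${SSH_AUTH_SOCK}"

if [ -z "$action" ]; then
  echo "requires an action!" >&2
  exit 1
fi

# insert your GPG passphrase here (left blank on purpose)
export PASSPHRASE=

if [ -z "$PASSPHRASE" ]; then
  echo "PASSPHRASE is not set" >&2
  exit 1
fi

echo "begin"

for config in local_backup remote_backup
do
  horcrux clean   "$config"
  horcrux "$action" "$config"
done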
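
Until the smartcard setup happens, a small variation is to keep the passphrase in its own file rather than in the script body, and refuse to use it if the permissions are too loose. The following is a sketch under assumptions: load_passphrase is a hypothetical helper, the file location is whatever you choose (it still has to live outside the backup path), and it relies on GNU stat as shipped with Debian and Ubuntu.

```shell
# Sketch only: load_passphrase is a hypothetical helper. It prints the
# first line of the given file, but only if the file is readable and its
# mode is 600 or 400. Uses GNU stat (-c).
load_passphrase() {
  local file="$1" mode

  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }

  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ] && [ "$mode" != "400" ]; then
    echo "refusing $file: mode $mode, expected 600 or 400" >&2
    return 1
  fi

  # Print the first line only; the caller captures it.
  head -n 1 "$file"
}

# Example use in the driver script (the path is a placeholder):
#   PASSPHRASE=$(load_passphrase "$HOME/.backup-passphrase") || exit 1
#   export PASSPHRASE
```

This does not make the passphrase any more secret than before, since it is still plain text on disk. It only separates the secret from the script, so the script itself can be shared or versioned.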

Using horcrux to wrangle duplicity

Horcrux has the notion of profiles, which take all the complexity out of managing the duplicity CLI. Here's an example of a profile.

cat /home/ppetraki/.horcrux/local_backup-config
destination_path="rsync://192.168.1.XXX/backups/personal"

cat ~/.horcrux/local_backup-exclude
- /home/ppetraki/Sandbox
- /home/ppetraki/Bugs
- /home/ppetraki/Downloads
- /home/ppetraki/Videos
- /home/ppetraki/.xsession-errors
- /home/ppetraki/.thumbnails
- /home/ppetraki/.local
- /home/ppetraki/.gvfs
- /home/ppetraki/.systemtap
- /home/ppetraki/.adobe/Flash_Player/AssetCache
- /home/ppetraki/.thunderbird
- /home/ppetraki/.mozilla
- /home/ppetraki/.config/google-googletalkplugin
- /home/ppetraki/.config/google-chrome
- /home/ppetraki/.cache
- /home/ppetraki/**[cC]ache*

I found it problematic to back up only subdirectories of things like mozilla and google-chrome, so I exclude them entirely. Instead, I will write an additional script to cherry-pick those files for backup.
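
I have not written that script yet, but a minimal sketch of the idea might look like the following: copy a handful of named files from the excluded directories into a staging directory that is inside the backup path. stage_files is a hypothetical helper, the example paths in the comment are placeholders, and `cp --parents` is a GNU extension.

```shell
# Sketch only: stage_files is a hypothetical helper. It copies each listed
# path (relative to a base directory) into a staging directory, keeping the
# relative layout. The staging directory must be an absolute path.
# Uses GNU cp (--parents).
stage_files() {
  local base="$1" dest="$2" rel
  shift 2

  mkdir -p "$dest" || return 1
  for rel in "$@"; do
    if [ -e "$base/$rel" ]; then
      # Run from the base directory so --parents recreates only the
      # relative part of the path under the staging directory.
      ( cd "$base" && cp -a --parents "$rel" "$dest" ) || return 1
    else
      echo "skipping missing file: $rel" >&2
    fi
  done
}

# Example (file names are placeholders, not real profile paths):
#   stage_files "$HOME" "$HOME/Documents/System/Backup/browser" \
#     ".mozilla/some-profile/bookmarks-file"
```

Run from cron shortly before the backup, this would leave copies of the chosen files where the existing exclude list does not filter them out.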

The main horcrux config file

cat ~/.horcrux/horcrux.conf 
source="/home/ppetraki/"          # Ensure trailing slash
encrypt_key=XXXXXX     # Public key ID to encrypt backups with
sign_key='-'             # Key ID to sign backups with (leave as '-' for no signing)

use_agent=false          # Use gpg-agent?
remove_n=3               # Number of full filesets to remove
verbosity=5              # Logs all the file changes (see duplicity man page)
vol_size=25              # Split the backup into 25MB volumes
full_if_old=30D          # Cause 'full' operation to perform a full
                         # backup if older than 30 days
backup_basename='backup' # Directory name for local backups (i.e., destination
                         # /Volumes/my_drive/backup/ or /media/my_drive/backup/)
dup_params='--use-agent' # Parameters to pass to Duplicity

This is great as it reduces a backup invocation to this:

 $ horcrux inc local_backup 
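
To give a feel for what is being hidden, here is roughly the kind of duplicity command line these settings correspond to. Treat it as an illustration rather than a transcript of what horcrux actually runs: I am assuming the mapping from config values to duplicity options, and build_duplicity_cmd is a made-up helper that only prints the command instead of executing it.

```shell
# Sketch only: build_duplicity_cmd is a hypothetical helper that prints a
# duplicity command line without running it. The option values mirror the
# horcrux.conf shown above; the mapping itself is an assumption.
# Arguments: action, key ID, exclude file, source, destination.
build_duplicity_cmd() {
  local action="$1" key="$2" exclude="$3" src="$4" dest="$5"

  printf 'duplicity %s --encrypt-key %s --volsize 25 --verbosity 5 --full-if-older-than 30D --exclude-filelist %s %s %s\n' \
    "$action" "$key" "$exclude" "$src" "$dest"
}

# Example (key ID and destination are placeholders):
#   build_duplicity_cmd incremental XXXXXX ~/.horcrux/local_backup-exclude \
#     /home/ppetraki/ rsync://192.168.1.XXX/backups/personal
```

Typing that out for two destinations, plus the matching clean-up commands, is exactly the chore that the profiles remove.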

Monitoring

I defined MAILTO in my crontab, installed mutt, and then reconfigured postfix for local mail delivery. Every night I get a progress report on how the backups ran.
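
With verbosity at 5 those reports are long, so it can help to have the script print a one-line verdict per step that is easy to spot in the mail. A possible sketch, where run_step is a hypothetical helper and the label text is arbitrary:

```shell
# Sketch only: run_step is a hypothetical helper. It runs the given command,
# prints a one-line summary to stdout (which cron mails to MAILTO), and
# returns the command's exit status.
run_step() {
  local label="$1" rc
  shift

  if "$@"; then
    rc=0
  else
    rc=$?
  fi

  if [ "$rc" -eq 0 ]; then
    echo "OK: $label"
  else
    echo "FAILED (exit $rc): $label"
  fi
  return "$rc"
}
```

In the driver loop this would be used as `run_step "$action $config" horcrux "$action" "$config"`, so each profile ends with a line reading either OK or FAILED.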

Conclusion

I've spent quite a bit of time working out how to automate this and provide strong encryption. I hope you find this useful.