How to keep: daily backups for a week, weekly for a month, monthly for a year, and yearly after that

I need to back up data and config files on this server daily. I need to keep:

  • daily backups for a week
  • weekly backups for a month
  • monthly backups for a year
  • yearly backups after that

All of this should be done by a shell script run daily from cron.

This is how the backup files should look after 10 years of running:

blog-20050103.tar.bz2
blog-20060102.tar.bz2
blog-20070101.tar.bz2
blog-20080107.tar.bz2
blog-20090105.tar.bz2
blog-20100104.tar.bz2
blog-20110103.tar.bz2
blog-20120102.tar.bz2
blog-20130107.tar.bz2
blog-20130902.tar.bz2
blog-20131007.tar.bz2
blog-20131104.tar.bz2
blog-20131202.tar.bz2
blog-20140106.tar.bz2
blog-20140203.tar.bz2
blog-20140303.tar.bz2
blog-20140407.tar.bz2
blog-20140505.tar.bz2
blog-20140602.tar.bz2
blog-20140707.tar.bz2
blog-20140728.tar.bz2
blog-20140804.tar.bz2
blog-20140811.tar.bz2
blog-20140816.tar.bz2
blog-20140817.tar.bz2
blog-20140818.tar.bz2
blog-20140819.tar.bz2
blog-20140820.tar.bz2
blog-20140821.tar.bz2
blog-20140822.tar.bz2

Solution 1:

You are seriously over-engineering this. Badly.

Here's some pseudocode:

  • Every day:
    • make a backup, put into daily directory
    • remove everything but the last 7 daily backups
  • Every week:
    • make a backup, put into weekly directory
    • remove everything but the last 5 weekly backups
  • Every month:
    • make a backup, put into monthly directory
    • remove everything but the last 12 monthly backups
  • Every year:
    • make a backup, put into yearly directory

The amount of logic you have to implement is about the same, eh? KISS.
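
A minimal sketch of that pseudocode for a local filesystem, in case it helps. It assumes GNU date, archives that share one name prefix and embed a YYYYMMDD date (as in the question), and an archive that has already been created. The function names `prune` and `rotate`, the directory layout, and the choice of Monday, the 1st of the month and January 1st as the weekly, monthly and yearly triggers are my assumptions, not something the pseudocode dictates:

```shell
#!/usr/bin/env bash
# Sketch only: tier names, retention counts and triggers are assumptions.

# prune DIR KEEP: delete all but the KEEP newest *.tar.bz2 files in DIR.
# Names share one prefix and embed YYYYMMDD, so a reverse sort is newest-first.
prune() {
    local dir=$1 keep=$2 f
    find "$dir" -maxdepth 1 -type f -name '*.tar.bz2' | sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r f; do rm -f -- "$f"; done
}

# rotate ROOT ARCHIVE [DATE]: copy ARCHIVE into every tier that is due on
# DATE (default: today), then trim each tier to its retention count.
rotate() {
    local root=$1 archive=$2 today=${3:-$(date +%F)} name
    name=$(basename "$archive")
    mkdir -p "$root/daily" "$root/weekly" "$root/monthly" "$root/yearly"

    cp -- "$archive" "$root/daily/$name"
    prune "$root/daily" 7

    if [ "$(date -d "$today" +%u)" = 1 ]; then      # Monday
        cp -- "$archive" "$root/weekly/$name"
        prune "$root/weekly" 5
    fi
    if [ "$(date -d "$today" +%d)" = 01 ]; then     # 1st of the month
        cp -- "$archive" "$root/monthly/$name"
        prune "$root/monthly" 12
    fi
    if [ "$(date -d "$today" +%j)" = 001 ]; then    # January 1st
        cp -- "$archive" "$root/yearly/$name"       # yearly tier is never pruned
    fi
}
```

The daily cron script would create the archive and then call something like `rotate /var/backups/blog /tmp/blog-$(date +%Y%m%d).tar.bz2` (both paths are hypothetical). Copying one archive into each tier, rather than running the backup several times, is a shortcut on my part; hard links would save space if all tiers sit on the same filesystem.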

With the backups in an S3 bucket, deleting daily backups older than a week is a short s3cmd pipeline:

# List the daily backups, select those dated before one week ago, delete them.
s3cmd ls s3://backup-bucket/daily/ | \
    awk -v cutoff="$(date +%F -d '1 week ago')" '$1 < cutoff {print $4}' | \
    xargs --no-run-if-empty s3cmd del

Or, by file count instead of age:

s3cmd ls s3://backup-bucket/daily/ | \
    awk '$1 != "DIR"' | \
    sort -r | \
    awk 'NR > 7 {print $4;}' | \
    xargs --no-run-if-empty s3cmd del

Solution 2:

If you just want to keep, for example, 8 daily backups and 5 weekly (every Sunday) backups, it works like this:

for i in {0..7}; do ((keep[$(date +%Y%m%d -d "-$i day")]++)); done
for i in {0..4}; do ((keep[$(date +%Y%m%d -d "sunday-$((i+1)) week")]++)); done
echo ${!keep[@]}

As of today (2014-11-10), this will output:

20141012 20141019 20141026 20141102 20141103 20141104
20141105 20141106 20141107 20141108 20141109 20141110

What is left as an exercise for you is to delete all backup files whose dates do not appear as indices of the keep array.
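
One way that deletion could look, as a sketch rather than a finished script. It assumes bash, GNU date, and files named `blog-YYYYMMDD.tar.bz2` as in the question; the function name `prune_unkept` and the directory argument are hypothetical. The first two loops are the ones shown above, repeated so the block runs on its own:

```shell
#!/usr/bin/env bash
# Build the keep array exactly as above: its indices are the dates to keep.
keep=()
for i in {0..7}; do ((keep[$(date +%Y%m%d -d "-$i day")]++)); done
for i in {0..4}; do ((keep[$(date +%Y%m%d -d "sunday-$((i+1)) week")]++)); done

# prune_unkept DIR: delete DIR/blog-YYYYMMDD.tar.bz2 files whose date is
# not an index of the global keep array. Other files are left alone.
prune_unkept() {
    local dir=$1 f stamp
    for f in "$dir"/blog-*.tar.bz2; do
        [ -e "$f" ] || continue                  # glob matched nothing
        stamp=${f##*/blog-}                      # strip path and prefix
        stamp=${stamp%.tar.bz2}                  # strip suffix, leaving YYYYMMDD
        [[ $stamp =~ ^[0-9]{8}$ ]] || continue   # skip unexpected names
        if [[ -z ${keep[$stamp]:-} ]]; then
            echo "deleting $f"
            rm -f -- "$f"
        fi
    done
}
```

Call it as `prune_unkept /path/to/backups` once the array is filled. Commenting out the `rm` line for a first run gives a dry run that only prints what would be deleted.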

If you want to keep 13 monthly backups (first Sunday of every month) and 6 yearly backups (first Sunday of every year) as well, things get a little more complicated:

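# 8 daily backups: today and the previous 7 days
for i in {0..7}; do ((keep[$(date +%Y%m%d -d "-$i day")]++)); done
# 5 weekly backups: the 5 most recent past Sundays
for i in {0..4}; do ((keep[$(date +%Y%m%d -d "sunday-$((i+1)) week")]++)); done
# 13 monthly backups: first Sunday of this month and of the previous 12
for i in {0..12}; do
        # DW = weeks between the week containing the 1st of the target month and now
        DW=$(($(date +%-W)-$(date -d $(date -d "$(date +%Y-%m-15) -$i month" +%Y-%m-01) +%-W)))
        # if the target month is in an earlier year, add that year's week count
        for (( AY=$(date -d "$(date +%Y-%m-15) -$i month" +%Y); AY < $(date +%Y); AY++ )); do
                ((DW+=$(date -d $AY-12-31 +%W)))
        done
        ((keep[$(date +%Y%m%d -d "sunday-$DW weeks")]++))
done
# 6 yearly backups: first Sunday of this year and of the previous 5
for i in {0..5}; do
        # DW = weeks elapsed since the start of the target year
        DW=$(date +%-W)
        for (( AY=$(($(date +%Y)-i)); AY < $(date +%Y); AY++ )); do
                ((DW+=$(date -d $AY-12-31 +%W)))
        done
        ((keep[$(date +%Y%m%d -d "sunday-$DW weeks")]++))
done
# the array indices are the dates to keep
echo ${!keep[@]}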

As of today (2014-11-10), this will output:

20090104 20100103 20110102 20120101 20130106 20131103
20131201 20140105 20140202 20140302 20140406 20140504
20140601 20140706 20140803 20140907 20141005 20141012
20141019 20141026 20141102 20141103 20141104 20141105
20141106 20141107 20141108 20141109 20141110

As above, just delete all backup files whose dates are not found in this array.

Solution 3:

I recently had the same problem. IMHO, writing a shell script for this is painful; it is much easier to write some reusable logic in a higher-level language with built-ins like sets and dictionaries. The general idea is to take a configuration that says how many files of each period you want to keep, and then decide for each file whether it should be kept.

There is a fairly popular Python-based script that looks really nice and has easy-to-understand source. Being Python-based rather than shell-based also gives it a cross-platform advantage: https://github.com/xolox/python-rotate-backups