How to manually extract a backup set made by duplicity?

I have a set of files on my webserver produced by duplicity software:

  • duplicity-full.20110315T085334Z.vol1.difftar.gz
  • duplicity-full.20110315T085334Z.vol2.difftar.gz
  • duplicity-full.20110315T085334Z.vol3.difftar.gz
  • etc. (50 files, total size about 1 GB)

Backup has been made without encryption.

My current hosting provider doesn't have duplicity on its server and doesn't want to install it. How can I unpack these files using remote SSH access? Maybe there is some bash script available to do that?


How about downloading the required archives and then running something like this: duplicity scp://uid@other.host//usr/backup restored_dir (example from the official site)?
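For example (a sketch only; the host name and paths below are placeholders), you could pull the whole backup set down to a machine that does have duplicity installed and restore from the local copy:

mkdir /tmp/backup
# copy all volumes (plus the manifest) from the webserver
scp 'user@webserver:/usr/backup/duplicity-full.20110315T085334Z.*' /tmp/backup/
# restore from the local copy
duplicity restore file:///tmp/backup restored_dir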


In case anyone else comes across this (as I just have), there are some reasonably detailed (and mostly correct) steps over here.

Key details

The key point is to unpack all of the duplicity-full.*.difftar.gz files in the same place, so that you're left with just two directories: snapshot/ and multivol_snapshot/.
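A minimal sketch of that unpacking step, assuming GNU tar and the filenames from the question (the extracted/ directory name is arbitrary):

mkdir extracted
# each volume is just a gzipped tar archive
for f in duplicity-full.20110315T085334Z.vol*.difftar.gz; do
  tar -xzf "$f" -C extracted
done
# extracted/ now contains snapshot/ and multivol_snapshot/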

If your file is in snapshot/ then you're done. Otherwise find the directory in multivol_snapshot/ at the path where your file used to be: you need to join together all the files in this directory to recreate the original file. The files are numbered, and can be joined together using the cat command. Depending on how large the original was, there may be many parts.
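For instance, with a made-up path, a file that was split into three parts would be rebuilt like this:

cd multivol_snapshot/home/user/database.dump
cat 1 2 3 > ~/restored/database.dump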

Problem with original instructions

The directions linked above suggest using cat * > rescued-file. Unfortunately this simple approach fails if you have more than 9 parts. Since * expands in dictionary order, not numeric order, 10 would be listed before 2, and the file would be reconstructed in the wrong order.
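For example, with eleven parts in a directory:

echo *                  # glob order: 1 10 11 2 3 4 5 6 7 8 9
cat * > rescued-file    # parts 10 and 11 end up before part 2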

Workaround

One simple approach is to remember that dictionary order does work when numbers are the same length, and that ? matches a single character. So if your largest part number has three digits, you can manually enter:

cat ? ?? ??? > rescued-file

Add or remove ? patterns as necessary, depending on the largest file number.
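Equivalently, if you'd rather not count digits, you can have sort put the part names into numeric order (the same trick the script below uses):

cat $(ls | sort -n) > rescued-file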

Script

If you have a lot of files to recover and don't fancy typing that for all of them, you might prefer to use a script such as this. It lists the containing directory for every file, removes duplicates from the list, then goes to each directory and creates a content file from the fragments there. (The trailing spacer argument is only there so that $1 refers to the directory: sh -c uses the first argument after the command string as $0.)

find multivol_snapshot/ -type f -printf '%h\0' | \
  sort -uz | \
  xargs -0 -n 1 sh -c 'cd "$1" ; cat $(ls | sort -n) > content' spacer

Now you just have to add /content to the end of any filename you were looking for, and you should find it.
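For example, to pull out a single large file (path invented for illustration):

cp multivol_snapshot/home/user/videos/holiday.avi/content ~/restored/holiday.avi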

Limitations

This doesn't restore any of the original file permissions or ownership. It also doesn't deal with incremental backups, but then the linked instructions also hit a bit of a dead end on this point — they just suggest using rdiff 'to stitch the files together' and refer the reader to man rdiff.
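For reference, rdiff applies a delta to a basis file roughly like this (a sketch only; whether duplicity's incremental diff files can be fed to rdiff directly is exactly the part left unresolved above, and the file names are invented):

rdiff patch old-version-of-file diff-of-file restored-file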