How can I substitute colons when I rsync to a USB key?

I would like to back up my mail directory to a USB key. However, my IMAP has a strange naming convention that sometimes includes a colon (:) character in file names. Since the USB key is in a Windows format, rsync fails to create those files. Is there a way to replace the colon character with an underscore when running rsync? (Or to do the same synchronization with another tool?)

Just a few points that I clarified in the comments:

  • This is a worst-case-scenario backup; I would like to be able to read it on a Windows machine without installing anything.
  • I have a lot of data that stays constant, so I save a lot of time if I have a tool that just copies the newer files.
  • I am not looking for a rewrite of rsync. I am looking for an existing tool that can be used out of the box.

Thanks


Use rdiff-backup instead of plain rsync. It will automatically detect and substitute for characters that aren't supported on the destination disk, and also put them back as they were when you restore to a unix filesystem. It produces an unpacked directory that looks just like the origin plus one extra metadata directory.
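
As a rough sketch, assuming the key is mounted at /media/usb99 (a hypothetical mount point) and using rdiff-backup's classic "source destination" command form, a backup and a later restore might look like the following. The function names are made up for illustration, and newer rdiff-backup releases may prefer a different command syntax, so check the manual for your version.

```shell
#!/bin/sh
# Hypothetical paths; adjust to your setup.
MAIL_DIR="$HOME/mail"
BACKUP_DIR="/media/usb99/mail"

# Back up the mail directory; files that have not changed are not copied again.
backup_mail() {
  rdiff-backup "$MAIL_DIR" "$BACKUP_DIR"
}

# Restore the most recent state into the directory given as first argument.
restore_mail() {
  rdiff-backup -r now "$BACKUP_DIR" "$1"
}
```

Calling backup_mail repeatedly only transfers what changed since the previous run; restore_mail ~/mail.restored would bring the original names back on a unix filesystem.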


The most straightforward approach is to leverage the filesystem layer to transform the file names. Since Ubuntu 12.04, there is a FUSE filesystem that transforms file names into names that Windows's VFAT supports: POSIXovl. Install the fuse-posixovl package, then run:

sudo mount.posixovl /media/sdb1
chown guillaume /media/sdb1
rsync -au ~/mail /media/sdb1/

Or to avoid requiring root access:

mkdir ~/mnt
/sbin/mount.posixovl -S /media/sdb1 ~/mnt
rsync -au ~/mail ~/mnt/

Characters in file names that VFAT doesn't accept are encoded as %(XX) where XX are hexadecimal digits. As of POSIXovl 1.2.20120215, beware that a file name like %(3A) is encoded as itself, and will be decoded as :, so there is a risk of collision if you have file names containing substrings of the form %(XX).
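
To see whether the collision risk described above affects you, one option is to look for existing names that already contain a %(XX) sequence before backing up. This is only a sketch: the function name is hypothetical, and it checks just the pattern mentioned here (two hexadecimal digits between "%(" and ")").

```shell
#!/bin/sh
# List names under the given directory that already contain a %(XX)
# sequence (XX = two hex digits) and so could be mistaken for an
# encoded character when POSIXovl decodes them.
find_encoded_lookalikes() {
  find "$1" -name '*%([0-9A-Fa-f][0-9A-Fa-f])*'
}
```

Running find_encoded_lookalikes ~/mail and getting no output suggests that none of your file names look like an encoded character.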

Beware that POSIXovl does not cope with file names that are too long. If the encoded name doesn't fit in 255 characters, the file can't be stored.

POSIXovl stores unix permissions and ownership in files called .pxovl.FILENAME.


The following bash ≥4 script copies ~/mail/foo:bar to /media/usb99/mail/foo_bar, and similarly for all files under ~/mail. Files that already exist in the destination tree and that are not older than the source are skipped.

#!/bin/bash
set -e
shopt -s dotglob globstar
for source in "$HOME"/mail/**/*; do
  target=/media/usb99/${source#"$HOME"/}
  target=${target//:/_}
  if [[ -d $source ]]; then
    mkdir -p -- "$target"
  elif [[ $target -ot $source ]]; then
    cp -p -- "$source" "$target"
  fi
done

This script works under zsh with minor modifications: replace shopt -s dotglob globstar by setopt dot_glob and [[ $target -ot $source ]] by [[ ! -e $target || $target -ot $source ]].
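
One thing the script does not handle is two source files that end up with the same name after the substitution (for example foo:bar and foo_bar in the same directory); the later copy would overwrite the earlier one. A minimal sketch of a pre-flight check follows, assuming no file name contains a newline; the function name is hypothetical.

```shell
#!/bin/sh
# Print every name that more than one source path would map to once
# ':' is replaced by '_'. No output means no collisions.
# Assumes file names do not contain newline characters.
find_rename_collisions() {
  find "$1" | tr ':' '_' | sort | uniq -d
}
```

Run find_rename_collisions ~/mail before the copy; any line it prints is a target name that two or more source paths would share.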


Here's a zsh two-liner (three if you count the autoloads). It's shorter, but fairly advanced and not very readable.

autoload zargs zmv
zargs -- ~/mail/**/*(/e\''REPLY=/media/usb99/${${REPLY#$HOME/}//:/_}'\') -- mkdir -p --
zmv -C -Q -o -pu '~/mail/(**/)(*)(.)' '/media/usb99/mail/${1//:/_}${2//:/_}'
  • The zargs line is equivalent to mkdir -p ~/mail/**/*(…), except that it won't bomb out if the combined length of the directory names is too long. That line creates the target directories as necessary.
  • ~/mail/**/*(/) expands to all the directories under ~/mail (directories only due to the (/) at the end).
  • (/e\''…'\') selects only directories and further executes the code within '…' to transform each file name, which is stored in the REPLY variable.
  • ${${REPLY#$HOME/}//:/_} removes the prefix corresponding to the source directory and changes : into _.
  • zmv -C copies each file matching its first operand (a zsh pattern) to the file name obtained by expanding its second operand.
  • -o -pu says to pass -pu to the cp utility, so as to preserve permissions and copy only updated files. (We could tell zsh to perform the update check; it would be a little faster but even more cryptic.)
  • (.) selects only regular files. -Q says that this is to be parsed as a glob qualifier and not as a . with parentheses around it indicating a subexpression.
  • $1 and $2 in the replacement text match the parenthesized expressions (**/) and *. (** loses its special meaning as zero or more subdirectory levels if it's in parentheses, unless the parentheses contain exactly **/.)

I initially thought of using pax, an archiving tool (here intended to be used in pass-through mode) that has a file renaming feature (its -s option). However, the -s and -u options do not work together: the POSIX definition of pax literally says that -u must check a file of the same name in the destination tree, rather than the file name transformed by -s, and the pax implementation in Ubuntu follows the spec literally rather than usefully. It's still possible to use pax to make renamed hard links, and then copy the hard links (with rsync -au or pax -rw -pp -u) to the other media, but it feels like more trouble than it's worth.

cd ~/mail
mkdir -p ../mail.colonless /media/usb99/mail
pax -rw -l -pp -s '!:!_!g' . ../mail.colonless
rsync -au ../mail.colonless/ /media/usb99/mail/