Is it safe to use an HDD while rsync is running?

Solution 1:

As others have already pointed out, it is safe to read from the source disk, or to use the target disk outside of the target directory, while rsync is running. It is also safe to read within the target directory, especially if the target directory is being populated exclusively by the rsync run.

What's not generally safe is to write within the source directory while rsync is running. A "write" here is anything that modifies the content of the source directory or any subdirectory of it, so it includes file updates, deletions, file creation, and so on.

Doing so won't actually break anything, but the change may or may not get picked up by rsync for copying to the target location. Whether it does depends on the type of change, whether rsync has scanned that particular directory yet, and whether rsync has copied the file or directory in question yet.

However, there is an easy way around that: Once it finishes, run rsync again, with the same parameters. (Unless you have some funky delete parameter; if you do, then be a bit more careful.) Doing so will cause it to re-scan the source, and transfer any differences that weren't picked up during the original run.

The second run should transfer only the differences that arose during the first run, and so should complete much faster. You can therefore use the computer normally during the first run, but should avoid changing the source as much as possible during the second. If you can, strongly consider remounting the source file system read-only before starting the second rsync run. (Something like mount -o ro,remount /media/source should do.)
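The two-pass procedure can be sketched as a small shell function. This is only a sketch under assumptions: it runs as root, the source directory is itself a mount point (like /media/source above), -a stands in for whatever rsync options you actually use, and the final remount back to read-write is an extra step the answer does not mention. The helper names sync_once and two_pass_sync are made up for illustration.

```shell
#!/bin/sh
# Two-pass rsync, as described above. Run as root so the remount works.
# Assumption: the source directory is itself a mount point.

# Both passes go through this one function, which guarantees that the
# second run uses exactly the same parameters as the first.
sync_once() {
  # -a: recurse and preserve permissions, times, symlinks, etc.
  # The trailing slashes copy the contents of $1 into $2.
  rsync -a "$1/" "$2/"
}

two_pass_sync() {
  src=$1
  dst=$2
  # Pass 1: the source may still be in use, so some changes can be missed.
  sync_once "$src" "$dst" || return
  # Freeze the source; this typically fails if files are open for writing,
  # in which case we stop here.
  mount -o ro,remount "$src" || return
  # Pass 2: only picks up what changed during pass 1, so it is much faster.
  status=0
  sync_once "$src" "$dst" || status=$?
  # Assumed extra step: make the source writable again afterwards.
  mount -o rw,remount "$src"
  return "$status"
}
```

For example: two_pass_sync /media/source /media/target. Routing both passes through one helper is a deliberate choice, since it prevents the second run from drifting away from the first run's parameters.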

Solution 2:

This depends on the backup system you use, but in general it is a bad idea to modify the contents of a device while you're backing it up. You can, however, read its contents; that's a safe operation, even if it will slow down the process.

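In your case, rsync builds up a file list and then starts the backup (with rsync 3.0 and later, the list is built incrementally, one directory at a time, as the transfer proceeds). Therefore any file you add to the source HDD after rsync has scanned the directory it lands in will not be copied.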

What I do is avoid using the device at all during a backup. That is the safest way to get a fast and consistent backup.

Solution 3:

It is safe to read data from the source areas while rsync is operating, but if you update anything the copy that rsync creates/updates is likely to be inconsistent:

  1. If you update a file that rsync has already scanned, it will not see the update until a future run. If you update a file it has yet to scan, the change will be reflected in the destination. If you update files that both have and have not been scanned, you will end up with a mix of old and new versions in the destination.

  2. If you add a file to a directory that has already been scanned, it will be missing from the destination copy this time around. If you remove a file from a directory that has already been scanned, it will be left in the destination copy this time. Depending on how you invoke rsync, the whole tree may be scanned at the start or it may be scanned incrementally as the sync proceeds (rsync 3.0 and later scan incrementally by default when both ends support it; --no-inc-recursive forces a full scan up front).

  3. In some circumstances rsync will see the inconsistency and warn you. If you remove a file or sub-directory after rsync has scanned the directory containing it but before rsync has processed the object itself, you will get an error message about the object being missing (rsync reports it as "file has vanished"). In similar circumstances it can sometimes (if the size and/or timestamp has changed) also warn about files changing mid-scan.
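A backup script can detect that last case from rsync's exit status. Per the rsync man page, status 24 means a partial transfer due to vanished source files, i.e. files that were in the scan but gone by the time rsync tried to copy them. The sketch below treats that as a warning rather than a failure; that policy, and the helper names rsync_verdict and checked_rsync, are assumptions for illustration.

```shell
#!/bin/sh
# Map an rsync exit status to a verdict. Status values per the rsync man page:
#   0  = success
#   24 = partial transfer due to vanished source files
# Every other status is treated as a real failure in this sketch.
rsync_verdict() {
  case "$1" in
    0) echo ok ;;
    24) echo vanished ;;
    *) echo failed ;;
  esac
}

# Run rsync with the caller's arguments. Vanished files are reported as a
# warning: they mean the source changed mid-run, and running rsync again
# will bring the destination up to date.
checked_rsync() {
  rc=0
  rsync "$@" || rc=$?
  case $(rsync_verdict "$rc") in
    ok) return 0 ;;
    vanished)
      echo "warning: source changed during the run; consider running rsync again" >&2
      return 0 ;;
    *) return "$rc" ;;
  esac
}
```

For example: checked_rsync -a /media/source/ /media/target/ || echo 'backup failed' >&2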

For some backups this inconsistency may not be a massive issue, but for most it will be, so it is best not to try to sync an actively changing source.

If you use LVM to divide up your storage, you could use a temporary snapshot to take a point-in-time backup. This requires enough free space in the volume group to create a snapshot volume large enough to hold all the changes made while the snapshot is needed. Check the LVM documentation (or one of many online examples: search for "LVM snapshot backup" or similar) for more details.
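A snapshot-based run could look like the sketch below. It assumes root privileges, a regular (non-thin) logical volume, and placeholder names: the volume group, logical volume, snapshot size and destination are arguments, while the snapshot name (<lv>_backup_snap) and mount point (/mnt/<lv>_backup_snap) are invented for the example. The copy is taken from the frozen snapshot, so the live volume can stay in use throughout.

```shell
#!/bin/sh
# Point-in-time backup from a temporary LVM snapshot. Run as root.
# Arguments (all placeholders): volume group, logical volume, snapshot
# size, destination directory. The snapshot size must be large enough to
# hold every change made to the origin volume while the snapshot exists.
snapshot_backup() {
  vg=$1
  lv=$2
  size=$3
  dst=$4
  snap="${lv}_backup_snap"   # hypothetical snapshot name
  mnt="/mnt/$snap"           # hypothetical temporary mount point

  lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv" || return
  mkdir -p "$mnt"
  rc=0
  # Mount the frozen view read-only (XFS also needs -o nouuid here);
  # the live volume stays in normal use the whole time.
  if mount -o ro "/dev/$vg/$snap" "$mnt"; then
    rsync -a "$mnt/" "$dst/" || rc=$?
    umount "$mnt"
  else
    rc=1
  fi
  # Always drop the snapshot so it stops consuming space in the volume group.
  lvremove -f "/dev/$vg/$snap"
  return "$rc"
}
```

For example, snapshot_backup vg0 data 5G /media/target would copy a snapshot of /dev/vg0/data, with 5G reserved for changes made to the origin during the copy.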

Even without LVM, some filesystems (btrfs and ZFS, for example) support snapshots themselves, so you may wish to look into that option too.
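With btrfs, for example, a read-only snapshot can play the same role as the LVM snapshot. A minimal sketch, assuming root privileges, that the data lives in a btrfs subvolume, and that the snapshot path passed in (hypothetical, e.g. /data/.backup-snap) is on the same filesystem; btrfs_snapshot_backup is an invented name.

```shell
#!/bin/sh
# Point-in-time backup from a btrfs read-only snapshot. Run as root.
# $1: the btrfs subvolume to back up
# $2: hypothetical snapshot path on the same filesystem
# $3: destination directory
btrfs_snapshot_backup() {
  subvol=$1
  snap=$2
  dst=$3
  # -r makes the snapshot read-only, so it cannot change during the copy.
  btrfs subvolume snapshot -r "$subvol" "$snap" || return
  rc=0
  rsync -a "$snap/" "$dst/" || rc=$?
  # Remove the snapshot once the copy is done.
  btrfs subvolume delete "$snap"
  return "$rc"
}
```

ZFS offers the same idea through zfs snapshot, with the snapshot's contents readable under the dataset's .zfs/snapshot directory.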

If you want to back up large active volumes without long downtime and can't use snapshots, it may be sufficient to run the "live" scan to completion, then stop access to the volume and run a second rsync, which may take far less time (if very little has changed, it will just scan the directory tree and then copy the few updated files). This way the period during which you must avoid changes can be much shorter.

Solution 4:

  • The source HDD can be read freely while rsync runs.

  • The source HDD can be written to, as long as the writes do not touch the content being synced.

  • The destination HDD can be read freely while rsync runs.

  • The destination HDD can be written to while rsync runs, as long as enough free space remains for the synced content.
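A rough pre-flight check for that space condition could compare the size of the source tree with the free space on the destination. This sketch assumes the worst case (a fresh copy with nothing on the destination yet); an incremental run needs much less. Because other writes to the destination continue during the sync, leave a margin on top of the result. The name has_room_for is hypothetical.

```shell
#!/bin/sh
# Rough check that the destination has room for a full copy of the source.
# Worst-case assumption: nothing from the source is on the destination yet.
has_room_for() {
  src=$1
  dst=$2
  # du -sk: total size of the source tree in KiB (first field).
  need=$(du -sk "$src" | awk '{print $1}')
  # df -Pk: POSIX output in KiB; field 4 of line 2 is the available space.
  avail=$(df -Pk "$dst" | awk 'NR == 2 {print $4}')
  [ "$need" -le "$avail" ]
}
```

For example: has_room_for /media/source /media/target || echo 'not enough space on the target' >&2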

Of course, in all of these cases performance will be reduced.