What's the best way to merge two directories on the same filesystem in Linux?

I have two directories that need to be merged together. The files in these two directories are all large (>= 500MB).

What I want to achieve: for each file in the source directory, if it doesn't exist in the destination directory, mv it there (which is fast, since we are basically creating a new hard link and unlinking the source file); if it already exists in the destination directory, copy the source file over it and remove the source file.

The most common way to merge directories on a Linux system is to use rsync with the --remove-source-files option. But this is slow because it copies the data even when the destination file doesn't exist.

Any better ideas? Thank you.


Solution 1:

Basically, what you described is moving files and overwriting the destination if it exists. So just move them.
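For a flat directory of files, that could look like the sketch below. The scratch directory and the file names a.bin and b.bin are made up for the demonstration, and GNU coreutils is assumed.

```shell
#!/bin/bash
# Sketch: merge a flat src/ into dest/ with mv, overwriting name clashes.
# The scratch directory and the file names are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "new a" > "$work/src/a.bin"    # exists only in src
echo "new b" > "$work/src/b.bin"    # exists in both directories
echo "old b" > "$work/dest/b.bin"

# -f overwrites without prompting; within one filesystem each move is a rename
mv -f "$work"/src/* "$work"/dest/

merged_a=$(cat "$work/dest/a.bin")
merged_b=$(cat "$work/dest/b.bin")
left_in_src=$(ls -A "$work/src")
echo "a.bin: $merged_a, b.bin: $merged_b, left in src: '$left_in_src'"
rm -r "$work"
```

Note that the src/* glob only covers non-hidden entries directly under src/.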

Solution 2:

There's a case where mv fails. Here's some example data:

mkdir -p src/d dest/d
touch src/d/f1 dest/d/f2

See how mv fails:

$ mv src/* dest/
mv: cannot move 'src/d' to 'dest/d': Directory not empty
$ mv -f src/* dest/
mv: cannot move 'src/d' to 'dest/d': Directory not empty
$ mv -fv src/* dest/
mv: cannot move 'src/d' to 'dest/d': Directory not empty
$ mv -fvi src/* dest/
mv: overwrite 'dest/d'? y
mv: cannot move 'src/d' to 'dest/d': Directory not empty
$ mv -fvi -t dest/ src/*      
mv: overwrite 'dest/d'? y
mv: cannot move 'src/d' to 'dest/d': Directory not empty

So make a script file:

vim supermove

This example does no real error checking (DISCLAIMER: works for me, but please test that it works for you, maybe with echo before mv), and it will overwrite files that have the same path. It also uses find with \;, which is terribly inefficient, but + doesn't work right with "$dest" prepended. Older versions of find will make some dirs without the path prepended, and newer versions will say:

find: In '-exec ... {} +' the '{}' must appear by itself, but you specified 'dest/{}'

You could probably find a way to fix that with xargs though. (It took a few minutes on the 64k files, 8TB in total, that I was moving.) Add this content:

#!/bin/bash

src=$1
dest=$2

src=$(readlink -f "$src")
dest=$(readlink -f "$dest")

# stop if the source can't be entered; otherwise find would run in the wrong directory
cd "$src" || exit 1

# make dirs (missing old permission,acl,xattr data), and then mv the files
# "find ." also picks up hidden files and is safe with names that start with "-"
time find . -type d -exec mkdir -p "$dest/{}" \;
time find . -type f -exec mv {} "$dest/{}" \;

# also copy permissions, acls, xattrs
rsync -aAX "$src"/ "$dest"/

And make it executable:

chmod +rx supermove

And run it:

./supermove src/ dest/

And the result... before:

$ find src dest
src/
src/d
src/d/f1
dest/
dest/d
dest/d/f2

After:

$ find src dest
src
src/d
dest
dest/d
dest/d/f1
dest/d/f2

Now src/ should be just empty dirs. If so, you can rm -r src to clean up.
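One possible way around the one-mv-per-file cost mentioned above, as a sketch only: walk the directories and let find hand each directory's files to a single mv -t call. Here {} stands alone, so + is accepted. This assumes GNU find (-maxdepth, -print0) and GNU mv (-t); merge_batched is a made-up helper name, and the demo reuses the example data from this answer in a scratch directory.

```shell
#!/bin/bash
# Sketch: batch the moves per directory instead of running one mv per file.
# Assumes GNU find and GNU mv; merge_batched is a hypothetical helper name.
merge_batched() (
    # the body runs in a subshell, so the cd does not affect the caller
    src=$(readlink -f "$1") && dest=$(readlink -f "$2") && cd "$src" || exit 1
    find . -type d -print0 | while IFS= read -r -d '' dir; do
        mkdir -p "$dest/$dir"
        # {} appears by itself, so "+" works and mv receives many files at once
        find "$dir" -maxdepth 1 -type f -exec mv -f -t "$dest/$dir" {} +
    done
)

# Demo with the same layout as the example data above
work=$(mktemp -d)
mkdir -p "$work/src/d" "$work/dest/d"
touch "$work/src/d/f1" "$work/dest/d/f2"
merge_batched "$work/src" "$work/dest"
result=$(cd "$work" && find src dest | LC_ALL=C sort)
echo "$result"
rm -r "$work"
```

Like the script above, this creates the directories without their old permissions, ACLs, or xattrs, so the rsync -aAX step would still be needed if those matter.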

Solution 3:

mv options are all about conflict resolution:

Pick one:

-f  force (always overwrite)
-i  interactive (ask whether to overwrite)
-n  no clobber (no overwrite)

And this is good too:

-v  verbose

Otherwise, data can get lost and/or it won't be clear what exactly happened.
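A small sketch of how -n and -f differ when the destination already exists. The scratch directory and the file name f are made up, and GNU mv is assumed.

```shell
#!/bin/bash
# Sketch: how -n and -f treat an existing destination file (GNU mv assumed).
# The scratch directory and the file name are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "from src" > "$work/src/f"
echo "from dest" > "$work/dest/f"

# -n: the destination is left alone and the source stays where it is
mv -n -v "$work/src/f" "$work/dest/f"
after_n=$(cat "$work/dest/f")

# -f: the destination is replaced without asking
mv -f -v "$work/src/f" "$work/dest/f"
after_f=$(cat "$work/dest/f")

echo "after -n: $after_n"
echo "after -f: $after_f"
rm -r "$work"
```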

mv is also superior on the same filesystem because it only updates directory entries; the file data itself isn't touched. The other thing is that the larger the operation, the greater the chance that something goes wrong, such as soft errors.
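One way to check this on your own system, as a rough sketch: compare the inode number before and after the move. The scratch directory and the file name big are made up, and GNU stat is assumed for the -c option.

```shell
#!/bin/bash
# Sketch: on one filesystem, mv keeps the same inode (GNU stat assumed for -c).
# The scratch directory and the file name are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "payload" > "$work/src/big"

inode_before=$(stat -c %i "$work/src/big")
mv "$work/src/big" "$work/dest/big"
inode_after=$(stat -c %i "$work/dest/big")

if [ "$inode_before" = "$inode_after" ]; then
    verdict="same inode: renamed in place, data not copied"
else
    verdict="different inode: the data was copied"
fi
echo "$verdict"
rm -r "$work"
```

If src and dest were on different filesystems, mv would have to copy the data and the inode numbers would not be comparable.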