What is the fastest way to move a million images from one directory to another in Linux?
I have a million images that take up 30GB of disk space, and they need to be moved from one local directory to another local directory.
What would be the most efficient way to do this? Using mv? Using cp? Using rsync? Something else?
I need to take these:
/path/to/old-img-dir/*
00000000.jpg
--------.jpg ## nearly 1M of them! ##
ZZZZZZZZ.jpg
and move them here:
/path/to/new/img/dir/
rsync would be a poor choice because it does a lot of client/server background work which accounts for local as well as remote systems.
mv is probably the best choice. If possible, you should try mv directory_old directory_new rather than mv directory_old/* directory_new/. This way, you move one thing instead of a million things.
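To illustrate why moving the directory itself is cheap, here is a minimal sketch on a throwaway directory. It assumes GNU stat (for -c %i) and that both names live on the same filesystem; the directory and file names are made up for the demo.

```shell
#!/bin/sh
# Throwaway demo: renaming a directory is one operation,
# no matter how many files it contains.
work=$(mktemp -d)
mkdir "$work/old-img-dir"
for i in 1 2 3 4 5; do
    : > "$work/old-img-dir/img$i.jpg"    # create empty placeholder files
done

# Inode of the directory before the move (GNU stat assumed).
inode_before=$(stat -c %i "$work/old-img-dir")

# One rename of the directory entry; the files inside are not touched.
mv "$work/old-img-dir" "$work/new-img-dir"

inode_after=$(stat -c %i "$work/new-img-dir")
moved_count=$(find "$work/new-img-dir" -type f -name '*.jpg' | wc -l)

echo "inode before: $inode_before, after: $inode_after, files: $moved_count"
rm -rf "$work"
```

If the two inode numbers match, the directory was renamed in place rather than copied, which is why the cost does not depend on the number of files.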
find src_image_dir/ -type f -name '*.jpg' -print0 | xargs -0r mv -t dst_image_dir/
- This will not overflow the argument list, as expanding a glob over a million files can.
- You can specify the file extension, if you want to (-name ...).
- find -print0 with xargs -0 allows you to use spaces in the names.
- xargs -r will not run mv unless there is something to be moved (mv will complain if no source files are given).
- The syntax mv -t allows you to specify first the destination and then the source files, which xargs needs.
- Moving the whole directory is of course much faster, since it takes place in constant time regardless of the number of files contained in it, but:
  - the source directory will disappear for a fraction of a second, which might cause you problems;
  - if a process is using the current directory as its output directory (in contrast to always referring to a full path from a non-moving location), you would have to relaunch it (as you do with log rotation).
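The pipeline above can be tried safely on a scratch directory first. This is only a sketch: it assumes GNU findutils and coreutils (for xargs -0r and mv -t), and the file names are invented for the demo.

```shell
#!/bin/sh
# Throwaway demo of the find | xargs | mv -t pipeline.
work=$(mktemp -d)
mkdir "$work/src_image_dir" "$work/dst_image_dir"
: > "$work/src_image_dir/plain.jpg"
: > "$work/src_image_dir/with space.jpg"   # name containing a space
: > "$work/src_image_dir/notes.txt"        # not matched by -name '*.jpg'

# NUL-separated names survive spaces; -r skips mv when nothing matches.
find "$work/src_image_dir/" -type f -name '*.jpg' -print0 |
    xargs -0r mv -t "$work/dst_image_dir/"

moved=$(find "$work/dst_image_dir" -type f | wc -l)
left=$(find "$work/src_image_dir" -type f | wc -l)
echo "moved: $moved, left behind: $left"
rm -rf "$work"
```

Both .jpg files, including the one with a space in its name, should end up in the destination, while the .txt file stays where it was.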
By the way, I would ask myself whether I really have to move such a large number of files at once. Batch processing is overrated. I try not to accumulate huge amounts of work if I can process things at the moment they are generated.
If the two directories reside on the same filesystem, use mv on the DIRECTORY and not the contents of the directory.
If they reside on two different filesystems, use rsync:
rsync -av /source/directory/ /destination
Notice the trailing / on the source. This means it will copy the CONTENTS of the directory and not the directory itself. If you leave the / off, it will still copy the files but they will sit in a directory named /destination/directory. With the /, the files will just be in /destination.
rsync will maintain file ownership if you run it as root or if the files are owned by you. It will also maintain the mtime of each individual file.
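To decide which of the two cases applies, one option is to compare the device numbers of the two directories. The sketch below is a heuristic, not a guarantee: it assumes GNU stat (for -c %d), and same_filesystem is a hypothetical helper name. Setups such as bind mounts or btrfs subvolumes may not behave the way the device numbers suggest.

```shell
#!/bin/sh
# same_filesystem DIR1 DIR2 (hypothetical helper):
# succeeds when both paths report the same device number.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Scratch directories stand in for the real source and destination.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"

if same_filesystem "$work/a" "$work/b"; then
    choice="mv"       # a rename is enough, no data is copied
else
    choice="rsync"    # data has to be copied across filesystems
fi
echo "suggested tool: $choice"
rm -rf "$work"
```

With real paths, replace the two scratch directories with the source directory and the parent of the destination.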
tar cf - dir1 | (cd dir2 && tar xf -)
tar cf - dir1 | ssh remote_host "( cd /path/to/dir2 && tar xf - )"
When you use cp, each file goes through open-read-close-open-write-close in a single process. The tar pipeline uses separate processes for reading and writing, so one can read the next file while the other is still writing the previous one. Even on a single-CPU box this overlap can help, because the processes spend most of their time waiting on I/O.
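A small sketch of the local tar pipe on scratch directories follows. Note that this approach copies rather than moves, so the source still has to be removed once the copy has been verified. The directory and file names are invented for the demo.

```shell
#!/bin/sh
# Throwaway demo of the tar pipe: one tar reads, the other writes.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
: > "$work/dir1/a.jpg"
: > "$work/dir1/b.jpg"

# -C makes the first tar archive dir1 relative to $work;
# && ensures the second tar only runs if cd succeeded.
tar cf - -C "$work" dir1 | (cd "$work/dir2" && tar xf -)

copied=$(find "$work/dir2/dir1" -type f | wc -l)
remaining=$(find "$work/dir1" -type f | wc -l)   # source is left in place
echo "copied: $copied, still in source: $remaining"
rm -rf "$work"
```

Using && instead of ; matters here: if the cd fails, the extraction would otherwise unpack into whatever the current directory happens to be.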