Why does moving some files in a folder take longer than moving the whole folder?

Solution 1:

TL;DR: No

With a smaller number of files you would not need find, but even in that simpler case, if you just run

    mv *.jpg ../../dst/

it will still take more time than moving the whole directory at once.


Why? To see why, you need to understand what mv actually does.

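Briefly: a directory is a list of entries, each mapping a name to an inode number (the number that identifies a file or a subdirectory). When source and destination are on the same file system, mv uses the rename(2) system call, which removes the entry from the source directory and adds it to the destination directory. The inode and the file's data are not touched; only directory metadata changes (on a journaling file system, that metadata change is also recorded in the journal). On FAT, which has no inodes, the directory entry itself is moved.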

If source and destination are on the same file system, no data is actually moved; only the point where the file or directory is attached to the directory tree changes.
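One way to see that a same-file-system mv only relinks a name is to compare the file's inode number before and after the move. This is a minimal sketch, assuming a POSIX shell and that the directory returned by mktemp -d lives on a single file system; the file name photo.jpg is made up for illustration.

```shell
#!/bin/sh
# Sketch: a same-file-system mv changes directory entries, not the inode.
# Assumes mktemp -d returns a directory on a single file system.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src" "$work/dst"
printf 'hello\n' > "$work/src/photo.jpg"     # hypothetical file

# ls -i prints "<inode> <name>"; keep only the inode number.
before=$(ls -i "$work/src/photo.jpg" | awk '{print $1}')
mv "$work/src/photo.jpg" "$work/dst/"
after=$(ls -i "$work/dst/photo.jpg" | awk '{print $1}')

echo "inode before: $before, after: $after"
if [ "$before" = "$after" ]; then
  echo "same inode: only the directory entry moved, no data was copied"
fi
```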

So when you mv one directory, you perform this operation once.

But when you move 1 million files, you perform it 1 million times.

As an analogy, picture a tree with many branches, and in particular one node to which 1 million branches are attached.
To cut these branches off and move them somewhere else, you can either cut each one of them, making 1 million cuts, or cut just before the node, making a single cut. That is the difference between moving the files and moving the directory.
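The difference can be measured. The sketch below creates the same number of empty files in two scratch directories, then times moving one set with mv *.jpg (one rename per file) against moving the other set's whole directory (a single rename). It assumes Linux with GNU date, since the %N nanoseconds format is not POSIX; the count of 2000 and the directory names are arbitrary choices for illustration, and absolute timings will vary with the file system and cache state.

```shell
#!/bin/sh
# Sketch: time moving N files individually against moving their parent directory once.
# Assumes GNU date (for %N) and a scratch directory on one file system.
set -eu
n=2000                                  # arbitrary illustrative count
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/a/photos" "$work/b/photos" "$work/dst_files" "$work/dst_dir"
i=1
while [ "$i" -le "$n" ]; do
  : > "$work/a/photos/img$i.jpg"
  : > "$work/b/photos/img$i.jpg"
  i=$((i + 1))
done

# Case 1: mv *.jpg, one rename per file.
start=$(date +%s%N)
mv "$work"/a/photos/*.jpg "$work/dst_files/"
files_ms=$(( ($(date +%s%N) - start) / 1000000 ))

# Case 2: mv the directory, a single rename.
start=$(date +%s%N)
mv "$work/b/photos" "$work/dst_dir/"
dir_ms=$(( ($(date +%s%N) - start) / 1000000 ))

moved_files=$(( $(ls "$work/dst_files" | wc -l) ))
moved_dir=$(( $(ls "$work/dst_dir/photos" | wc -l) ))
echo "moved $moved_files files one by one in ${files_ms} ms"
echo "moved a directory holding $moved_dir files in ${dir_ms} ms"
```

Raising n makes the gap grow roughly in proportion to the file count, while the directory move stays a single operation.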

Solution 2:

It will still be slow because, as noted, the file system has to relink each file name to its new location.

However, you can speed it up from what you have now.

Your find command runs its -exec action once for each file, so it launches mv 12 million times for 12 million files. This can be improved in two ways.

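  • Terminate -exec with {} + instead of \; :
    find . -maxdepth 1 -name '*.jpg' -exec mv -t ../../dst/ {} +
    With +, find runs a series of mv commands with as many file names as fit on each command line, instead of one mv per file. The {} must come immediately before the +, which is why this form needs mv's -t option (GNU coreutils): it names the destination directory first, so the file names can be appended after it. Check the man pages to make sure your versions of find and mv support this.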

  • Use find and xargs together:
    find . -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t ../../dst/
    -print0 separates the file names with NUL (zero) bytes. Together with xargs -0, this avoids the problems xargs would otherwise have with spaces and other special characters in file names. xargs reads the list of file names from find and runs mv with as many of them as fit on each command line.
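To see the difference between the two -exec terminators, the following sketch counts how many processes find starts. It uses the external echo command as a stand-in for mv, since each echo invocation prints exactly one line, and then performs the batched move with the xargs form from the second bullet. It assumes GNU find and GNU coreutils mv (for -t); the file names and the count of 2000 are made up for illustration.

```shell
#!/bin/sh
# Sketch: how many processes find starts with "\;" versus "+", then a batched move.
# Assumes GNU find and GNU coreutils mv (for -t); file names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir dst
i=1
while [ "$i" -le 2000 ]; do        # arbitrary illustrative count
  : > "img$i.jpg"
  i=$((i + 1))
done
: > "my photo.jpg"                 # a name with a space, to exercise -print0/xargs -0

# echo stands in for mv: every output line is one process that find started.
per_file=$(( $(find . -maxdepth 1 -name '*.jpg' -exec echo {} \; | wc -l) ))
batched=$(( $(find . -maxdepth 1 -name '*.jpg' -exec echo {} + | wc -l) ))
echo "processes started with \\; : $per_file"
echo "processes started with +  : $batched"

# The batched move itself: NUL-separated names keep the space intact.
find . -maxdepth 1 -name '*.jpg' -print0 | xargs -0 mv -t dst/
moved=$(( $(ls dst | wc -l) ))
echo "files in dst: $moved"
```

With a couple of thousand short names, the + form typically fits everything into one or a few command lines, while \; starts one process per file; with millions of files, both + and xargs split the list into as many command lines as needed.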