Why is -r recursive necessary when copying a directory in Linux?

My question is why is it required to use the -r (recursive) flag when making a copy of a directory? I.e., why do this:

$ cp -r dir1 copyDir1

When would I not want this behavior when copying a directory?

Isn’t a recursive copy of a directory really the “default” behavior, the behavior we want nearly all the time?

It feels like this is a superfluous flag.


Solution 1:

The way filesystems work, a directory is not actually a folder containing files; rather, a directory is itself a file whose contents are a list of names and inode pointers to the “child” files connected to it. In other words, from the filesystem’s perspective a file is a file, and a directory is just a file containing a list of connected files.

So from the command line perspective, doing this:

$ cp dir1 copyDir1

Would basically mean “copy the file named dir1 to a new file named copyDir1.” And as far as the filesystem is concerned, dir1 is just a file anyway; the fact that it is a “directory” only becomes apparent when the filesystem actually checks dir1 to see what that pile of bits is.

The -r flag tells cp to recursively walk down the file/directory tree and copy any and all contents that are “children” of that file to the new location.
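You can observe the “a directory is just a file” idea directly from the shell; the directory itself has an inode number and a file type like any other file. (A minimal sketch; the names are examples, and the stat -c syntax assumes GNU coreutils.)

```shell
# Create a directory with a couple of entries in it.
mkdir -p demo_dir
touch demo_dir/a demo_dir/b

# 'stat' reports the directory's own inode number and file type --
# it is itself a filesystem object whose contents are the entries a and b.
stat -c 'inode=%i type=%F' demo_dir

# 'ls -ld' likewise lists the directory itself, not its children.
ls -ld demo_dir
```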

Now, as to why that might seem superfluous or redundant: this really comes down to historical methods of dealing with filesystems, as well as to building a system that is safe from all types of user error, accidental as well as intentional.

Meaning, let’s say you have a ~/bin directory in your home directory that you want to copy, but (because you are human and make mistakes) you accidentally leave out the ~, so it’s just /bin, like this:

cp /bin/ ~/copy_of_bin

With the “safety net” of /bin being a directory, combined with the need for the -r flag, you avoid accidentally copying the entire binary root of the system into your home directory. If that safety net did not exist, a minor, or possibly major, disaster would happen.
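The safety net is easy to demonstrate: without -r, cp refuses the directory entirely and exits non-zero. (A small sketch with example names; the exact error wording shown is GNU cp’s and varies between implementations.)

```shell
mkdir -p src_dir
touch src_dir/file1

# Without -r, cp refuses to touch the directory at all and exits non-zero:
cp src_dir dest_dir || true   # cp: -r not specified; omitting directory 'src_dir' (GNU wording)

# With -r, the copy succeeds and the tree comes along:
cp -r src_dir dest_dir
ls dest_dir                   # file1
```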

The logic here is that in the days before GUIs (graphical user interfaces), logical/behavioral conventions needed to be set to avoid user-created mishaps that could potentially kill a system. Requiring the -r flag is now one of them.

If that seems superfluous, look no further than the modern GUI systems one can place on top of Linux filesystems. A GUI addresses basic user issues like this by allowing one to drag and drop files and directories with ease.

But in the realm of text-based interfaces, much of the “user experience” in that world is basically a set of logical, heuristic-based road bumps that help keep the user in check so potential disasters can be averted.

Similarly, this is why Linux/Unix filesystems don’t have 777 permissions and sudo rights set by default, and why real system administrators wince when a user sets 777 permissions or grants everyone sudo rights. These are the basic things one does to ensure the system is stable and as “user-proof” as possible; anyone rushing to short-circuit those conventions will most likely damage their system without even knowing it.

ADDITIONAL INFO: Another answer here on the Unix Stack Exchange site gives a good explanation of why a non-recursive copy of a directory is problematic.

Well, without the -R flag, it's only possible to copy files, because it's rather unusual that someone wants to non-recursively copy a directory: A non-recursive copy would just result in a second name for the directory, pointing directly to the same directory structure. Because that's rarely what people want, and there is actually a separate program that does this (ln), a non-recursive copy of directories is not allowed.

So if a directory is really just a file with inode entries inside it, making a straight copy of that file would be the equivalent of a hard link, which is not what anyone wants.
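As the quoted answer notes, ln is the tool that gives a directory a second name. Modern filesystems refuse hard links to directories, so in practice that second name is a symbolic link. (A sketch with example names; the ln error wording is GNU coreutils’.)

```shell
mkdir -p dir1
touch dir1/data

# A hard link to a directory is refused on modern systems:
ln dir1 alias1 || true   # ln: dir1: hard link not allowed for directory (GNU wording)

# A symbolic link gives the directory a second name instead --
# which is roughly what a non-recursive "copy" of the directory
# file would have amounted to:
ln -s dir1 alias2
ls alias2/               # data
```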

Solution 2:

It's very true that this is the behavior we want nearly all the time. That doesn't necessarily mean, though, that copying recursively should be the default behavior.

I think the reasons cp acts as it does have roots in Unix philosophy. Unix favors programs that do one thing and do it well, as well as programs that are simple in both interface and implementation (sometimes called worse is better).

The key piece of the puzzle here is realizing that cp does not copy directories - cp copies files (and only files). If you want to copy a directory, cp calls itself recursively, in order to copy the files in each directory.

Of course, the difference between "copying directories" and "copying files recursively" is absolutely nothing, from a user's perspective, but having this interface helps the implementation remain simple.
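The recursion is simple enough to sketch as a shell function. This is a toy illustration, not how cp is actually implemented; the rcopy name is hypothetical, and the sketch ignores permissions, symlinks, and special files entirely.

```shell
# rcopy SRC DEST -- toy recursive copy (hypothetical helper, not a
# standard tool): plain files are copied, and each subdirectory
# triggers a recursive call, mirroring the "cp calls itself" idea.
rcopy() {
    mkdir -p "$2"
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue                    # skip the literal '*' on empty dirs
        if [ -d "$entry" ]; then
            rcopy "$entry" "$2/$(basename "$entry")"   # recurse into the subdirectory
        else
            cp "$entry" "$2/"                          # plain file: cp handles it directly
        fi
    done
}
```

Usage would mirror the question's example: rcopy dir1 copyDir1.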

If you made cp able to copy directories, you'd soon be tempted to add more features that only make sense for directories - for example, you might want to only copy filenames that ended in .sh. Inevitably, this leads to the bloat and feature creep that we are used to in other operating systems - making software slow, complex and error-prone.

Another advantage of -r is that it helps the user understand what is really happening underneath the interface. A nice side effect is that learning the concept of recursive operation will spare you some work when you learn about other tools that support it (grep, for example).
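For instance, grep treats a directory much the way cp does, and uses the same flag for the same concept of descending into it. (A small sketch with example names; the directory error wording is GNU grep’s.)

```shell
mkdir -p project/src
echo 'hello world' > project/src/main.txt
echo 'goodbye'     > project/notes.txt

# Without -r, grep balks at the directory, just as cp does:
grep hello project 2>&1 || true   # grep: project: Is a directory (GNU wording)

# With -r, grep recursively searches every file under it:
grep -r hello project             # project/src/main.txt:hello world
```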


Some people will certainly tell you that exposing implementation details to the user is bad, and that having more features is good. My intent here is merely to explain the rationale for this behavior, so I won't try to argue either way.

Solution 3:

Requiring the flag for interactions with directories makes sure you know you're interacting with a directory and NOT just a single file.

For instance:

$ tree
.
└── folder1
    └── sub1
        └── subsub1

3 directories, 0 files
$
$ cp folder1/ folder2
cp: folder1/ is a directory (not copied).
$
$ mkdir blah
$ cp blah/ blah2
cp: blah/ is a directory (not copied).
$ rm blah/
rm: blah/: is a directory

So, if you want to successfully copy a folder, since the copy involves both the folder and the objects it references, you have to treat it as a collection of files:

$ cp -r folder1/ folder2
$ rm -rf folder1

Solution 4:

The consequence of changing the default would be that thousands of shell scripts would break. This leads to the POSIX and SUS requirements for the well-known default behavior.

The reason is the historical development of the cp, ln and mv commands (all the same binary on most old UNIX systems) in various UNIX branches. When -r appeared (early cp did not have an option to copy directories; here is an early cp man page without -r or -R), there were various differences in handling special files, symlinks, and other vagaries of the filesystem.

From The Open Group Base Specifications Issue 7:

Earlier versions of this standard included support for the -r option to copy file hierarchies. The -r option is historical practice on BSD and BSD-derived systems. This option is no longer specified by POSIX.1-2008 but may be present in some implementations. The -R option was added as a close synonym to the -r option, selected for consistency with all other options in this volume of POSIX.1-2008 that do recursive directory descent.

The difference between -R and the removed -r option is in the treatment by cp of file types other than regular and directory. It was implementation-defined how the -r option treated special files to allow both historical implementations and those that chose to support -r with the same abilities as -R defined by this volume of POSIX.1-2008. The original -r flag, for historic reasons, did not handle special files any differently from regular files, but always read the file and copied its contents. This had obvious problems in the presence of special file types; for example, character devices, FIFOs, and sockets.
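The special-file difference is easy to see with a named pipe. (A sketch assuming GNU cp and example names; the old-style -r behavior described in the comment is per the POSIX rationale quoted above.)

```shell
mkdir -p d
mkfifo d/pipe   # a special file: a named pipe (FIFO)

# The POSIX-style -R recreates the FIFO as a FIFO in the copy.
# The historical -r described above would instead have opened and
# read it like a regular file -- which blocks forever on a pipe
# with no writer.
cp -R d d2
test -p d2/pipe && echo "d2/pipe is a fifo"
```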

In fact you will still see some people regularly using:

(cd dir1 && tar -cf - .) | (cd dir2 && tar -xpf -)

Because they don't trust that the cp -r implementation will be what they are used to on an arbitrary machine, or because they want the tar behavior.