Why is ls -R called "recursive" listing?
First off, let's define an arbitrary folder structure:
.
├── a1 [D]
│ ├── b1 [D]
│ │ ├── c1
│ │ ├── c2 [D]
│ │ │ ├── d1
│ │ │ ├── d2
│ │ │ └── d3
│ │ ├── c3
│ │ ├── c4
│ │ └── c5
│ └── b2 [D]
│ ├── c6
│ └── c7
├── a2 [D]
│ ├── b3 [D]
│ │ ├── c8
│ │ └── c9
│ └── b4 [D]
│ ├── c10
│ └── c11
├── a3 [D]
│ ├── b5
│ ├── b6
│ └── b7
└── a4 [D]
When we do ls
, we get the output of the base folder only:
a1 a2 a3 a4
However, when we call ls -R
, we get something different:
.:
a1 a2 a3 a4
./a1:
b1 b2
./a1/b1:
c1 c2 c3 c4 c5
./a1/b1/c2:
d1 d2 d3
./a1/b2:
c6 c7
./a2:
b3 b4
./a2/b3:
c8 c9
./a2/b4:
c10 c11
./a3:
b5 b6 b7
./a4:
As you can see, it's running ls
on the base folder, and then all the child folders. And all the grandchild folders, ad infinitum. Effectively, the command is going through each folder recursively until it hits the end of the directory tree. At that point, it comes back up a branch in the tree and does the same thing for any sub-folders, if any.
Or, in pseudo-code:
recursivelyList(directory) {
files[] = listDirectory(directory) // Get all files in the directory
print(directory.name + ":\n" + files.join(" ")) // Print the "ls" output
for (file : files) { // Go through all the files in the directory
if (file.isDirectory()) { // Check if the current file being looked at is a directory
recursivelyList(directory) // If so, recursively list that directory
}
}
}
And because I can, a reference Java implementation of the same.
There are, in effect, two closely coupled questions you may be asking.
- Why is the process of walking to each entry in a filesystem hierarchy an inherently recursive process? This is addressed by the other answers, such as Zanna's and Kaz Wolfe's.
- How is the technique of recursion used in the implementation of
ls
? From your phrasing ("How is recursion used in the process?"), I think this is part of what you want to know. This answer addresses that question.
Why it makes sense for ls
to be implemented with a recursive technique:
FOLDOC defines recursion as:
When a function (or procedure) calls itself. Such a function is called "recursive". If the call is via one or more other functions then this group of functions are called "mutually recursive".
The natural way to implement ls
is to write a function that constructs a list of filesystem entries to be displayed, and other code to process path and option arguments and to display the entries as desired. That function is highly likely to be implemented recursively.
During option processing, ls
will determine if it has been asked to operate recursively (by being invoked with the -R
flag). If so, the function that builds a list of entries to be displayed will call itself once for each directory it lists, except .
and ..
. There may be separate recursive and nonrecursive versions of this function, or the function may check each time if it is supposed to be operating recursively.
Ubuntu's /bin/ls
, the executable that runs when you run ls
, is provided by GNU Coreutils, and it has many features. As a result, its code is somewhat longer and more complicated than you might expect. But Ubuntu also contains a simpler version of ls
, provided by BusyBox. You can run this by typing busybox ls
.
How busybox ls
uses recursion:
ls
in BusyBox is implemented in coreutils/ls.c
. It contains a scan_and_display_dirs_recur()
function that is invoked to print a directory tree recursively:
static void scan_and_display_dirs_recur(struct dnode **dn, int first)
{
unsigned nfiles;
struct dnode **subdnp;
for (; *dn; dn++) {
if (G.all_fmt & (DISP_DIRNAME | DISP_RECURSIVE)) {
if (!first)
bb_putchar('\n');
first = 0;
printf("%s:\n", (*dn)->fullname);
}
subdnp = scan_one_dir((*dn)->fullname, &nfiles);
#if ENABLE_DESKTOP
if ((G.all_fmt & STYLE_MASK) == STYLE_LONG || (G.all_fmt & LIST_BLOCKS))
printf("total %"OFF_FMT"u\n", calculate_blocks(subdnp));
#endif
if (nfiles > 0) {
/* list all files at this level */
sort_and_display_files(subdnp, nfiles);
if (ENABLE_FEATURE_LS_RECURSIVE
&& (G.all_fmt & DISP_RECURSIVE)
) {
struct dnode **dnd;
unsigned dndirs;
/* recursive - list the sub-dirs */
dnd = splitdnarray(subdnp, SPLIT_SUBDIR);
dndirs = count_dirs(subdnp, SPLIT_SUBDIR);
if (dndirs > 0) {
dnsort(dnd, dndirs);
scan_and_display_dirs_recur(dnd, 0);
/* free the array of dnode pointers to the dirs */
free(dnd);
}
}
/* free the dnodes and the fullname mem */
dfree(subdnp);
}
}
}
The line where the recursive function call takes place is:
scan_and_display_dirs_recur(dnd, 0);
Seeing the recursive function calls as they happen:
You can see this in operation if you run busybox ls
in a debugger. First install the debug symbols by enabling -dbgsym.ddeb packages and then installing the busybox-static-dbgsym
package. Install gdb
as well (that's the debugger).
sudo apt-get update
sudo apt-get install gdb busybox-static-dbgsym
I suggest debugging coreutils ls
on a simple directory tree.
If you don't have one handy, make one (this works the same way as the mkdir -p
command in WinEunuuchs2Unix's answer):
mkdir -pv foo/{bar/foobar,baz/quux}
And populate it with some files:
(shopt -s globstar; for d in foo/**; do touch "$d/file$((i++))"; done)
You can verify busybox ls -R foo
works as expected, producing this output:
foo:
bar baz file0
foo/bar:
file1 foobar
foo/bar/foobar:
file2
foo/baz:
file3 quux
foo/baz/quux:
file4
Open busybox
in the debugger:
gdb busybox
GDB will print some information about itself. Then it should say something like:
Reading symbols from busybox...Reading symbols from /usr/lib/debug/.build-id/5c/e436575b628a8f54c2a346cc6e758d494c33fe.debug...done.
done.
(gdb)
(gdb)
is your prompt in the debugger. The first thing you'll tell GDB to do on this prompt is to set a breakpoint at the start of the scan_and_display_dirs_recur()
function:
b scan_and_display_dirs_recur
When you run that, GDB should tell you something like:
Breakpoint 1 at 0x5545b4: file coreutils/ls.c, line 1026.
Now tell GDB to run busybox
with ls -R foo
(or whatever directory name you want) as its arguments:
run ls -R foo
You may see something like this:
Starting program: /bin/busybox ls -R foo
Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6c60, first=1) at coreutils/ls.c:1026
1026 coreutils/ls.c: No such file or directory.
If you do see No such file or directory
, as above, that's okay. The purpose of this demonstration is just to see when the scan_and_display_dirs_recur()
function has been called, so GDB doesn't need to examine the actual source code.
Notice that the debugger hit the breakpoint even before any directory entries were printed. It breaks on the entrace to that function, but the code in that function must run for any directories to be enumerated for printing.
To tell GDB to continue, run:
c
Each time scan_and_display_dirs_recur()
is called, the breakpoint will be hit again, so you will get to see recursion in action. It looks like this (including the (gdb)
prompt and your commands):
(gdb) c
Continuing.
foo:
bar baz file0
Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cb0, first=first@entry=0) at coreutils/ls.c:1026
1026 in coreutils/ls.c
(gdb) c
Continuing.
foo/bar:
file1 foobar
Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cf0, first=first@entry=0) at coreutils/ls.c:1026
1026 in coreutils/ls.c
(gdb) c
Continuing.
foo/bar/foobar:
file2
foo/baz:
file3 quux
Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cf0, first=first@entry=0) at coreutils/ls.c:1026
1026 in coreutils/ls.c
(gdb) c
Continuing.
foo/baz/quux:
file4
[Inferior 1 (process 2321) exited normally]
The function has recur
in its name... does BusyBox only use it when the -R
flag is given? In the debugger, this is easy to find out:
(gdb) run ls foo
Starting program: /bin/busybox ls foo
Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6c60, first=1) at coreutils/ls.c:1026
1026 in coreutils/ls.c
(gdb) c
Continuing.
bar baz file0
[Inferior 1 (process 2327) exited normally]
Even without -R
, this particular implementation of ls
uses the same function to find out what filesystem entries exist and show them.
When you want to quit the debugger, just tell it:
q
How scan_and_display_dirs_recur()
knows if it should call itself:
Specifically how does it work differently when the -R
flag is passed? Examining the source code (that may not be the exact version on your Ubuntu system) reveals that it checks its internal data structure G.all_fmt
, where it stores what options it has been invoked with:
if (ENABLE_FEATURE_LS_RECURSIVE
&& (G.all_fmt & DISP_RECURSIVE)
(If BusyBox has been compiled without support for -R
, then it will also not attempt to display filesystem entries recursively; that's what the ENABLE_FEATURE_LS_RECURSIVE
part is about.)
Only when G.all_fmt & DISP_RECURSIVE
is true does the code that contains the recursive function call get run.
struct dnode **dnd;
unsigned dndirs;
/* recursive - list the sub-dirs */
dnd = splitdnarray(subdnp, SPLIT_SUBDIR);
dndirs = count_dirs(subdnp, SPLIT_SUBDIR);
if (dndirs > 0) {
dnsort(dnd, dndirs);
scan_and_display_dirs_recur(dnd, 0);
/* free the array of dnode pointers to the dirs */
free(dnd);
}
Otherwise, the function just runs once (per directory specified on the command line).
When you think about it, "recursive" makes sense for commands that act on directories and their files and directories and their files and directories and their files and directories and their files.........
.... until the entire tree from the specified point down has been operated on by the command, in this case listing the contents of any subdirectories of any subdirectories of any subdirectories.......... that exist under the argument(s) of the command
-R is for recursion, which could loosely be called "repeatedly".
Take this code for example:
───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/a
───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/b/1
───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/c/1/2
───────────────────────────────────────────────────────────────────────────────
$ ls -R temp
temp:
a b c
temp/a:
temp/b:
1
temp/b/1:
temp/c:
1
temp/c/1:
2
temp/c/1/2:
The -p
in making directories allows you to mass create directories with a single command. If one or more of the top-middle directories already exist it's not an error and the middle-lower directories are created.
Then the ls -R
recursively lists every single directory starting with temp and working it's way down the tree to all the branches.
Now let's look at a complement to the ls -R
command, ie the tree
command:
$ tree temp
temp
├── a
├── b
│ └── 1
└── c
└── 1
└── 2
6 directories, 0 files
As you can see tree
accomplishes the same as ls -R
except is more concise and dare I say "prettier".
Now let's look at how to recursively remove the directories we just created in one simple command:
$ rm -r temp
This recursively removes temp
and all the sub-directories underneath it. ie temp/a
, temp/b/1
and temp/c/1/2
plus the middle directories in between.
Here's a simple explanation, it makes sense because when it comes to displaying the content of subdirectories, the same function already knows what to do with a directory. Therefore it just needs to call itself on each subdirectory to get that result!
In pseudocode it looks something like this:
recursive_ls(dir)
print(files and directories)
foreach (directoriy in dir)
recursive_ls(directory)
end foreach
end recursive_ls