Why does 'ls > ls.out' cause 'ls.out' to be included in list of names?

Why does $ ls > ls.out cause 'ls.out' to be included in the list of names of files in the current directory? Why was it designed this way, rather than otherwise?


Solution 1:

When the shell evaluates the command, the > redirection is resolved first, so by the time ls runs the output file has already been created.

This is also the reason why reading from and writing to the same file using a > redirection within the same command truncates the file; by the time the command runs, the file has already been truncated:

$ echo foo >bar
$ cat bar
foo
$ <bar cat >bar
$ cat bar
$ 
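
To see the ordering for the original command, here is a minimal sketch that can be run safely in a throwaway directory. The variable name scratch and the files foo and bar are just placeholders picked for this example.

```shell
#!/bin/sh
# Work in a throwaway directory (scratch is a name chosen for this example).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch foo bar

# The shell creates ls.out first and only then starts ls,
# so ls finds three entries in the directory.
ls > ls.out
cat ls.out   # prints bar, foo and ls.out, one per line
```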

Tricks to avoid this:

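  • cat <<<"$(ls)" > ls.out (works for any command that needs to run before the redirection is resolved)

    Redirections are processed from left to right, and the command substitution inside the here-string is expanded before the > redirection is resolved, so ls is run before ls.out is created. In bash the cat is needed to copy the here-string to the output; without a command, ls.out is only created empty:

    $ ls
    bar  foo
    $ cat <<<"$(ls)" > ls.out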
    $ cat ls.out 
    bar
    foo
    
  • ls | sponge ls.out (works for any command that needs to run before the redirection is resolved)

    sponge writes to the file only when the rest of the pipe has finished executing, so ls is run before ls.out is created (sponge is provided with the moreutils package):

    $ ls
    bar  foo
    $ ls | sponge ls.out
    $ cat ls.out 
    bar
    foo
    
  • ls * > ls.out (works for ls > ls.out's specific case)

    The filename expansion is performed before the redirection is resolved, so ls will run on its arguments, which won't contain ls.out (provided ls.out doesn't already exist from an earlier run):

    $ ls
    bar  foo
    $ ls * > ls.out
    $ cat ls.out 
    bar
    foo
    $
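
The first two tricks share one idea: collect all of the output first, and open the file only afterwards. Here is a rough sketch of that idea in plain sh, for when neither bash nor moreutils is at hand. The function name soak is made up for this example and is not a real utility; unlike sponge it keeps everything in a shell variable, so it drops trailing blank lines and is only suitable for text output.

```shell
#!/bin/sh
# soak FILE: hypothetical helper; reads all of stdin, then writes it to FILE.
soak() {
    # $(cat) returns only at end of input, that is, after the upstream
    # command has finished writing; only then is "$1" opened for writing.
    soak_data=$(cat)
    printf '%s\n' "$soak_data" > "$1"
}

# Throwaway directory with two placeholder files.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch foo bar

ls | soak ls.out
cat ls.out   # prints bar and foo; ls.out did not exist yet when ls ran
```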
    

As for why redirections are resolved before the program / script / whatever is run, I don't see a specific reason why it's mandatory, but I see two reasons why it's better:

  • not redirecting STDIN beforehand would make the program / script / whatever wait until STDIN is redirected;

  • not redirecting STDOUT beforehand would force the shell to buffer the program's / script's / whatever's output until STDOUT is redirected.

So that's a waste of time in the first case, and a waste of time and memory in the second.

This is just what occurs to me; I'm not claiming these are the actual reasons. But I guess that, all in all, if one had a choice, one would redirect beforehand anyway, for the reasons above.

Solution 2:

From man bash:

REDIRECTION

Before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. Redirection allows commands' file handles to be duplicated, opened, closed, made to refer to different files, and can change the files the command reads from and writes to.

The first sentence suggests that, with redirection, output is made to go somewhere other than stdout right before the command is executed. Thus, in order for output to be redirected to a file, the file must first be created by the shell itself.

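To avoid having the file exist up front, you might redirect the output to a named pipe first, and then to the file. Note the use of & to return control of the terminal to the user. One caveat: opening a named pipe for writing blocks until something opens it for reading, so ls does not actually start until cat has opened the pipe, and by that point the shell has already created ls.out. Expect ls.out to show up in the listing with this approach too.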

$ mkfifo /tmp/namedPipe.fifo
$ ls > /tmp/namedPipe.fifo &
[1] 14167
$ cat /tmp/namedPipe.fifo > ls.out

But why?

Think about it: where would the output go? A program has functions like printf and puts, which by default write to stdout, but can their output go to a file if the file doesn't exist in the first place? It's like water. Can you get a glass of water without putting a glass underneath the faucet first?

Solution 3:

I don't disagree with the current answers. The output file has to be opened before the command runs or the command won't have anywhere to write its output.

This is because "everything is a file" in our world. Output to the screen goes to STDOUT (aka file descriptor 1). To write to the terminal, an application writes to fd 1, which is already open when it starts, just as it would write to a file.

When you redirect an application's output in a shell, you're altering fd1 so it's actually pointing at the file. When you pipe you alter one application's STDOUT to become another's STDIN (fd0).
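
A small sketch of what that means in practice. The file names out.txt, err.txt and piped.txt are arbitrary choices for this example; the point is only that the command is unchanged and just its descriptors are re-pointed.

```shell
#!/bin/sh
# Throwaway directory so the example leaves nothing behind in the cwd.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# The inner command writes one line to fd 1 and one line to fd 2.
# The redirections decide where each descriptor points before it starts.
sh -c 'echo "to stdout"; echo "to stderr" >&2' > out.txt 2> err.txt

# A pipe connects the left command's fd 1 to the right command's fd 0.
echo "through a pipe" | tr 'a-z' 'A-Z' > piped.txt

cat out.txt err.txt piped.txt
```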


It's all very well saying that, but you can quite easily see how this works with strace. The output is pretty heavy going, but this example is quite short.

strace sh -c "ls > ls.out" 2> strace.out

Within strace.out we can see the following highlights:

open("ls.out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

This opens ls.out as fd3. Write only. Truncates (overwrites) if exists, otherwise creates.

fcntl(1, F_DUPFD, 10)                   = 10
close(1)                                = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0

This is a bit of juggling. The shell copies its own STDOUT (fd 1) to fd 10 and closes fd 1, keeping the copy so that the original STDOUT can be put back once the command has finished. It then duplicates the write handle for ls.out onto fd 1 and closes the original handle (fd 3).

stat("/opt/wine-staging/bin/ls", 0x7ffc6bf028c0) = -1 ENOENT (No such file or directory)
stat("/home/oli/bin/ls", 0x7ffc6bf028c0) = -1 ENOENT (No such file or directory)
stat("/usr/local/sbin/ls", 0x7ffc6bf028c0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ls", 0x7ffc6bf028c0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ls", 0x7ffc6bf028c0)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ls", 0x7ffc6bf028c0)     = -1 ENOENT (No such file or directory)
stat("/sbin/ls", 0x7ffc6bf028c0)        = -1 ENOENT (No such file or directory)
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=110080, ...}) = 0

This is it searching for the executable. A lesson perhaps to not have a long path ;)

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0961324a10) = 31933
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 31933
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31933, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = 31933
dup2(10, 1)                             = 1
close(10)                               = 0

Then the command runs and the parent waits. During this time, anything the child writes to STDOUT goes to the open file handle on ls.out. When the child exits, the kernel sends SIGCHLD to the parent, which tells it that the child has finished and that it can resume. The parent finishes off with a little more juggling: it copies fd 10 back onto fd 1, which also closes its handle on ls.out, and then closes fd 10.
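
The same sequence can be approximated by hand in a script, which may make the trace easier to follow. This is only a sketch of the idea, not what the shell literally executes: it uses fd 9 for the saved copy because plain sh only promises single-digit descriptors to scripts, whereas the trace above shows fd 10.

```shell
#!/bin/sh
# Throwaway directory with two placeholder files.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch foo bar

exec 9>&1        # save the current stdout on fd 9
exec 1> ls.out   # open ls.out (create/truncate) and make it fd 1
ls               # the child inherits fd 1, so its output lands in ls.out
exec 1>&9        # point fd 1 back at the saved stdout
exec 9>&-        # close the saved copy
echo "stdout is restored"
```

Since ls.out is opened before ls starts, the file lists itself here as well.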

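Why is there so much juggling? Most likely because this shell applies the redirection in its own process before forking, so it has to set its own STDOUT aside and put it back afterwards.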


Of course you can change this behaviour. You could buffer to memory with something like sponge, and the output file will then be invisible to the preceding command. We're still affecting the file descriptors, but not in a file-system-visible way.

ls | sponge ls.out

Solution 4:

There is also a nice article about the implementation of redirection and pipe operators in the shell, which shows how redirection could be implemented, so that $ ls > ls.out could look like:

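#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    close(1); // Release fd 1 (stdout)
    // open() returns the lowest free fd, so ls.out becomes fd 1 (assuming fd 0 is open)
    open("ls.out", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fork() == 0) { // Child process inherits the redirected fd 1
        execlp("ls", "ls", (char *)NULL);
        _exit(127); // Reached only if exec fails
    }
    wait(NULL); // Parent waits for ls to finish
    return 0;
}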
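
For comparison, roughly the same steps can be spelled out in the shell itself. This is a loose analogue rather than a translation: here the redirection is done inside the child (the subshell), so the parent's stdout is never touched.

```shell
#!/bin/sh
# Throwaway directory with two placeholder files.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch foo bar

(                    # the subshell plays the role of fork()
    exec > ls.out    # replace fd 1 with ls.out, like close(1) + open()
    exec ls          # replace the subshell with ls, like exec()
)
cat ls.out           # ls.out was created before ls ran, so it lists itself
```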