What's the difference between shell builtin and shell keyword?

Solution 1:

There's a strong difference between a builtin and a keyword, in the way Bash parses your code. Before we talk about the difference, let's list all keywords and builtins:

Builtins:

$ compgen -b
.         :         [         alias     bg        bind      break     
builtin   caller    cd        command   compgen   complete  compopt   
continue  declare   dirs      disown    echo      enable    eval      
exec      exit      export    false     fc        fg        getopts   
hash      help      history   jobs      kill      let       local     
logout    mapfile   popd      printf    pushd     pwd       read      
readarray readonly  return    set       shift     shopt     source    
suspend   test      times     trap      true      type      typeset   
ulimit    umask     unalias   unset     wait                          

Keywords:

$ compgen -k
if        then      else      elif      fi        case      
esac      for       select    while     until     do        
done      in        function  time      {         }         
!         [[        ]]        coproc              

Notice, for example, that [ is a builtin whereas [[ is a keyword. I'll use these two to illustrate the difference below, since they are well-known operators: everybody knows them and uses them regularly (or should).
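Besides compgen, you can ask Bash directly which category a given name falls into with type -t, which prints keyword, builtin, alias, function or file. A minimal sketch (the list of names is arbitrary):

```shell
#!/usr/bin/env bash
# type -t prints the kind of each name: keyword, builtin, alias,
# function or file (and nothing at all if the name is unknown).
for name in '[' '[[' test if time cd; do
    printf '%-5s %s\n' "$name" "$(type -t "$name")"
done
```

In Bash this should report [, test and cd as builtins, and [[, if and time as keywords.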

A keyword is scanned and understood by Bash very early in its parsing. This allows, for example, the following:

string_with_spaces='some spaces here'
if [[ -n $string_with_spaces ]]; then
    echo "The string is non-empty"
fi

This works fine, and Bash will happily output

The string is non-empty

Note that I didn't quote $string_with_spaces. By contrast, the following:

string_with_spaces='some spaces here'
if [ -n $string_with_spaces ]; then
    echo "The string is non-empty"
fi

shows that Bash isn't happy:

bash: [: too many arguments

Why does it work with keywords and not with builtins? Because when Bash parses the code, it sees [[, which is a keyword, and understands very early that it's special. So it looks for the closing ]] and treats everything inside in a special way. A builtin (or command), on the other hand, is treated as an actual command that is going to be called with arguments. In this last example, Bash understands that it should run the command [ with these arguments (shown one per line):

-n
some
spaces
here
]

since parameter expansion, word splitting, pathname expansion and quote removal all take place. The command [ happens to be built into the shell, so Bash executes it with these arguments, which results in an error, hence the complaint.
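To see those five arguments for yourself, you can pass the same unquoted words to a small function that prints each argument it receives. show_args below is a hypothetical helper written for this demonstration, not something Bash provides:

```shell
#!/usr/bin/env bash
# show_args (hypothetical helper): prints each argument it receives
# on its own line, wrapped in <...> so the boundaries are visible.
show_args() {
    printf '<%s>\n' "$@"
}

string_with_spaces='some spaces here'
# Same words as in [ -n $string_with_spaces ], but given to show_args:
show_args -n $string_with_spaces ]
# <-n>
# <some>
# <spaces>
# <here>
# <]>
```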

In practice, you can see that this distinction allows for sophisticated behavior that wouldn't be possible with builtins (or commands).
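One example of such behavior: inside [[ ]], the right-hand side of == is treated as a pattern instead of being expanded into file names, and && acts as a logical operator instead of separating two commands. A minimal sketch (the file name is made up and is just a string here):

```shell
#!/usr/bin/env bash
file='notes.txt'   # hypothetical value, no such file needs to exist

# *.txt is matched against $file as a pattern; it is not expanded into
# the names of .txt files in the current directory.
if [[ $file == *.txt ]]; then
    echo "pattern match"
fi

# && inside [[ ]] is a logical AND, not the command-list operator.
if [[ -n $file && $file != *.md ]]; then
    echo "non-empty and not *.md"
fi
```

With [ instead, *.txt would go through pathname expansion like any other argument, and && would end the [ command, leaving it without its closing ].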

Still in practice, how can you possibly distinguish a builtin from a keyword? This is a fun experiment to perform:

$ a='['
$ $a -d . ]
$ echo $?
0

When Bash parses the line $a -d . ], it sees nothing special (i.e., no aliases, no redirections, no keywords), so it just performs variable expansion. After variable expansion, it sees:

[ -d . ]

so it executes the builtin [ with arguments -d, . and ], which is of course true (it just tests whether . is a directory).

Now look:

$ a='[['
$ $a -d . ]]
bash: [[: command not found

Oh. That's because when Bash sees this line, it sees nothing special, and hence expands all variables, and eventually sees:

[[ -d . ]]

At this point, alias expansion and keyword recognition have long since been performed and are not going to be performed again, so Bash tries to find a command called [[, doesn't find one, and complains.
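The same experiment can also be run as a script, checking exit statuses instead of reading error messages. 127 is the status Bash uses for "command not found":

```shell
#!/usr/bin/env bash
a='['
$a -d . ]                  # expands to: [ -d . ]  (the builtin [ runs)
echo "via \$a='[':  $?"     # 0, since . is a directory

a='[['
$a -d . ]] 2>/dev/null     # [[ coming from an expansion is not a keyword
echo "via \$a='[[': $?"     # 127, command not found
```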

Along the same lines:

$ '[' -d . ]
$ echo $?
0
$ '[[' -d . ]]
bash: [[: command not found

and

$ \[ -d . ]
$ echo $?
0
$ \[[ -d . ]]
bash: [[: command not found

Alias expansion is something rather special too. You've all done the following at least once:

$ alias ll='ls -l'
$ ll
.... <list of files in long format> ....
$ \ll
bash: ll: command not found
$ 'll'
bash: ll: command not found

The reasoning is the same: alias expansion occurs long before variable expansion and quote removal.
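If you want to reproduce this in a script rather than at the prompt, note that non-interactive Bash only expands aliases after shopt -s expand_aliases, and an alias only applies to lines read after the one that defines it. A sketch with a hypothetical alias greet:

```shell
#!/usr/bin/env bash
shopt -s expand_aliases     # scripts do not expand aliases by default
alias greet='echo hello'    # hypothetical alias for this demo

greet                                     # alias expanded: prints hello
\greet 2>/dev/null || echo "\\greet: $?"   # quoted, no alias lookup: 127
'greet' 2>/dev/null || echo "'greet': $?"  # quoted, no alias lookup: 127
```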


Keyword vs alias

Now what do you think happens if we define an alias to be a keyword?

$ alias mytest='[['
$ mytest -d . ]]
$ echo $?
0

Oh, it works! So aliases can be used to alias keywords. Nice to know.
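In a script, the same trick also needs expand_aliases. A sketch with the alias from above:

```shell
#!/usr/bin/env bash
shopt -s expand_aliases
alias mytest='[['

# The alias text is read again as input, so the parser sees a literal [[
# in command position and treats it as the keyword.
mytest -d . ]]
echo "mytest -d . ]]: $?"    # 0
```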


Conclusion: builtins really behave like commands: they correspond to an action being executed with arguments that undergo the usual variable expansion, word splitting and globbing. It's really just like having an external command somewhere in /bin or /usr/bin that is called with the arguments left after variable expansion, etc. Note that when I say it's really just like having an external command, I only mean with respect to arguments, word splitting, globbing, variable expansion, etc. A builtin can modify the shell's internal state!

Keywords, on the other hand, are scanned and understood very early, and allow for sophisticated shell behavior: the shell will be able to forbid word splitting or pathname expansion, etc.
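The remark that a builtin can modify the shell's internal state is worth a concrete illustration. A minimal sketch with cd: a child process (sh -c here, standing in for any external program) can only change its own working directory, whereas the cd builtin changes the current shell's:

```shell
#!/usr/bin/env bash
start=$PWD

# A separate process changes only its own directory, then exits.
sh -c 'cd /'
echo "after sh -c 'cd /': $PWD"   # unchanged, still $start

# The cd builtin runs inside this shell, so the change persists.
cd / || exit 1
echo "after cd /:         $PWD"   # /
```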

Now look at the list of builtins and keywords and try to figure out why some need to be keywords.


! is a keyword. It seems it would be possible to mimic its behavior with a function:

not() {
    if "$@"; then
        return 1
    else
        return 0
    fi
}

but this would forbid constructs like:

$ ! ! true
$ echo $?
0

(here I mean not ! true, which doesn't work, because ! passed as an argument is not a keyword) or

$ ! { true; }
$ echo $?
1
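To see where the function falls short, compare it with the keyword on a case where the answers differ. ! ! false gives 1, but not ! false gives 0, only because ! is no longer a keyword once it's passed as an argument, so Bash tries to run a command named ! and fails:

```shell
#!/usr/bin/env bash
# The function from above: inverts the exit status of a command.
not() {
    if "$@"; then
        return 1
    else
        return 0
    fi
}

! ! false;                echo "! ! false     -> $?"  # 1
not not false;            echo "not not false -> $?"  # 1: nesting the function works
not ! false 2>/dev/null;  echo "not ! false   -> $?"  # 0: "!" is run as a command, not found
```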

Same for time: it's more powerful as a keyword, because then it can time complex compound commands and pipelines with redirections:

$ time grep '^#' ~/.bashrc | { i=0; while read -r; do printf '%4d %s\n' "$((++i))" "$REPLY"; done; } > bashrc_numbered 2>/dev/null

If time were a mere command (even a builtin), it would only see the arguments grep, ^# and /home/gniourf/.bashrc, time that, and its output would then go through the remaining parts of the pipeline. But with a keyword, Bash can handle everything: it can time the complete pipeline, including the redirections! If time were a mere command, we couldn't do:

$ time { printf 'hello '; echo world; }

Try it:

$ \time { printf 'hello '; echo world; }
bash: syntax error near unexpected token `}'

Try to fix (?) it:

$ \time { printf 'hello '; echo world;
time: cannot run {: No such file or directory

Hopeless.
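With the keyword, on the other hand, the brace group is timed as a whole. The sketch below also shows that the report is written to the shell's standard error rather than through the timed command's own redirections. TIMEFORMAT is a Bash variable that controls the report, and %R expands to the elapsed real time in seconds:

```shell
#!/usr/bin/env bash
TIMEFORMAT='elapsed: %R s'

# The 2>/dev/null belongs to the timed brace group, so it does not hide
# the report; the outer { ...; } 2>&1 is what lets us capture it.
report=$( { time { printf 'hello '; echo world; } 2>/dev/null; } 2>&1 )
printf '%s\n' "$report"   # hello world, then a line like: elapsed: 0.000 s
```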


Keyword vs alias?

$ alias mytime=time
$ alias myls=ls
$ mytime myls

What do you think happens?


Really, a builtin is like a command, except that it's built into the shell, whereas a keyword is something that allows for sophisticated behavior. We can say it's part of the shell's grammar.

Solution 2:

man bash calls them SHELL BUILTIN COMMANDS. So a "shell builtin" is just like a normal command such as grep, but instead of being contained in a separate file, it's built into Bash itself. This makes builtins more efficient than external commands, since Bash doesn't need to start a separate program to run them.

A keyword is also "hard-coded into Bash, but unlike a builtin, a keyword is not in itself a command, but a subunit of a command construct." I interpret this to mean that keywords have no function alone, but require commands to do anything. (From the link, other examples are for, while, do, and !, and there are more in my answer to your other question.)

Solution 3:

The command-line manual that comes with Ubuntu doesn't give a definition of keywords; however, the online manual (see sidenote) and the POSIX Shell Command Language specification refer to them as "reserved words", and both provide lists of them. From the POSIX standard:

This recognition shall only occur when none of the characters is quoted and when the word is used as:

  • The first word of a command

  • The first word following one of the reserved words other than case, for, or in

  • The third word in a case command (only in is valid in this case)

  • The third word in a for command (only in and do are valid in this case)
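A quick way to see these rules in action: the same word is a reserved word only when it is unquoted and in one of those positions. A minimal sketch:

```shell
#!/usr/bin/env bash
# Not the first word of a command: these are ordinary arguments.
echo if then fi                   # prints: if then fi

# Quoted: never recognized as a reserved word, so Bash looks for a
# command named "if" and doesn't find one.
'if' true 2>/dev/null
echo "status of 'if' true: $?"    # 127

# First word, unquoted: the reserved word that starts an if command.
if true; then echo "reserved word"; fi
```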

The key here is that keywords/reserved words have special meaning because they make up the shell syntax: they signal certain blocks of code such as loops, compound commands, and branching (if/case) statements. They allow forming command statements but don't do anything by themselves. In fact, if you enter a keyword such as for, until or case on its own, the shell expects a complete statement and otherwise reports a syntax error:

$ for
bash: syntax error near unexpected token `newline'
$  
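You can check this without an interactive prompt using bash -n, which reads and parses commands without executing them. A minimal sketch:

```shell
#!/usr/bin/env bash
# bash -n parses commands without running them: a handy way to check
# whether a snippet is a complete, valid statement.
if bash -n -c 'for' 2>/dev/null; then
    echo "'for' alone: ok"
else
    echo "'for' alone: syntax error"
fi

if bash -n -c 'for w in a b; do echo "$w"; done'; then
    echo "full for loop: ok"
fi
```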

At the source code level, Bash's reserved words are defined in parse.y, while builtins have a whole directory (builtins/) dedicated to them.

Sidenote

The GNU index lists [ as a reserved word; however, it is in fact a builtin command. [[, by contrast, is a reserved word.

See also: Differences between keyword, reserved word, and builtin?