How does the #! shebang work?
In a script you must include a #! on the first line, followed by the path to the program that will execute the script (e.g. sh, perl).
As far as I know, the # character denotes the start of a comment, and that line is supposed to be ignored by the program executing the script. It would seem that this first line is at some point read by something in order for the script to be executed by the proper program.
Could somebody please shed more light on the workings of the #!?
I'm really curious about this, so the more in-depth the answer the better.
Solution 1:
Recommended reading:
- The UNIX FAQ: Why do some scripts start with #! ... ?
- The #! magic, details about the shebang/hash-bang mechanism on various Unix flavours
- Wikipedia: Shebang
The Unix kernel's program loader is responsible for this. When exec() is called, it asks the kernel to load the program from the file named by its argument. The kernel then checks the first 16 bits of the file to see which executable format it has. If those bits are #!, the kernel uses the rest of the first line to find the program it should launch, and it passes the name of the file it was asked to launch (the script) as the last argument to the interpreter program.
The interpreter then runs as normal and treats the #! line as a comment.
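To make those steps concrete, here is a rough Python sketch of the parsing. It is not the kernel's actual code: the function name shebang_argv is made up for illustration, and the handling of the optional argument (everything after the interpreter path is passed as one single argument) follows Linux behaviour, which other Unix flavours do not all share.

```python
from typing import List, Optional


def shebang_argv(first_line: bytes, script_path: str) -> Optional[List[str]]:
    """Return the argv an interpreter would get, or None without a shebang.

    Hypothetical helper for illustration; it mimics Linux, which passes
    everything after the interpreter path as ONE optional argument.
    """
    # The two magic bytes "#!" must be the very first bytes of the file.
    if not first_line.startswith(b"#!"):
        return None
    # Only the first line matters; drop the newline and surrounding blanks.
    rest = first_line[2:].split(b"\n", 1)[0].strip().decode()
    if not rest:
        return None
    parts = rest.split(None, 1)  # interpreter path, then optional argument
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1])
    # The script's own path goes last, so the interpreter can open it.
    argv.append(script_path)
    return argv


print(shebang_argv(b"#!/bin/sh\necho hi\n", "./hello.sh"))
print(shebang_argv(b"#!/usr/bin/env python\n", "./script.py"))
```

The second call shows why the interpreter still sees the #! line: it is handed the script path and reads the whole file itself, first line included.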
Solution 2:
The Linux kernel exec system call uses the initial bytes #! to identify the file type
When you run the following in bash:
./something
on Linux, bash calls the exec system call with the path ./something.
The kernel then runs this check on the file passed to exec: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_script.c#L25
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
It reads the very first bytes of the file and compares them to #!.
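If the file does start with #!, the Linux kernel parses the rest of the line and makes another exec call. For a script whose first line is #!/usr/bin/env python, that call uses the path /usr/bin/env, with python as the first argument and the path of the current file as the next one: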
/usr/bin/env python /path/to/script.py
This works for any scripting language that uses # as a comment character.
And yes, you can make an infinite loop with:
printf '#!/a\n' | sudo tee /a
sudo chmod +x /a
/a
The kernel gives up with the ELOOP error, which bash reports as:
-bash: /a: /a: bad interpreter: Too many levels of symbolic links
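The loop ends because the kernel only follows a bounded number of interpreter hops before giving up with ELOOP, the error code behind the "Too many levels of symbolic links" message. Below is a toy model of that in Python. The dict stands in for the filesystem, the names are hypothetical, and the limit of 4 is an assumption for this sketch rather than a value taken from the answer above.

```python
import errno
import os
from typing import Dict

MAX_HOPS = 4  # assumed limit for this sketch; the real one is kernel-specific


def resolve_interpreter(path: str, shebangs: Dict[str, str]) -> str:
    """Follow #! interpreters until reaching a file that has no shebang.

    `shebangs` is a hypothetical stand-in for the filesystem: it maps a
    script path to the interpreter path named on its first line.
    """
    hops = 0
    while path in shebangs:
        if hops >= MAX_HOPS:
            # Same errno that the shell reports for the /a example.
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
        path = shebangs[path]
        hops += 1
    return path


print(resolve_interpreter("/tmp/script.sh", {"/tmp/script.sh": "/bin/sh"}))
try:
    resolve_interpreter("/a", {"/a": "/a"})
except OSError as exc:
    print(exc.errno == errno.ELOOP)
```

A script whose interpreter is itself a script is fine as long as the chain is short; only a chain that never reaches a real binary hits the limit.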
The #! magic happens to be human readable, but that is not required.
If the file started with different bytes, the exec system call would use a different handler. The other most important built-in handler is for ELF executable files: https://github.com/torvalds/linux/blob/v4.8/fs/binfmt_elf.c#L1305 which checks for the bytes 7f 45 4c 46 (which also happen to be human readable as .ELF). Let's confirm that by reading the first 4 bytes of /bin/ls, which is an ELF executable:
head -c 4 "$(which ls)" | hd
output:
00000000 7f 45 4c 46 |.ELF|
00000004
So when the kernel sees those bytes, it takes the ELF file, puts it into memory correctly, and starts a new process with it. See also: How does kernel get an executable binary file running under linux?
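The dispatch on leading bytes can be sketched in a few lines of Python. This is only an illustration of the idea, with a made-up function name and just the two formats discussed here; the real kernel tries a list of registered handlers in turn.

```python
def detect_format(header: bytes) -> str:
    """Classify a file by its leading magic bytes (illustrative only)."""
    # Scripts: the two bytes 0x23 0x21, i.e. "#!".
    if header.startswith(b"#!"):
        return "script"
    # ELF binaries: 0x7f followed by the letters "ELF".
    if header.startswith(b"\x7fELF"):
        return "elf"
    return "unknown"


print(detect_format(b"#!/bin/sh\n"))
print(detect_format(bytes.fromhex("7f454c46")))
```

To try it on a real file, pass in the first bytes read in binary mode, for example open(path, "rb").read(4).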
Finally, you can add your own handlers with the binfmt_misc mechanism. For example, you can add a custom handler for .jar files. This mechanism even supports handlers by file extension. Another application is to transparently run executables of a different architecture with QEMU.
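For reference, a binfmt_misc handler is registered by writing a colon-separated rule to /proc/sys/fs/binfmt_misc/register as root. The Python sketch below only builds such a rule string for an extension match and does not write it anywhere. The rule name ExecutableJAR and the wrapper path /usr/local/bin/jarwrapper are placeholders; you would have to provide a wrapper script of your own.

```python
def binfmt_extension_rule(name: str, extension: str, interpreter: str,
                          flags: str = "") -> str:
    """Build a binfmt_misc rule that matches files by extension.

    Field layout: :name:type:offset:magic:mask:interpreter:flags
    Type "E" matches on the file extension (given without the dot),
    so the offset and mask fields stay empty.
    """
    return f":{name}:E::{extension}::{interpreter}:{flags}"


# Placeholder names; nothing is registered, the rule is only printed.
print(binfmt_extension_rule("ExecutableJAR", "jar", "/usr/local/bin/jarwrapper"))
```

A rule of type M instead of E would match on magic bytes at a given offset, which is the same idea as the built-in #! and ELF checks.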
I don't think POSIX specifies shebangs, however: https://unix.stackexchange.com/a/346214/32558 . It does mention them in rationale sections, in the form "if executable scripts are supported by the system, something may happen". macOS and FreeBSD also seem to implement them, though.