Is "argv[0] = name-of-executable" an accepted standard or just a common convention?
Guesswork (even educated guesswork) is fun but you really need to go to the standards documents to be sure. For example, ISO C11 states (my emphasis):
If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment.
So no, it's only the program name if that name is available. And it "represents" the program name, not necessarily is the program name. The section before that states:
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
This is unchanged from C99, the previous standard, and means that even the values are not dictated by the standard - it's up to the implementation entirely.
This means that the program name can be empty if the host environment doesn't provide it, and anything else if the host environment does provide it, provided that "anything else" somehow represents the program name. In my more sadistic moments, I would consider translating it into Swahili, running it through a substitution cipher then storing it in reverse byte order :-).
However, implementation-defined does have a specific meaning in the ISO standards - the implementation must document how it works. So even UNIX, which can put anything it likes into argv[0] with the exec family of calls, has to (and does) document it.
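Since the standard allows argc to be zero and argv[0] to be an empty string, a program that wants to print its own name can guard against both cases. The following is a minimal sketch, not anything mandated by the standard; the helper name program_name and its fallback parameter are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns argv[0] when it is usable, otherwise the
 * caller-supplied fallback. Covers argc == 0, a NULL argv[0], and the
 * empty string the C standard allows when no name is available. */
static const char *program_name(int argc, char **argv, const char *fallback)
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    return fallback;
}
```

A typical call from main would be program_name(argc, argv, "myprog"), with the result used only for diagnostics, never for locating files.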
Under *nix-type systems with exec*() calls, argv[0] will be whatever the caller puts into the argv0 spot in the exec*() call.
The shell uses the convention that this is the program name, and most other programs follow the same convention, so argv[0] is usually the program name.
But a rogue Unix program can call exec() and make argv[0] anything it likes, so no matter what the C standard says, you can't count on this 100% of the time.
According to the C++ Standard, section 3.6.1:
argv[0] shall be the pointer to the initial character of a NTMBS that represents the name used to invoke the program or ""
So no, it is not guaranteed, at least by the Standard.
ISO-IEC 9899 states:
5.1.2.2.1 Program startup
If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
I've also used:
#include <stddef.h>
#if defined(_WIN32)
#include <windows.h>
/* Returns the number of characters written to pathName, or 0 on failure. */
static size_t getExecutablePathName(char* pathName, size_t pathNameCapacity)
{
    return GetModuleFileNameA(NULL, pathName, (DWORD)pathNameCapacity);
}
#elif defined(__linux__) /* elif of: #if defined(_WIN32) */
#include <sys/types.h>
#include <unistd.h>
static size_t getExecutablePathName(char* pathName, size_t pathNameCapacity)
{
    ssize_t pathNameSize;
    if (pathNameCapacity == 0)
        return 0;
    /* readlink() does not null-terminate and returns -1 on error. */
    pathNameSize = readlink("/proc/self/exe", pathName, pathNameCapacity - 1);
    if (pathNameSize < 0)
        return 0;
    pathName[pathNameSize] = '\0';
    return (size_t)pathNameSize;
}
#elif defined(__APPLE__) /* elif of: #elif defined(__linux__) */
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <mach-o/dyld.h>
static size_t getExecutablePathName(char* pathName, size_t pathNameCapacity)
{
    uint32_t pathNameSize = (uint32_t)pathNameCapacity;
    /* _NSGetExecutablePath() returns 0 on success, -1 if the buffer is too small. */
    if (_NSGetExecutablePath(pathName, &pathNameSize) == 0)
    {
        char real[PATH_MAX];
        /* Replace the path with its resolved form when that fits in the buffer. */
        if (realpath(pathName, real) != NULL && strlen(real) < pathNameCapacity)
            strcpy(pathName, real);
        return strlen(pathName);
    }
    return 0;
}
#else /* else of: #elif defined(__APPLE__) */
#error provide your own implementation
#endif /* end of: #if defined(_WIN32) */
And then you just have to parse the string to extract the executable name from the path.
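That parsing step could look roughly like the sketch below. The helper name executable_base_name is hypothetical, and treating both '/' and '\\' as separators is a simplifying assumption so that one function covers Windows and POSIX paths; on POSIX a backslash is a legal file name character, so adjust if that matters.

```c
/* Hypothetical helper: returns a pointer to the last component of path.
 * Assumption: both '/' and '\\' are treated as directory separators.
 * The returned pointer aliases the input; nothing is copied. */
static const char *executable_base_name(const char *path)
{
    const char *base = path;
    const char *p;
    for (p = path; *p != '\0'; ++p) {
        if (*p == '/' || *p == '\\')
            base = p + 1; /* component starts after the separator */
    }
    return base;
}
```

The result points into the buffer filled by getExecutablePathName, so it stays valid only as long as that buffer does.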
Applications of having argv[0] != executable name
- many shells determine if they are a login shell by checking argv[0][0] == '-'. Login shells have different properties, notably that they source some default files such as /etc/profile.
  It is typically the init itself or getty that adds the leading -, see also: https://unix.stackexchange.com/questions/299408/how-to-login-automatically-without-typing-the-root-username-or-password-in-build/300152#300152
- multi-call binaries, perhaps most notably Busybox. These symlink multiple names e.g. /bin/sh and /bin/ls to a single executable /bin/busybox, which recognizes which tool to use from argv[0].
  This makes it possible to have a single small statically linked executable that represents multiple tools, and will work on basically any Linux environment.
See also: https://unix.stackexchange.com/questions/315812/why-does-argv-include-the-program-name/315817
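Both uses above come down to inspecting argv[0] at startup. The following is an illustrative sketch only, not how Busybox or any real shell is implemented; run_ls, run_sh, applet_name, is_login_shell and dispatch are all hypothetical names, and the applets are stubs that just return a marker value.

```c
#include <string.h>

/* Hypothetical applet stubs standing in for real tools. */
static int run_ls(void) { return 10; }
static int run_sh(void) { return 20; }

/* Login-shell convention: argv[0] starts with '-'. */
static int is_login_shell(const char *argv0)
{
    return argv0[0] == '-';
}

/* Strips any directory part and a leading '-' from argv[0]. */
static const char *applet_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    const char *name = slash ? slash + 1 : argv0;
    if (name[0] == '-')
        ++name;
    return name;
}

/* Picks the applet from the invoked name; returns -1 if it is unknown. */
static int dispatch(const char *argv0)
{
    const char *name = applet_name(argv0);
    if (strcmp(name, "ls") == 0)
        return run_ls();
    if (strcmp(name, "sh") == 0)
        return run_sh();
    return -1;
}
```

A main function would call dispatch(argv[0]) after checking that argc is greater than zero, so that symlinks named ls and sh pointing at the same binary select different code paths.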
Runnable POSIX execve example where argv[0] != executable name
Others mentioned exec, but here is a runnable example.
a.c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>
int main(void) {
    char *argv[] = {"yada yada", NULL};
    char *envp[] = {NULL};
    execve("b.out", argv, envp);
    /* execve() only returns on failure. */
    perror("execve");
    return 1;
}
b.c
#include <stdio.h>
int main(int argc, char **argv) {
    puts(argv[0]);
}
Then:
gcc a.c -o a.out
gcc b.c -o b.out
./a.out
Gives:
yada yada
Yes, argv[0] could also be:
- NULL: When can argv[0] have null?
- empty: Can argv[0] contain an empty string?
Tested on Ubuntu 16.10.