Does PATH search include symlinks?

The permissions of the symlink itself are irrelevant. You couldn't even change them if you tried.

What matters are the permissions of the underlying file.

It is fine for directories in your PATH to contain symlinks to executables. In fact, many of the executables in your PATH are likely to be symlinks. For example, on Debian/Ubuntu-like systems:

$ ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Jan 23  2017 /bin/sh -> dash
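To see how common this is on your own machine, here is a minimal sketch that walks each directory in PATH and prints the entries that are symlinks, along with their targets. It assumes a POSIX sh plus the widely available (but non-POSIX) readlink utility; the function name list_path_symlinks and its optional argument are made up for illustration.

```shell
# Hypothetical helper: print every symlink found directly inside the
# directories of a colon-separated search list (defaults to $PATH).
# Assumes the non-POSIX but widespread readlink utility is available.
list_path_symlinks() {
    search=${1-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search; do
        [ -d "$dir" ] || continue      # skip empty or missing entries
        for entry in "$dir"/*; do
            # -L is true for the link itself (it uses lstat, not stat)
            if [ -L "$entry" ]; then
                printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
            fi
        done
    done
    IFS=$old_ifs
}

# Show a handful of symlinked commands on this system
list_path_symlinks | head -n 5
```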

Documentation

From man chmod:

chmod never changes the permissions of symbolic links; the chmod system call cannot change their permissions. This is not a problem since the permissions of symbolic links are never used. However, for each symbolic link listed on the command line, chmod changes the permissions of the pointed-to file. In contrast, chmod ignores symbolic links encountered during recursive directory traversals. [Emphasis added.]
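A quick way to confirm this is a throwaway directory created with mktemp -d (the names target and link are purely illustrative). This is a minimal sketch showing that chmod +x named on the link changes the file it points to.

```shell
tmp=$(mktemp -d)
printf '#!/bin/sh\necho hello\n' > "$tmp/target"
chmod 644 "$tmp/target"            # start out non-executable
ln -s target "$tmp/link"           # relative symlink: link -> target

chmod +x "$tmp/link"               # named on the command line, so chmod acts on target

if [ -x "$tmp/target" ]; then target_exec=yes; else target_exec=no; fi
echo "target executable after chmod on the link: $target_exec"
rm -rf "$tmp"
```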

Example

The shell has a test, -x, to determine if a file is executable. Let's try that:

$ ls -l
total 0
lrwxrwxrwx 1 john1024 john1024 7 Dec 12 23:36 foo -> foobar1
-rw-rw---- 1 john1024 john1024 0 Dec 12 23:36 foobar1
$ [ -x foo ] && echo foo is executable
$ chmod +x foobar1
$ [ -x foo ] && echo foo is executable
foo is executable

So, just as you found with which, the shell does not consider a symlink executable unless the underlying file is executable.
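The same experiment can be taken one step further by putting the directory on PATH and actually running the command by its symlink name. This sketch reuses the foo -> foobar1 names inside a temporary directory; the subshells keep the modified PATH from leaking out. The script prints $0, which (on Linux at least) is the path the shell found during the PATH search, so it ends in /foo, the link's name.

```shell
tmp=$(mktemp -d)
printf '#!/bin/sh\necho "$0"\n' > "$tmp/foobar1"   # prints the path it was run as
ln -s foobar1 "$tmp/foo"

# foobar1 is not executable yet, so running foo via PATH fails
if ( PATH="$tmp:$PATH"; foo ) >/dev/null 2>&1; then before=ran; else before=failed; fi

chmod +x "$tmp/foobar1"
# Now the PATH search finds foo and the kernel follows the symlink
after=$( PATH="$tmp:$PATH"; foo )

echo "before chmod: $before"
echo "after chmod:  $after"
rm -rf "$tmp"
```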

How which works

On a Debian system, the which command is a shell script. The relevant section of the code is:

 case $PROGRAM in
  */*)
   if [ -f "$PROGRAM" ] && [ -x "$PROGRAM" ]; then
    puts "$PROGRAM"
    RET=0
   fi
   ;;
  *)
   for ELEMENT in $PATH; do
    if [ -z "$ELEMENT" ]; then
     ELEMENT=.
    fi
    if [ -f "$ELEMENT/$PROGRAM" ] && [ -x "$ELEMENT/$PROGRAM" ]; then
     puts "$ELEMENT/$PROGRAM"
     RET=0
     [ "$ALLMATCHES" -eq 1 ] || break
    fi
   done
   ;;
 esac

As you can see, it uses the -x test to determine whether a file is executable (alongside -f, which likewise looks at whatever the pathname resolves to).
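Stripped down, that loop is a PATH search you can reproduce in any POSIX shell. One detail the excerpt does not show: for ELEMENT in $PATH only splits on colons if IFS contains ':'. The sketch below makes that explicit and applies the same -f and -x tests, so a symlink to an executable regular file matches. The function name find_in_path and its optional second argument are hypothetical. Note that with IFS=: POSIX field splitting drops a trailing empty entry (as in PATH=/bin:), so this simple version does not treat that case as the current directory.

```shell
# Hypothetical re-creation of the search loop in the which excerpt.
# $1 = command name, $2 = optional colon-separated list (default: $PATH)
find_in_path() {
    prog=$1
    search=${2-$PATH}
    old_ifs=$IFS
    IFS=:                           # split the list on colons
    for dir in $search; do
        [ -z "$dir" ] && dir=.      # an empty entry means the current directory
        # -f and -x both look at whatever the path resolves to
        if [ -f "$dir/$prog" ] && [ -x "$dir/$prog" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$prog"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

find_in_path sh
```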

POSIX specifies the -x test as follows:

-x pathname
True if pathname resolves to an existing directory entry for a file for which permission to execute the file (or search it, if it is a directory) will be granted, as defined in File Read, Write, and Creation. False if pathname cannot be resolved, or if pathname resolves to an existing directory entry for a file for which permission to execute (or search) the file will not be granted. [Emphasis added.]

So the -x test checks the file that the pathname resolves to. In other words, it follows symlinks.
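The flip side of -x resolving the pathname is that a dangling symlink (one whose target does not exist) can never pass the test, even though the link itself exists. A minimal check in a temporary directory, with made-up names:

```shell
tmp=$(mktemp -d)
ln -s does-not-exist "$tmp/dangling"

# -L examines the link itself; -e and -x examine what it resolves to
if [ -L "$tmp/dangling" ]; then is_link=yes; else is_link=no; fi
if [ -e "$tmp/dangling" ]; then exists=yes; else exists=no; fi
if [ -x "$tmp/dangling" ]; then executable=yes; else executable=no; fi

echo "link: $is_link, exists: $exists, executable: $executable"
rm -rf "$tmp"
```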

POSIX exec function

The POSIX exec functions follow symlinks. The POSIX spec goes on at length about the error conditions they may report when symlinks are circular or nested too deeply, such as:

[ELOOP]
A loop exists in symbolic links encountered during resolution of the path or file argument.

[ELOOP]
More than {SYMLOOP_MAX} symbolic links were encountered during resolution of the path or file argument.

[ENAMETOOLONG]
As a result of encountering a symbolic link in resolution of the path argument, the length of the substituted pathname string exceeded {PATH_MAX}.
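The first ELOOP case is easy to trigger with two links pointing at each other (the names a and b are arbitrary). Resolution never reaches a real file, so executing the path fails and -e reports that nothing exists there. This is a sketch; the exact error text and exit status depend on the shell and OS.

```shell
tmp=$(mktemp -d)
ln -s b "$tmp/a"
ln -s a "$tmp/b"                   # a -> b -> a -> ...

# exec fails with ELOOP ("Too many levels of symbolic links" on Linux)
if "$tmp/a" 2>/dev/null; then loop_exec=ran; else loop_exec=failed; fi
if [ -e "$tmp/a" ]; then loop_exists=yes; else loop_exists=no; fi

echo "exec: $loop_exec, exists: $loop_exists"
rm -rf "$tmp"
```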


In this case, symlinks are followed transparently, without canonicalizing the final path. In other words, which does not care whether /home/mark/bin is a symlink. What it cares about is whether the file /home/mark/bin/foobar exists. It does not need to flatten symlinks along the path manually; the OS does that on its own.

And indeed, when which asks for file information about /home/mark/bin/foobar, the OS notices that /home/mark/bin is a symlink, follows it, and finds foobar in the target directory.
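The /home/mark/bin situation can be reproduced in a temporary directory (bin and real-bin are stand-in names). Here the PATH entry is a symlink to a directory. The shell's lookup, shown with the POSIX command -v builtin rather than which, reports the path through the link, and running the command works because the kernel follows the link. A minimal sketch:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/real-bin"
printf '#!/bin/sh\necho foobar ran\n' > "$tmp/real-bin/foobar"
chmod +x "$tmp/real-bin/foobar"
ln -s real-bin "$tmp/bin"          # the PATH entry itself is a symlink

# Both run in subshells so the real PATH is untouched
found=$( PATH="$tmp/bin"; command -v foobar )
ran=$( PATH="$tmp/bin"; foobar )

echo "found: $found"               # reported through the link, not canonicalized
echo "ran:   $ran"
rm -rf "$tmp"
```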

This is the default behavior unless the program uses open(…, O_NOFOLLOW) or fstatat(…, AT_SYMLINK_NOFOLLOW) to access the file.
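From the shell, the difference between following and not following the final component can be seen through stat(1). This sketch assumes GNU coreutils stat, whose -c %F format prints the file type: by default it describes the link itself (like lstat), and with -L it follows the link (like stat).

```shell
tmp=$(mktemp -d)
echo data > "$tmp/file"            # non-empty, so GNU stat says "regular file"
ln -s file "$tmp/link"

# Assumes GNU coreutils stat (-c FORMAT, %F = file type)
nofollow=$(stat -c %F "$tmp/link")      # examines the link itself
follow=$(stat -L -c %F "$tmp/link")     # -L: follow the final symlink

echo "without -L: $nofollow"
echo "with -L:    $follow"
rm -rf "$tmp"
```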

[comments merged in]

While you say that shell utilities handle symlinks on a case-by-case basis, the same is not true of kernel syscalls: all file-related calls do follow symlinks by default, unless a "nofollow" flag is given. (Even lstat follows symlinks in all path components except the last one.)
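The parenthetical about lstat can be checked with test -L, which examines its argument with lstat. In this sketch (hypothetical names), dirlink is a symlink to a directory that itself contains a symlink, filelink. Asking whether dirlink/filelink is a symlink only succeeds because the intermediate component dirlink was followed while the last component was left alone.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/realdir"
echo data > "$tmp/realdir/file"
ln -s file "$tmp/realdir/filelink"
ln -s realdir "$tmp/dirlink"

# lstat follows dirlink (not the last component) but not filelink (the last one)
if [ -L "$tmp/dirlink/filelink" ]; then last=symlink; else last=other; fi
echo "dirlink/filelink is reported as: $last"
rm -rf "$tmp"
```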

When the specification does not explicitly say what to do with symlinks, the default behavior applies. That is, a shell following the PATH search algorithm neither resolves symlinks manually nor explicitly opts out of the OS doing so. (It just concatenates each $PATH component with the executable name.)

When the which(1) manual page says that it does not follow symlinks, that can mean several things; the GNU which manual page puts it this way:

Which will consider two equivalent directories to be different when one of them contains a path with a symbolic link.

That is much narrower in scope: it only means that which will not try to canonicalize paths to weed out duplicates. It does not imply that the tool opts out of the OS's symlink following in general. For example, if /bin is a symlink to /usr/bin and both directories are in your PATH, running which -a sh will print both /bin/sh and /usr/bin/sh.
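That duplicate listing can be mimicked with a small loop. In this sketch the helper all_matches and the directory names are hypothetical, and bin plays the role of a /bin -> /usr/bin link. Each match is printed as found, next to a canonical form obtained with cd -P and pwd -P. The canonicalization step is exactly what which, per the man page quoted above, does not do, which is why it reports both entries.

```shell
# Hypothetical helper: list every match for $1 in the colon-separated list $2,
# each paired (tab-separated) with the canonical path of its directory.
all_matches() {
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\t%s\n' "$dir/$1" "$(cd -P "$dir" && pwd -P)/$1"
        fi
    done
    IFS=$old_ifs
}

tmp=$(mktemp -d)
mkdir "$tmp/usr-bin"
printf '#!/bin/sh\n' > "$tmp/usr-bin/tool"
chmod +x "$tmp/usr-bin/tool"
ln -s usr-bin "$tmp/bin"           # stands in for /bin -> /usr/bin

out=$(all_matches tool "$tmp/bin:$tmp/usr-bin")
printf '%s\n' "$out"               # two matches, one canonical file
rm -rf "$tmp"
```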


The shell conforms to its documentation in that it follows the rules for pathname resolution. The which utility also conforms to its documentation. The two simply do slightly different things.

The output of which is the link's file name and path, not the path to what the symlink points to. This is spelled out in the man page.
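If you do want the file a command ultimately runs, you have to resolve the link yourself. A minimal sketch, using the POSIX command -v in place of which and assuming a readlink that supports -f (GNU coreutils and BusyBox provide it; it is not POSIX). The foo -> foobar1 names are illustrative.

```shell
tmp=$(mktemp -d)
printf '#!/bin/sh\n' > "$tmp/foobar1"
chmod +x "$tmp/foobar1"
ln -s foobar1 "$tmp/foo"

found=$( PATH="$tmp"; command -v foo )   # the link's path, as found on PATH
target=$(readlink -f "$found")           # non-POSIX: resolve every symlink

echo "found:  $found"
echo "target: $target"
rm -rf "$tmp"
```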

When a command is executed, the link is "followed" as described in Section 4.13, Pathname Resolution, of the POSIX Base Definitions. The relevant clause for executing a file is:

In all other cases, the system shall prefix the remaining pathname, if any, with the contents of the symbolic link, except that if the contents of the symbolic link is the empty string, then either pathname resolution shall fail with functions reporting an [ENOENT] error and utilities writing an equivalent diagnostic message, or the pathname of the directory containing the symbolic link shall be used in place of the contents of the symbolic link. If the contents of the symbolic link consist solely of <slash> characters, then all leading <slash> characters of the remaining pathname shall be omitted from the resulting combined pathname, leaving only the leading <slash> characters from the symbolic link contents. In the cases where prefixing occurs, if the combined length exceeds {PATH_MAX}, and the implementation considers this to be an error, pathname resolution shall fail with functions reporting an [ENAMETOOLONG] error and utilities writing an equivalent diagnostic message. Otherwise, the resolved pathname shall be the resolution of the pathname just created. If the resulting pathname does not begin with a <slash>, the predecessor of the first filename of the pathname is taken to be the directory containing the symbolic link.
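The last sentence of that clause is why relative symlinks work from any current directory: their contents are interpreted relative to the directory that holds the link. A small check, with illustrative names:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/d"
echo "contents of d/target" > "$tmp/d/target"
ln -s target "$tmp/d/link"         # the link's contents are just "target"

# From $tmp, "target" would not exist, yet d/link resolves to d/target,
# because a relative link is resolved from the directory containing it.
text=$(cd "$tmp" && cat d/link)
echo "$text"
rm -rf "$tmp"
```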