Are .pid files reliable for determining whether a process is running?

Many programs such as sshd create .pid files in /var/run/ that contain their process ID. Are these files reliable for determining whether a process is running? My guess is that these files are created manually by a process, and therefore will still remain in the file system if the program crashes.


Solution 1:

In simple terms, no: a process (e.g. a daemon) can crash without having had time to remove its .pid file.

A technique to be more certain of the state of a program: use an explicit communication channel such as a socket. Write the socket port in a file and have the supervisor process look it up.

You can also use the services of DBus on Linux: register a specific name and have your supervisor process (whatever you call it) check for that name.

There are numerous techniques.

One thing to remember: it is not the OS' responsibility to manage the PID files.

Solution 2:

Jldupont is correct in stating that .pid files are not reliable for determining whether a process is running as the file may not be removed in the event of a crash.

Race conditions aside, I often use pgrep when I need to know if a process is running. I could then cross-reference the output against the .pid file(s) if I felt it necessary.
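
The cross-reference mentioned above could be sketched roughly as follows. The helper name pid_in_pgrep and its arguments are made up for illustration, and the sketch assumes a pgrep that supports -x (exact name match), as the procps and BSD versions do. The race condition remains: the process can still exit right after the check.

```shell
#!/usr/bin/env sh

# Hypothetical helper: succeed only if the PID stored in the .pid file
# also appears among the PIDs that pgrep reports for the given name.
# usage: pid_in_pgrep <pidfile> <process-name>
pid_in_pgrep() {
    pidfile=$1
    name=$2

    # Without a readable file there is nothing to cross-reference.
    [ -r "$pidfile" ] || return 1

    # Take the first line and make sure it looks like a PID.
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # pgrep prints one PID per line; grep -x matches a whole line.
    pgrep -x "$name" | grep -qx "$pid"
}
```

It could be called as: pid_in_pgrep /var/run/sshd.pid sshd && echo "sshd is running"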

Solution 3:

A file containing a process ID is not a reliable way to determine whether a process is running. It is only a reliable source for the last process ID that was assigned to the process.

Once you have the process ID, you have to do further checking to see whether the process is really running.

Here is an example:

#!/usr/bin/env sh

file="/var/run/sshd.pid"

# Check the file before trying to read it.
if [ ! -f "${file}" ]; then
    echo "File does not exist: ${file}"
    exit 1
fi

if [ ! -r "${file}" ]; then
    echo "Insufficient file permissions: ${file}"
    exit 1
fi

processid=$(cat "${file}")

# ps -p exits non-zero if no process has this id;
# -o comm= prints only the command name, without a header line.
if psoutput=$(ps -p "${processid}" -o comm=); then
    if [ "${psoutput}" = "sshd" ]; then
        echo "sshd process is really running with process id ${processid}"
        exit 0
    else
        echo "given process id ${processid} is not sshd: ${psoutput}"
        exit 1
    fi
else
    # Stale .pid file: report failure, since sshd is not running.
    echo "there is no process running with process id ${processid}"
    exit 1
fi

pgrep is a nice command, but you will get into trouble when you have multiple instances running. For example, if you have a regular sshd listening on TCP port 22 and another sshd listening on TCP port 2222, pgrep will return two process IDs when you search for sshd. If the regular sshd has its PID in /var/run/sshd.pid and the other one has its PID in /var/run/sshd-other.pid, you can clearly tell the processes apart.
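
If the installed pgrep supports the -F (pid file) option, as the procps-ng and BSD versions do, the name match and the .pid file can be combined in one call. The following is only a sketch under that assumption; the helper name instance_running is hypothetical, and the two paths are the ones from the example above.

```shell
#!/usr/bin/env sh

# Hypothetical helper: check one specific instance by combining its
# .pid file with a name match. -F restricts the search to the PID
# stored in the file; -x requires the process name to match exactly.
# usage: instance_running <pidfile> <process-name>
instance_running() {
    [ -r "$1" ] || return 1
    pgrep -F "$1" -x "$2" > /dev/null 2>&1
}

# Example: two sshd instances, each with its own .pid file.
for pidfile in /var/run/sshd.pid /var/run/sshd-other.pid; do
    if instance_running "$pidfile" sshd; then
        echo "sshd from ${pidfile} is running"
    else
        echo "sshd from ${pidfile} is not running"
    fi
done
```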

I do not recommend using plain ps piped through one or more grep and grep -v commands to filter out everything that does not interest you. It is a bit like using

find . | grep myfile

to find out whether a file exists.