debugging stuck apache/php thread on production server
I have a Linux system with Apache httpd and PHP, which is loaded using LoadModule php5_module /usr/lib/apache2/modules/libphp5.so.
I've enabled Apache's mod_status module and I can see one particular thread that has been stuck doing something since yesterday. I confirmed this by running ps -axu | grep apache, which, among many other entries, lists that stuck thread:
www-data 5636 0.0 0.1 423556 23560 ? S XXXXX 0:04 /usr/sbin/apache2 -k start
Note that XXXXX is something like Jan02 which is yesterday. Also, the pid (5636) matches the pid of the stuck thread I see in the mod_status page of apache.
My question is: how can I do a thread dump or something similar in order to see where exactly in the PHP code this thing is stuck? Maybe it's waiting for something (i/o, network, db) but I don't know what.
In the Java world I'd run kill -3 pid and get a nice, readable thread dump that would clearly show me exactly where that particular thread is stuck. Is there a similar technique in PHP land?
The following instructions are Linux-centric:
- Identify the faulty / stuck process
In your case, the process is in state S, which man ps describes as:
S interruptible sleep (waiting for an event to complete)
So yes, it is probably waiting for some network or filesystem operation to complete.
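To look at the state of one specific process rather than scanning the whole ps listing, a minimal sketch such as the following may help. It assumes a Linux /proc filesystem and the procps version of ps; proc_state is a hypothetical helper name, and 5636 is the PID from the question.

```shell
#!/bin/sh
# proc_state PID: print the one-letter process state (R, S, D, Z, T, ...)
# taken from the "State:" line of /proc/PID/status (Linux only).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Example (not run here): state, kernel wait channel and elapsed time.
#   proc_state 5636
#   ps -o pid,stat,wchan:32,etime,args -p 5636
```

The wchan column names the kernel function the process is sleeping in, which can already hint at whether it is waiting on a socket, a lock or a pipe.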
- Trace system calls and signals with strace
Attach the strace program to the hanging process by running the following, where <PID> is the process ID (5636 in your case):
# strace -p <PID>
This shows you, in real time, the syscalls made by the program. For instance, you might see a loop in which open() keeps returning an error such as ENOENT, meaning that a particular file does not exist.
Your ps output indicates that the process is not consuming CPU (third column), so the problem is probably not a loop but a blocking wait, for example on a locked file, a socket, or some external action.
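When the process is blocked, strace usually shows a single unfinished syscall such as read(7, ... or poll(...). A sketch like the one below adds timestamps to the trace and resolves the file descriptor number to what it actually points to. trace_pid and fd_target are hypothetical helper names, the descriptor number 7 is only an example, and the /proc lookup assumes Linux.

```shell
#!/bin/sh
# trace_pid PID: attach strace to a running process.
#   -f      also follow child processes and threads
#   -tt     prefix each line with a timestamp including microseconds
#   -T      show the time spent inside each syscall
#   -s 200  print up to 200 bytes of each string argument
trace_pid() {
    strace -f -tt -T -s 200 -p "$1"
}

# fd_target PID FD: show what a file descriptor number seen in the strace
# output points to: a file path, "socket:[inode]" or "pipe:[inode]".
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Example (run as root or as the owner of the process):
#   trace_pid 5636
#   fd_target 5636 7
```

If the descriptor turns out to be a socket, the peer address can then be looked up with a tool such as ss or lsof to see which database or remote service the request is waiting on.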
- kill and coredumps
The kill program, which sends a signal to a running process, is not Java-specific. It can just as well be used to send signal 3 (SIGQUIT), which by default terminates the program and generates a core file.
A core file is generated only if the core file size limit allows it. Check the limit with the ulimit -c command; if it prints 0, raise it, for instance to unlimited:
ulimit -c unlimited
The limit applies to the current shell and to the processes started from it, so only then should you restart the application and provoke a core dump by sending it kill -3.
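Because the limit that matters is the one the running Apache process actually inherited, it can be worth checking it directly before sending the signal. The following is a minimal sketch assuming a Linux /proc filesystem; core_limit is a hypothetical helper name and 5636 is the PID from the question.

```shell
#!/bin/sh
# core_limit PID: print the soft core-file size limit of a running process,
# read from /proc/PID/limits (Linux only). "0" means no core file is written.
core_limit() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Example (not run here):
#   core_limit 5636                      # limit of the stuck Apache process
#   cat /proc/sys/kernel/core_pattern    # where the kernel writes core files
#   kill -3 5636                         # send SIGQUIT
```

If core_limit prints 0 for the stuck process, sending the signal will terminate it without leaving anything to inspect.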
You want to install the PHP Xdebug extension and enable its trace log. It will then create a file containing stats for every executed function: the time it took to complete, the amount of memory consumed, the path to the file that contains the function, and so on.
This data will help you identify which function needs fixing, but beware that a full trace log grows very quickly, so you might want to trace only part of your application (which is also described in the guide above)...
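One way to keep the trace log small is to trace only the requests that explicitly ask for it. The sketch below prints an ini fragment for that purpose. It assumes Xdebug 2.x setting names (Xdebug 3 uses different ones) and that the extension itself is already loaded; xdebug_trace_ini is a hypothetical helper, and both paths in the example are assumed locations.

```shell
#!/bin/sh
# xdebug_trace_ini DIR: print an ini fragment (Xdebug 2.x setting names)
# that writes function traces to DIR, but only for requests that carry an
# XDEBUG_TRACE GET/POST parameter or cookie.
xdebug_trace_ini() {
    cat <<EOF
xdebug.auto_trace=0
xdebug.trace_enable_trigger=1
xdebug.trace_output_dir=$1
EOF
}

# Example (not run here; the conf.d path is an assumption):
#   xdebug_trace_ini /var/tmp/xdebug > /etc/php5/apache2/conf.d/xdebug-trace.ini
```

After reloading Apache, only a request that includes the XDEBUG_TRACE parameter should produce a trace file in the chosen directory.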