Why does ps aux hang?

On my staging server, ps aux hangs every time I run it. It prints a list of processes and then stops responding. I checked and there is enough free RAM (about 1 GB).

When I run the top command, the output looks fine to me, except that it reports one zombie process. What is that? Can anyone explain?

top - 11:00:29 up  3:53,  2 users,  load average: 51.75, 50.52, 45.38
Tasks:  79 total,   1 running,  77 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1747660k total,   603572k used,  1144088k free,    12644k buffers
Swap:   917496k total,        0k used,   917496k free,    97732k cached

Thanks


Solution 1:

If you are on Linux, run ps under strace to see which system calls it makes. The last call in the output is likely the one it is hanging on:

$ strace ps aux

If you're on a different Unix-y system you'd use truss or dtruss.
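
When ps hangs like this, the trace often ends with ps opening a file under /proc/<pid>/ and then blocking on the read, so the last /proc path in the trace points at the process to look at. A minimal sketch, assuming you saved the trace with strace's -o option; last_proc_file is a made-up helper name, not part of strace:

```shell
# last_proc_file TRACE: print the last /proc/<pid>/<file> path mentioned in a
# saved strace log. Hypothetical helper for illustration only.
last_proc_file() {
    grep -o '/proc/[0-9][0-9]*/[a-z_]*' "$1" | tail -n 1
}

# Usage (the trace file name is just an example):
#   strace -o /tmp/ps.trace ps aux     # leave it hanging, or interrupt it
#   last_proc_file /tmp/ps.trace       # run this from another terminal
```

The pid in the printed path is the process whose /proc entry ps was reading when it stopped.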

Solution 2:

Considering that you have a load average of over 50 but the CPU is 100% idle, I would look at I/O. On Linux, the load average also counts processes stuck in uninterruptible sleep, so this looks like the computer is waiting for either the disks or the network to return data before it can proceed.

Try using iotop to see what's blocking this. It could be a drive on its way out.
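
Since ps itself hangs on this machine, one way to find the tasks that are blocked on I/O is to read /proc directly. This is only a sketch: blocked_tasks is a made-up function name, and it assumes /proc/<pid>/status is still readable for the stuck processes. It prints the pid, the name, and (where the kernel exposes it) the wchan value of every task in state D, the uninterruptible sleep state.

```shell
# blocked_tasks [PROC_ROOT]: print "pid name wchan" for every task in state D.
# Hypothetical helper; PROC_ROOT defaults to /proc.
blocked_tasks() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=$(awk '$1 == "State:" { print $2 }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        # Only the first word of the name is kept.
        name=$(awk '$1 == "Name:" { print $2 }' "$dir/status" 2>/dev/null)
        # wchan names the kernel function the task is sleeping in, if available.
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        echo "${dir##*/} $name ${wchan:-?}"
    done
}

# Usage:
#   blocked_tasks
```

If the number of lines it prints is close to the load average, those tasks are what is driving the load up.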

Similar behaviour is also possible if your machine is configured to do name/group resolution and authentication against an external resource and that resource is not available. In that case it would help to check your PAM configuration and any relevant services, such as DNS, LDAP, or NIS.
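
A quick way to test whether lookups are the culprit is to run them with a time limit and see whether they return. A rough sketch using timeout from coreutils; probe is a made-up name, and the getent arguments in the usage lines are placeholders for whatever users and hosts matter on your system:

```shell
# probe SECONDS COMMAND...: run a command with a time limit and report whether
# it succeeded, failed, or had to be killed. Hypothetical helper.
probe() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    case $? in
        0)   echo "ok" ;;
        124) echo "timed out" ;;   # exit status GNU timeout uses on expiry
        *)   echo "failed" ;;
    esac
}

# Usage (placeholder arguments):
#   probe 5 getent passwd root
#   probe 5 getent hosts example.com
```

A "timed out" result on a user or group lookup would point towards the name service rather than the disks.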

Solution 3:

Wikipedia explains it in great detail.

Solution 4:

Um...your load average is showing that something is REALLY waiting on I/O or CPU. Is your hard disk thrashing? Is the machine sluggish to respond? A load average of 51 is really not all that normal.

Zombie - when a process spawned by another process ends, it's supposed to be reaped by its parent, which collects the exit status with wait(). Until the parent does that, the dead child stays in the process table as a zombie. It is essentially an entry in the task table that isn't taking resources or doing anything. If the parent itself crashes or exits, the orphaned child is handed to the init process, which reaps it. So a zombie that lingers means its parent is still running but isn't reaping it. You can't kill a zombie directly (it's already dead), but it goes away once the parent reaps it or exits, so a reboot isn't needed, and zombies usually don't hurt anything. Now, if you have a LOT of zombies, you have a problem: a process with a bug, a resource issue, something isn't running properly. One or two zombies do not a Romero movie make, though.

Clarification: I used the term orphan, but the pedantic side would point out there is a difference between an orphaned process and a zombie. An orphan is still running after its parent has gone, and init adopts it. A zombie has already exited and is doing nothing; it's not taking resources other than a process table entry. Init only gets involved with a zombie when the zombie's parent exits, at which point init adopts and reaps it. Either way, unless you have many zombies appearing, one or two on a system isn't normally a problem.
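
If you want to know which parent is leaving the zombie behind, you can get that from /proc without going through ps. A minimal sketch; zombie_parents is a made-up name, and it assumes the usual State:, Pid:, PPid: and Name: lines in /proc/<pid>/status:

```shell
# zombie_parents [PROC_ROOT]: print "zombie_pid parent_pid parent_name" for
# every process in state Z. Hypothetical helper; PROC_ROOT defaults to /proc.
zombie_parents() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        # Emit "pid ppid" only when the State: line says Z (zombie).
        awk '$1 == "State:" { s = $2 }
             $1 == "Pid:"   { p = $2 }
             $1 == "PPid:"  { pp = $2 }
             END { if (s == "Z") print p, pp }' "$f" 2>/dev/null
    done | while read -r pid ppid; do
        # The parent's entry may be gone if it exited in the meantime.
        pname=$(awk '$1 == "Name:" { print $2 }' "$root/$ppid/status" 2>/dev/null)
        echo "$pid $ppid ${pname:-?}"
    done
}

# Usage:
#   zombie_parents
```

The parent it lists is the process to investigate, since the zombie disappears once that parent reaps it or exits.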