Troubleshooting Linux server reboots?
I have a Linux server I've just set up (Debian squeeze, 2.6.32-5-amd64), and over the past week it has rebooted three times, twice in one day. There was no power outage that I am aware of (and it's running on a UPS), and there are no errors in syslog, apart from a few expected ones at boot about clearing out entries in the ext4 journal after the unclean shutdown.
What steps can I take to determine the cause of the reboots? Is there a way to get it to hang instead of rebooting, so I can copy stack traces or something off the screen? Any way to increase debug messages, or get it to dump things to disk, or something?
That may be a hardware problem; the most common causes are failed RAM and overheating. You could install mbmon to monitor motherboard and CPU temperature, and run memtest86+ to check your RAM and CPU cache.
There is a chance it is a 'kernel panic' and a kernel 'oops' message is sent to the console before the reboot. The kernel can be configured to reboot on 'panic' or to stay on. Check:
cat /proc/sys/kernel/panic
If it is non-zero, try setting it to 0 (you can write directly to the file, set it in /etc/sysctl.conf, which is usually parsed on boot, or use the sysctl utility); this should stop the rebooting. If it is already 0, then the reboots are not caused by kernel panics.
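As a rough sketch of the three options just mentioned: the helper names describe_panic_setting and disable_panic_reboot are made up for illustration, and the commands inside disable_panic_reboot need root. The sketch assumes a positive value means "reboot after that many seconds", which is how the 2.6.32 kernel treats it.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and changing kernel.panic.

PANIC_FILE=/proc/sys/kernel/panic

# Explain what a given kernel.panic value means.
describe_panic_setting() {
    if [ "$1" = "0" ]; then
        echo "kernel stays halted after a panic"
    else
        echo "kernel reboots $1 seconds after a panic"
    fi
}

# Turn off reboot-on-panic. Must be run as root; not called automatically.
disable_panic_reboot() {
    # Option 1: write directly to the file (lost on reboot).
    echo 0 > "$PANIC_FILE"
    # Option 2 (equivalent): use the sysctl utility.
    sysctl -w kernel.panic=0
    # Option 3: make it persistent across boots.
    echo 'kernel.panic = 0' >> /etc/sysctl.conf
}

# Report the current setting if the file is readable on this system.
if [ -r "$PANIC_FILE" ]; then
    describe_panic_setting "$(cat "$PANIC_FILE")"
fi
```

With the value at 0, a panicking machine should stay up with the oops text on the console, so you can copy it down or photograph it before resetting by hand.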