How do I diagnose a hard Linux crash?

Solution 1:

I am a bit surprised no one has suggested the use of the SysRq magic key.

First of all, it should be used instead of the power switch to force a reboot, because it gives programs a chance to terminate cleanly and unsaved data a chance to reach the disk; skipping this can cause considerable problems on the next boot (not to mention the crashing bore of waiting for the usual fsck). To do it, hold down Alt and SysRq together and press, a few seconds apart, r e i s u b (the famous English mnemonic is Raising Elephants Is So Utterly Boring; I prefer Running Errands Is So Utterly Boring; try to come up with a better one if you can).
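For reference, here is a small sketch of what each key in that sequence asks the kernel to do, which also shows why the order matters. The descriptions paraphrase the kernel's SysRq documentation; the names `REISUB_STEPS` and `describe` are made up for this illustration.

```python
# What each key in the R-E-I-S-U-B sequence requests from the kernel.
# Descriptions are paraphrased from the kernel's SysRq documentation.
REISUB_STEPS = {
    "r": "switch the keyboard out of raw mode",
    "e": "send SIGTERM to all processes except init",
    "i": "send SIGKILL to all processes except init",
    "s": "sync all mounted filesystems",
    "u": "remount all filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
}


def describe(sequence="reisub"):
    """Return one 'key: action' line per key, in the order given."""
    return [f"{key}: {REISUB_STEPS[key]}" for key in sequence.lower()]


for line in describe():
    print(line)
```

Because b reboots without syncing or unmounting, it only does its job safely after s and u have finished, hence the pause of a few seconds between keys. The same letters can also be written one at a time, as root, to /proc/sysrq-trigger, which is handy over SSH when the keyboard is not reachable.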

Even apart from this, when the system freezes, Alt + SysRq + X (where X is a letter) lets you run some diagnostics: for instance, X=d displays all locks currently held, which may help diagnose a software problem; X=j thaws frozen filesystems; X=l (a lowercase L) shows a stack backtrace for all active CPUs; X=t dumps a list of current tasks to the console; X=w dumps the tasks that are blocked (in uninterruptible state).
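One caveat: these keys only work from the keyboard if the corresponding function is enabled in /proc/sys/kernel/sysrq, which holds 0 (everything off), 1 (everything on), or a bitmask. Below is a rough sketch for decoding that value. The bit meanings are taken from the kernel's SysRq documentation; the helper names are hypothetical, and I am assuming the dump-type keys (d, l, t, w) fall under the "debugging dumps" bit.

```python
from pathlib import Path

# Bit values of kernel.sysrq, as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """List the SysRq function groups that a kernel.sysrq value enables."""
    if value == 0:
        return []                          # SysRq completely disabled
    if value == 1:
        return list(SYSRQ_BITS.values())   # every function enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def dumps_enabled(value):
    """True if the diagnostic dumps (assumed: d, l, t, w) are allowed."""
    return value == 1 or bool(value & 8)


def current_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the live setting; only meaningful on a Linux system."""
    return int(Path(path).read_text().strip())
```

On a live system, `decode_sysrq(current_sysrq())` shows what is allowed. A value of 176, which Ubuntu has shipped as its default in some releases, decodes to sync, remount read-only and reboot only, so the dumps (and the r, e, i keys) would need something like `sysctl kernel.sysrq=1` as root before they respond.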

You can find more codes on Wikipedia.

While I cannot say this will be a decisive step (there are situations where even this fails), it is the next step in the investigation: it will help determine whether the problem lies in software or hardware, and narrow the range of possible culprits.

Solution 2:

1: Is your Ubuntu stable? Did you download a stable version of Ubuntu? If not, try downgrading to the latest stable build.

2: Have you tried it on another virtual or physical machine? It could very well be a script error. If you haven't tried this already, testing it in a VM such as VirtualBox will more than likely prevent any hard crash, and it also gives you an environment where you can debug and monitor the OS.

3: RAM failure? It is very unlikely to be the local SSD/HDD/SSHD, because the Linux OS is loaded into RAM, and it would post a warning if it were unable to contact the kernel, and only then crash. However, if the RAM were to lock up because it is faulty or defective, the operating system would freeze completely, unable to post (or even be aware of) any errors, which might explain why there are no logs. It is, however, VERY possible that it is something else.

4: Have a look at the forums. I'm not the most effective Linux user out there, and there is a lot that I don't really know. I have had similar hardware and software issues; however, I don't really know what your home-brew server does, so it's hard to pinpoint the flaw. I'd browse the forums.