Machine check events logged
In /var/log/messages, this error occurred:
Sep 19 13:18:15 wdc kernel: [2772302.630416] Machine check events logged
Shortly thereafter, the entire server became unresponsive. This is in the log of the Dom0 for a Xen server (running the latest version on Debian Squeeze).
Can anyone shed some light on what this error means? Should I be ordering new hardware?
Edit: Also, it seems to imply that it logged something. Where can I find that?
For more information, check the mcelog log file, which should contain a detailed description of the problem. Whether this file exists depends on how logging is configured in /etc/mcelog/mcelog.conf.
/var/log/mcelog
or just run the command
mcelog
mcelog decodes the kernel machine check log on x86 machines. From man mcelog:
X86 CPUs report errors detected by the CPU as machine check events (MCEs). These
can be data corruption detected in the CPU caches, in main memory by an integrated
memory controller, data transfer errors on the front side bus or CPU interconnect or
other internal errors. Possible causes can be cosmic radiation, instable power
supplies, cooling problems, broken hardware, or bad luck.
Most errors can be corrected by the CPU by internal error correction mechanisms.
Uncorrected errors cause machine check exceptions which may panic the machine.
When a corrected error happens the x86 kernel writes a record describing the MCE into
a internal ring buffer available through the /dev/mcelog device. mcelog retrieves
errors from /dev/mcelog, decodes them into a human readable format and prints them on
the standard output or optionally into the system log.
You can find more information about mcelog and its configuration, errors, and triggers on the mcelog project webpage.
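To get a quick picture of how often the kernel has reported machine check events, the syslog can be scanned for the message from the question. The following is only a rough sketch: the function name find_mce_lines and the regular expression are illustrative assumptions, and it expects the traditional syslog line format shown in the question (other timestamp formats would need a different pattern). It only finds the kernel notices; the decoded details still come from mcelog.

```python
import re
import sys

# Assumed line format, taken from the example in the question:
#   Sep 19 13:18:15 wdc kernel: [2772302.630416] Machine check events logged
MCE_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r".*Machine check events logged"
)


def find_mce_lines(lines):
    """Return (timestamp, host) for each line reporting machine check events."""
    hits = []
    for line in lines:
        match = MCE_LINE.match(line)
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_mce.py /var/log/messages
    for path in sys.argv[1:]:
        try:
            with open(path, errors="replace") as handle:
                for stamp, host in find_mce_lines(handle):
                    print(f"{stamp} {host}: machine check events logged")
        except OSError as err:
            # Log files are often readable by root only.
            print(f"cannot read {path}: {err}", file=sys.stderr)
```

Repeated hits over a short period are a stronger hint of failing hardware than a single isolated event.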
mcelog was removed in Debian 10+ (Buster) and Ubuntu 18.04+.
The functionality has been replaced by rasdaemon.
The message itself is printed by the kernel; the decoded details of the events are logged by mcelog. Its logfile can be found in /var/log/mcelog, or depending on the system, additionally in syslog or the systemd journal.
X86 CPUs have the ability to detect and sometimes correct hardware errors (memory, I/O, and CPU hardware errors). mcelog retrieves these errors from /dev/mcelog, where the Linux kernel writes them.
As your system crashed, correction of the hardware error likely failed. If a system keeps running after such a message, the automatic correction seems to be working.
For more background about the implications of seeing such messages, refer to “mce: [Hardware Error]: Machine check events logged” appears in syslog. What should I do?