Munin's smart plugin keeps reporting a past error because of the smartctl exit code
Eventually I resorted to patching the smart plugin. Depending on your version, it contains some code like this:
    if exit_status!=None :
        # smartctl exit code is a bitmask, check man page.
        num_exit_status=int(exit_status/256)
Replace it with this:
    if exit_status!=None :
        # smartctl exit code is a bitmask, check man page.
        num_exit_status=int(exit_status/256)
        # filter out bit 6
        num_exit_status &= 191
        if num_exit_status<=2 :
            exit_status=None
    if exit_status!=None :
The most interesting part is the line with the bitwise AND against 191: this is 10111111 in binary (0xBF in hex), so ANDing it with the current value just sets bit 6 (the bit worth 64) to 0 while leaving all the other bits untouched.
Therefore a value of 64 (which is what my disk returns) will be reported as 0, while a value of 8 would remain at 8. But also, very importantly, a value of 72 (bit 6 set as always, plus bit 3 set because the disk is failing) would still be reported as 8.
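To check the arithmetic yourself, here is a minimal standalone sketch of the masking step. The helper name filter_exit_status and the constants are mine, not part of the plugin, and it assumes (as the division by 256 in the plugin suggests) that exit_status is a wait-style status with the smartctl exit code in the high byte.

```python
# Bit 6 of the smartctl exit code (value 64) means the device error log
# contains records of errors; it stays set even for errors from long ago.
ERROR_LOG_BIT = 1 << 6            # 64
MASK = 0xFF & ~ERROR_LOG_BIT      # 191 == 0b10111111


def filter_exit_status(exit_status):
    """Hypothetical helper mirroring the patched lines of the plugin.

    Assumes exit_status is a wait-style status, i.e. the real exit code
    multiplied by 256.
    """
    num_exit_status = int(exit_status / 256)
    # Clear bit 6, keep every other bit as it is.
    num_exit_status &= MASK
    return num_exit_status


if __name__ == "__main__":
    print(bin(MASK))
    for code in (64, 8, 72):
        print(code, "->", filter_exit_status(code * 256))
```

The three sample codes come out as 0, 8 and 8, so an old entry in the error log is ignored while a failing disk is still reported.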
The only way I found to avoid this problem without modifying the munin sources was to avoid using the -a option with smartctl, e.g. use something like this in /etc/munin/plugin-conf.d/munin-node:
[smart_sda]
env.smartargs -H -i -c -A -l selftest -l selective
(i.e. all options normally included in -a except for -l error).