The best nagios plugin for SMART? [closed]

Solution 1:

The check_ide_smart plugin is part of the standard Nagios Plugins package. Despite the "ide" in its name, it uses smartctl to check any drive that smartctl supports.

It can return Nagios-suitable output, e.g.:

$ ./check_ide_smart -n -d /dev/sda
OK - Operational (17/17 tests passed)

Or the full SMART status:

$ ./check_ide_smart -d /dev/sda
Id=  1, Status=11 {PreFailure , OnLine }, Value=100, Threshold= 16, Passed
Id=  2, Status= 5 {PreFailure , OffLine}, Value=100, Threshold= 50, Passed
Id=  3, Status= 7 {PreFailure , OnLine }, Value=120, Threshold= 24, Passed
Id=  4, Status=18 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=  5, Status=51 {PreFailure , OnLine }, Value=100, Threshold=  5, Passed
Id=  7, Status=11 {PreFailure , OnLine }, Value=100, Threshold= 67, Passed
Id=  8, Status= 5 {PreFailure , OffLine}, Value=100, Threshold= 20, Passed
Id=  9, Status=18 {Advisory    , OnLine }, Value= 96, Threshold=  0, Passed
Id= 10, Status=19 {PreFailure , OnLine }, Value=100, Threshold= 60, Passed
Id= 12, Status=50 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=192, Status=50 {Advisory    , OnLine }, Value= 99, Threshold= 50, Passed
Id=193, Status=18 {Advisory    , OnLine }, Value= 99, Threshold= 50, Passed
Id=194, Status= 2 {Advisory    , OnLine }, Value=144, Threshold=  0, Passed
Id=196, Status=50 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=197, Status=34 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=198, Status= 8 {Advisory    , OffLine}, Value=100, Threshold=  0, Passed
Id=199, Status=10 {Advisory    , OnLine }, Value=200, Threshold=  0, Passed
OffLineStatus=0 {NeverStarted}, AutoOffLine=No, OffLineTimeout=30 minutes
OffLineCapability=91 {Immediate Auto SuspendOnCmd}
SmartRevision=16, CheckSum=23, SmartCapability=3 {SaveOnStandBy AutoSave}

Solution 2:

I've used the check_ide_smart plugin; however, I eventually discovered that it did not notify me about errors recorded in the SMART log on the disk.

The bug is apparently still open after five years:

#473 check_ide_smart ignores SMART errors ! http://sourceforge.net/p/nagiosplug/bugs/473/
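One possible workaround for the missed error log is to call smartctl directly and look at its exit status, which is a bit mask documented in the smartctl(8) man page. The following is only a minimal sketch, not a replacement plugin: the function names `smart_state` and `check_disk_smart` are made up for illustration, and the mapping of bits to Nagios states is my own assumption about what deserves WARNING versus CRITICAL.

```shell
#!/bin/sh
# Hypothetical helper: map a smartctl exit status to a Nagios state.
# Prints "CODE MESSAGE", where CODE is 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# Bit 5 (32, attributes crossed a threshold in the past) is ignored in this sketch.
smart_state() {
    rc=$1
    if [ $((rc & 24)) -ne 0 ]; then
        # bit 3 (8): disk failing; bit 4 (16): prefail attribute <= threshold
        echo "2 CRITICAL - disk failing or prefail attribute at or below threshold"
    elif [ $((rc & 192)) -ne 0 ]; then
        # bit 6 (64): error log has entries; bit 7 (128): self-test log has errors
        echo "1 WARNING - SMART error log or self-test log records errors"
    elif [ $((rc & 7)) -ne 0 ]; then
        # bits 0-2: bad command line, device open failed, or a SMART command failed
        echo "3 UNKNOWN - smartctl could not query the device"
    else
        echo "0 OK - smartctl reported no problems"
    fi
}

# Hypothetical wrapper: run smartctl on one device and return a Nagios exit code.
check_disk_smart() {
    smartctl -H -l error -l selftest "$1" >/dev/null 2>&1
    result=$(smart_state "$?")
    echo "${result#* }"          # message part
    return "${result%% *}"       # numeric state part
}
```

Calling `check_disk_smart /dev/sda` (usually as root or via sudo) would then print one status line and return the matching Nagios exit code.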

I am now enabling smartd with a more detailed configuration on each system, and I will have Nagios notify me if that process stops. I may also add a cron job that checks whether smartd is running and restarts it if it is not.
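The "is smartd still running" check could look roughly like the sketch below. It assumes the daemon's process name is exactly `smartd` and that `pgrep` is available; `smartd_state` and `check_smartd` are illustrative names, not part of any package. The stock check_procs plugin should be able to do the same job with something like `check_procs -c 1: -C smartd`.

```shell
#!/bin/sh
# Hypothetical helper: turn a count of smartd processes into a Nagios state.
# Prints "CODE MESSAGE", where CODE is 0 OK or 2 CRITICAL.
smartd_state() {
    count=$1
    if [ "$count" -ge 1 ]; then
        echo "0 OK - smartd is running ($count found)"
    else
        echo "2 CRITICAL - smartd is not running"
    fi
}

# Hypothetical wrapper: count processes named exactly "smartd" and report.
check_smartd() {
    count=$(pgrep -x smartd | wc -l | tr -d ' ')
    result=$(smartd_state "$count")
    echo "${result#* }"          # message part
    return "${result%% *}"       # numeric state part
}
```

The same `check_smartd` function could be reused from cron, restarting the daemon when it returns non-zero.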

From my smartd.conf. The description is adapted from the sample entry for the first (primary) ATA/IDE hard disk; with DEVICESCAN the same settings apply to every disk smartd detects.

Monitor all attributes, enable automatic online data collection and automatic attribute autosave, start a short self-test every day between 2 and 3 am and a long self-test on Saturdays between 3 and 4 am, and report raw temperature changes of 5 degrees Celsius or more:

DEVICESCAN -H -m root -a -o on -S on -s (S/../.././02|L/../../6/03) -W 5
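For reference, as I read smartd.conf(5): `-m root` mails warnings to root, `-a` turns on the default set of checks, `-o on` and `-S on` enable automatic offline testing and attribute autosave, `-W 5` reports temperature changes of at least 5 degrees, and `-s` takes a regular expression that smartd matches against strings of the form `T/MM/DD/d/HH` (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below only illustrates how that pattern behaves. It uses `grep` as a stand-in for smartd's own matching, and `is_scheduled` is a made-up name.

```shell
#!/bin/sh
# The self-test schedule from the DEVICESCAN line above.
SCHEDULE='(S/../.././02|L/../../6/03)'

# Hypothetical helper: is_scheduled TYPE MONTH DAY WEEKDAY HOUR
# Succeeds if the T/MM/DD/d/HH string fully matches the schedule regex.
is_scheduled() {
    printf '%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" | grep -Eqx "$SCHEDULE"
}

# Examples:
#   is_scheduled S 03 15 6 02   # short test, any day, 02:00  -> matches
#   is_scheduled L 03 15 6 03   # long test, Saturday, 03:00  -> matches
#   is_scheduled L 03 16 7 03   # long test, Sunday, 03:00    -> no match
```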