How do you check the health of individual hard drives in a RAID array?
Solution 1:
Typically, what you want is a package called smartmontools. It can query the SMART interface, which most modern disks provide.
There is a daemon called smartd which can help you with continuous monitoring.
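If you run smartd, its configuration lives in /etc/smartd.conf. As a minimal sketch of one entry (the device name, test schedule, and mail address here are assumptions; adjust them for your machine):
# Monitor /dev/sda: all attributes (-a), enable offline data collection (-o)
# and attribute autosave (-S), run a short self-test daily at 02:00 and a
# long one every Saturday at 03:00 (-s), and mail warnings to root (-m).
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
One such line per disk covers every member of the array.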
However, if your system is just a home server, checking manually from time to time is often good enough. Like so:
smartctl -a /dev/sda
A lot of data spews forth. The attributes that interest me most are the following:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 100   100   051    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000f 100   100   051    Pre-fail Always  -           0
  9 Power_On_Hours          0x0032 097   097   000    Old_age  Always  -           13946
 13 Read_Soft_Error_Rate    0x000e 100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel 0x0022 075   066   000    Old_age  Always  -           25
194 Temperature_Celsius     0x0022 075   064   000    Old_age  Always  -           25
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           0
199 UDMA_CRC_Error_Count    0x003e 100   100   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x000a 100   100   000    Old_age  Always  -           0
201 Soft_Read_Error_Rate    0x000a 100   100   000    Old_age  Always  -           0
This gives you a rough way to gauge drive health over time. When the error rates start going up, it's time to look for a replacement. You can also check that the drives are not running hot.
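Since the question is about a RAID array, repeat the check for every member disk, not just one. A quick sketch, assuming the members are /dev/sda through /dev/sdd (adjust the glob to match your array):
# Print a one-line overall health verdict (PASSED/FAILED) for each member
for d in /dev/sd[a-d]; do
    echo "== $d =="
    smartctl -H "$d"
done
One caveat: behind a hardware RAID controller, smartctl usually cannot reach the disks through the array device, and you need the -d option with your controller type; for example, "smartctl -a -d megaraid,0 /dev/sda" on an LSI/MegaRAID setup, where the 0 is the drive's slot number on the controller (an assumption you will need to adjust).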
Solution 2:
Something like "mdadm --query --detail /dev/md0" should work. Also, when a drive actually fails, root will receive an e-mail (that is the default configuration on CentOS, and I believe on other distros as well). You can test that notification by deliberately failing a member (e.g. mdadm --manage /dev/md0 --fail /dev/sda1), and then you will be 100% sure it works.
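If you do run that test, remember to put the member back afterwards, and never do it on an array that is already degraded, since failing another member can kill the array. A minimal sketch of the full cycle, assuming /dev/md0 with member /dev/sda1:
# Mark the member as failed; mdadm's monitor should now send its alert mail
mdadm --manage /dev/md0 --fail /dev/sda1
# Confirm the array sees the failure
cat /proc/mdstat
# Remove the "failed" member and add it back; the array will resync
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md0 --add /dev/sda1
A gentler alternative is "mdadm --monitor --scan --oneshot --test", which sends a test alert for each array without touching any members. The e-mails come from mdadm running in monitor mode, so also make sure MAILADDR is set in /etc/mdadm.conf (the path may differ by distro) and that the monitor service is running.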