How do I get an email alert if one of the RAID 1 disks fails?

I need to know how I can get an email alert if one of the RAID 1 disks stops working or crashes. I have CentOS 6.4 64-bit with software RAID.

I made a mistake while following this tutorial, because it had a note at the bottom:

NOTE: It has been found that mdadm will not send an e-mail if the DEVICE partitions section does not exist in the /etc/mdadm.conf file. If those sections do not exist a new /etc/mdadm.conf file can be created by using the following command: mdadm –detail –scan > /etc/mdadm.conf

I executed that line and my mdadm.conf file ended up empty, with this response in the SSH session: "mdadm: An option must be given to set the mode before a second device (–scan) is listed"

I also understand that I have to start the monitor with this command: mdadm –monitor –scan –daemonize, but I get the same response: "mdadm: An option must be given to set the mode before a second device (–scan) is listed"

This is the output of cat /proc/mdstat:

  Personalities : [raid1]
  md0 : active raid1 sdb1[1] sda2[0]
        117153664 blocks super 1.1 [2/2] [UU]
        bitmap: 1/1 pages [4KB], 65536KB chunk

  unused devices: <none>

and

  mdadm -D /dev/md0
  /dev/md0:
          Version : 1.1
    Creation Time : Sat Aug 17 09:19:15 2013
       Raid Level : raid1
       Array Size : 117153664 (111.73 GiB 119.97 GB)
    Used Dev Size : 117153664 (111.73 GiB 119.97 GB)
     Raid Devices : 2
    Total Devices : 2
      Persistence : Superblock is persistent

    Intent Bitmap : Internal

      Update Time : Mon Sep 16 18:55:19 2013
            State : active
   Active Devices : 2
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 0

             Name : trader:0
             UUID : 0944131a:0513ca86:cb8ad6c5:3baca49f
           Events : 1751

      Number   Major   Minor   RaidDevice State
         0       8        2        0      active sync   /dev/sda2
         1       8       17        1      active sync   /dev/sdb1

This is the mdadm.conf file, generated a minute ago with mdadm --examine --scan > /etc/mdadm.conf:

  MAILADDR [email protected]
  ARRAY /dev/md/0 metadata=1.1 UUID=0944131a:0513ca86:cb8ad6c5:3baca49f name=trader:0

Is this enough to get email notifications if one HDD fails in my case?


Solution 1:

Blazer, it looks like in the process of improving your question (which is now a good one, by the way), you've found your own answer. Well done, you! But there is a little more that could usefully be said.

As far as I know, that mdadm.conf will suffice for you to get automated notifications. Certainly, mine looks very little different to that, and I know from a recent failout test that I get notifications. (I'm a little curious about the second slash in /dev/md/0, but if that's what your system wrote, it's very likely right.)

But it's axiomatic in professional sysadmin that, unless you've tested something, you can't really know that it works.

At the very least, you will want to check that you can send mail from that system, as root, to the specified gmail.com address, and receive it.
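One way to exercise the whole mail path without touching the array is mdadm's own test alert. The sketch below is only an illustration: the function names and the MDADM override variable are made up for this example, and it assumes the config lives at /etc/mdadm.conf as in the question. Note that the long options are spelled with two ASCII hyphens (--monitor), not the en-dashes that appear in the pasted tutorial text.

```shell
#!/bin/sh
# Print the address mdadm will mail alerts to, as read from a config file.
# Defaults to /etc/mdadm.conf (the path used in the question).
mailaddr_from_conf() {
    awk '$1 == "MAILADDR" { print $2; exit }' "${1:-/etc/mdadm.conf}"
}

# Ask mdadm to generate a TestMessage alert for every array it finds,
# then exit instead of staying resident.
# MDADM is a hypothetical override, handy for previewing the command line.
send_md_test_alert() {
    "${MDADM:-mdadm}" --monitor --scan --oneshot --test
}

# Usage (as root):
#   mailaddr_from_conf      # confirm where alerts will be sent
#   send_md_test_alert      # then watch the inbox and the system mail log
```

If the test message never arrives, the problem is more likely in the local mail setup than in mdadm.conf.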

If I were you, I'd at least perform a soft failure test. You can do that with mdadm /dev/md0 -f /dev/sdb1. That will fail the second partition out of the array, and should generate a formal notification to you (check your system's mail logs to see if it's gone). Check the output of cat /proc/mdstat so you know what a half-bad array looks like.
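To make "what a half-bad array looks like" concrete: in /proc/mdstat a healthy two-disk mirror shows [UU], and a degraded one shows an underscore in place of the missing member, for example [U_]. The helper below is a rough sketch (the function name is invented here) that lists arrays whose status brackets contain an underscore; it reads /proc/mdstat by default, or any file passed as its first argument.

```shell
#!/bin/sh
# List md arrays that /proc/mdstat reports as degraded.
# A status like [UU] means all members are up; [U_] means one is missing.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[^ ]* :/ { name = $1 }
        # A status bracket with at least one underscore marks a missing member
        /\[[U_]*_[U_]*\]/ { print name }
    ' "${1:-/proc/mdstat}"
}

# Usage (as root), with the device names from the question:
#   mdadm /dev/md0 -f /dev/sdb1    # soft-fail the second member
#   degraded_arrays                # should now print: md0
```

An empty result before the test and md0 afterwards is what you would expect to see alongside the notification mail.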

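You can resync the array later with mdadm /dev/md0 -a /dev/sdb1 (if mdadm complains that the device is busy, remove the faulty member first with mdadm /dev/md0 -r /dev/sdb1), and check that it has synced back with another cat /proc/mdstat.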
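The recovery step can be wrapped up so it is done the same way every time. This is a minimal sketch, not a definitive procedure: the function name and the MDADM override are invented for illustration, and it assumes the failed member is still listed in the array as faulty, in which case it generally has to be removed before it can be added back.

```shell
#!/bin/sh
# Put a member that was failed out with -f back into its array.
# MDADM is a hypothetical override, useful for previewing the commands.
readd_member() {
    array=$1
    member=$2
    # Remove the faulty member first; only add it back if that succeeded.
    "${MDADM:-mdadm}" "$array" --remove "$member" &&
    "${MDADM:-mdadm}" "$array" --add "$member"
}

# Usage (as root), with the names from the question:
#   readd_member /dev/md0 /dev/sdb1
#   cat /proc/mdstat    # shows recovery progress until it reads [UU] again
```

Setting MDADM=echo before calling the function prints the two commands instead of running them, which is a cheap way to double-check the device names.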

If you want to go the whole hog, schedule some downtime, try pulling one of the drives, and check that the system can still boot. Where the metadevice in question is the boot partition, people sometimes forget to have a GRUB boot block on both drives, so when the drive that does carry it fails, their system becomes unbootable. Replace and resync the drive later.
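Before pulling a drive, a quick and admittedly crude check for a boot block is to look for the string "GRUB" in the first sector of each disk, since GRUB's boot code embeds that string there. The sketch below assumes the two disks are /dev/sda and /dev/sdb as in the question; the function name is made up, and a match only shows that some GRUB boot code is present, not that the disk will actually boot.

```shell
#!/bin/sh
# Succeed if the first 512-byte sector of the given disk (or image file)
# contains the string "GRUB".
has_grub_boot_block() {
    dd if="$1" bs=512 count=1 2>/dev/null |
        LC_ALL=C tr -cd '[:print:]' |
        grep -q GRUB
}

# Usage (as root):
#   for disk in /dev/sda /dev/sdb; do
#       if has_grub_boot_block "$disk"; then
#           echo "$disk: GRUB boot code found"
#       else
#           echo "$disk: no GRUB boot code in the first sector"
#       fi
#   done
```

A disk that reports no boot code is the one to fix before relying on the mirror for booting; the pull-a-drive test remains the only real proof.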

Whatever tests you decide to do, document them, so that when there's a real failure, you know what to expect, and you can minimise the chance of pilot error trashing the second drive.