How to determine which disk failed in a FreeNAS / ZFS setup

I'm building out a FreeNAS based server in a Supermicro X6DHE-XB 3U enclosure with 4G of RAM, 16 SATA hot-swap bays. It comes with 2x8 port 3Ware RAID cards, but I'm planning on just using the ZFS capabilities instead of the hardware RAID. My initial drive set will be 8x2TB HITACHI Deskstar 7K3000 HDS723020BLA642 drives.

If I were using hardware-based RAID, it would give me a red light on the drive bay where the drive failed. How does it work with ZFS when a drive fails? I don't think there is any guarantee that sda=bay1, sdb=bay2, etc., so how do you determine which drive needs to be replaced? Can ZFS report back to the SATA controller to turn on the "failed drive" light? Does it just report the drive serial number? What if the drive fails so hard it can't report its serial number? I suppose it is a good idea to write down every drive's serial number and which bay it went into before you go live. Are there any other "pre-production" tasks to make replacing drives easier in the future?


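zpool status -v should tell you the state of every disk in the pool. A healthy disk is listed as ONLINE; a failed one shows a different state, such as FAULTED, UNAVAIL or OFFLINE.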
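If the pool has many disks, the output can be filtered down to the entries that are not healthy. This is only a rough sketch: the function name unhealthy_vdevs is made up for this example, and it assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout of zpool status.

```shell
#!/bin/sh
# unhealthy_vdevs (hypothetical helper): read `zpool status` output on stdin
# and print the name and state of every entry whose STATE column is one of
# the failure states below. Header lines such as "state: DEGRADED" are
# skipped because their first field ends with a colon.
unhealthy_vdevs() {
    awk '$1 !~ /:$/ && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}

# Usage on a live system (left commented so the file can be sourced safely):
# zpool status -v | unhealthy_vdevs
```

Note that a degraded pool and its degraded raidz/mirror vdev are printed as well as the failed disk itself, since they share the same column layout.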


The current version of FreeNAS (9.3 at the moment) will create a gptid for each drive added to a zpool. Immediately after creation, the "zpool status" output will look something like this (depending on your pool configuration)...

# zpool status
pool: myzfstest
state: ONLINE
scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    myzfstest                                       ONLINE       0     0     0
      raidz-0                                       ONLINE       0     0     0
        gptid/4fc2b789-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/51d38480-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/54c672cc-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/56a07638-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/630e1317-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/6557b52d-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/667a1318-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/68cadf75-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    logs
      mirror-2                                      ONLINE       0     0     0
        gptid/8839f22e-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/8a6d0b14-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    cache
      gptid/8c2f3824-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
      gptid/8da9ba80-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
    spares
      gptid/72f039f2-7b8a-11e4-9585-de9b81338d40    AVAIL
      gptid/750df91d-7b8a-11e4-9585-de9b81338d40    AVAIL

errors: No known data errors

Unfortunately, the web GUI doesn't show you these numbers. So, if you get an error saying that "gptid/6557b52d-7b7f-11e4-9585-de9b81338d40" is bad... how do you know which drive to pull? Figuring that part out requires some legwork at the time of install.

  1. When you build your system, write down the serial number of every drive and the location where that drive was inserted. On a double-sided JBOD case, for instance, you may want to note front/back, row, and column.
  2. When you boot up FreeNAS, in the web GUI, go to "storage>volumes/view disks". On that tab you should have a list of all your drives and their serial numbers. Note the drive name given for each serial number you had in the previous list. If you don't see the serial numbers, you will have to drop to the shell and type smartctl -a /dev/ada0 | grep ^Serial (replacing "/dev/ada0" with each of the drive names from the list)
  3. Now, at the shell, we need to match up the drive names with all the gptid numbers. So, type glabel status and you should get something like this...

    # glabel status
    
                                          Name  Status  Components  
                                 ufs/FreeNASs3     N/A  ada0s3  
                                 ufs/FreeNASs4     N/A  ada0s4  
                                ufs/FreeNASs1a     N/A  ada0s1a
    gptid/616cddb6-7b7f-11e4-9585-de9b81338d40     N/A  ada0p2  
    gptid/630e1317-7b7f-11e4-9585-de9b81338d40     N/A  da1p1   
    gptid/6557b52d-7b7f-11e4-9585-de9b81338d40     N/A  da2p1   
    gptid/667a1318-7b7f-11e4-9585-de9b81338d40     N/A  da3p1   
    gptid/68cadf75-7b7f-11e4-9585-de9b81338d40     N/A  da4p1   
    
  4. Now write in all the gptid numbers to associate them with the drive names and thus the serial numbers and their locations. Note: when you see something like "da3p1" that's partition one of the drive identified as da3. The list in the web GUI will only show the label "da3" for the disk.

Now, when an error comes up saying a disk with gptid number xyz has an error, you'll be able to reference your sheet and know which drive you need to pull/replace.
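Steps 2 to 4 can also be scripted instead of copied by hand. The following is a sketch, not a tested FreeNAS tool: gptid_to_disk and print_inventory are names invented for this example, and it assumes the three-column glabel status layout shown above and a "Serial Number:" line in the smartctl output.

```shell
#!/bin/sh
# gptid_to_disk (hypothetical helper): read `glabel status` output on stdin
# and print "<gptid label> <disk name>" for each gptid entry, with the
# partition suffix (p1, p2, ...) stripped from the component name.
gptid_to_disk() {
    awk '$1 ~ /^gptid\// { disk = $3; sub(/p[0-9]+$/, "", disk); print $1, disk }'
}

# print_inventory (hypothetical helper): add the serial number that smartctl
# reports for each disk. Needs root and a live system, so it is only defined
# here, not called.
print_inventory() {
    glabel status | gptid_to_disk | while read -r label disk; do
        serial=$(smartctl -i "/dev/$disk" | awk -F': *' '/^Serial/ { print $2 }')
        echo "$label $disk $serial"
    done
}

# Usage on a live system:
# print_inventory > /tmp/disk-inventory.txt
```

The bay location still has to be added by hand, since neither glabel nor smartctl knows where the drive is physically installed.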

I know this is beyond late for the original poster; but, perhaps others will find this useful.


What you need is the sas2ircu utility from LSI (now Avago). LSI maintains versions for FreeBSD, Linux and Windows. With FreeNAS you will need the FreeBSD version.

To try it, copy it to the /tmp directory and make it executable first.

Step one is to discover the ID of your SAS HBA (example):

/tmp# ./sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2008     1000h    72h   00h:04h:00h:00h      1000h   3020h
SAS2IRCU: Utility Completed Successfully.

Step two is to generate a list of all your devices that you can examine later:

/tmp# ./sas2ircu 0 display > disklist.txt

Step 3 is examining your disk list. It will look similar to this:

/tmp# vi disklist.txt
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2008
  BIOS version                            : 7.37.00.00
  Firmware version                        : 19.00.00.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 255
  Concurrent commands supported           : 3432
  Slot                                    : 4
  Segment                                 : 0
  Bus                                     : 4
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0

Device is a Enclosure services device
  Enclosure #                             : 2
  Slot #                                  : 24
  SAS Address                             : 5003048-0-00d3-a87d
  State                                   : Standby (SBY)
  Manufacturer                            : LSI CORP
  Model Number                            : SAS2X36
  Firmware Revision                       : 0717
  Serial No                               : x36557230
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Enclosure services device
  Enclosure #                             : 3
  Slot #                                  : 0
  SAS Address                             : 5003048-0-00ca-7bfd
  State                                   : Standby (SBY)
  Manufacturer                            : LSI CORP
  Model Number                            : SAS2X28
  Firmware Revision                       : 0717
  Serial No                               : x36557230
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Hard disk
  Enclosure #                             : 4
  Slot #                                  : 0
  SAS Address                             : 5003048-0-00d3-a8cc
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 1907729/3907029167
  Manufacturer                            : ATA
  Model Number                            : WDC WD20EARS-00M
  Firmware Revision                       : AB51
  Serial No                               : WDWCAZA1037887
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Hard disk
  Enclosure #                             : 4
  Slot #                                  : 1

Step 4 is identifying your failed drive: you will recognize it by the missing or damaged information reported for it. Note its Enclosure # and Slot #, and use them to blink the tray LED in step 5. For example, to locate Enclosure # 4, Slot # 1:

 /tmp# ./sas2ircu 0 locate 4:1 ON

To turn the LED off after replacing:

/tmp# ./sas2ircu 0 locate 4:1 OFF
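
If you already know the serial number of the drive you want to pull (from your notes or from the FreeNAS GUI), the Enclosure:Slot pair can be looked up in the display output rather than read by eye. This is a sketch that assumes the "Enclosure #", "Slot #" and "Serial No" field names shown in the listing above; find_bay is a made-up helper name, not part of sas2ircu.

```shell
#!/bin/sh
# find_bay (hypothetical helper): read `sas2ircu <n> display` output on stdin
# and print "<enclosure>:<slot>" for the first device whose "Serial No" field
# equals the serial number given as $1. Prints nothing if there is no match.
find_bay() {
    awk -v want="$1" '
        # Return the text after the first colon, with surrounding blanks removed.
        function value(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        /Enclosure #/ { encl = value($0) }
        /Slot #/      { slot = value($0) }
        /Serial No/ && value($0) == want { print encl ":" slot; exit }
    '
}

# Usage on a live system (serial taken from the example listing):
# ./sas2ircu 0 locate "$(./sas2ircu 0 display | find_bay WDWCAZA1037887)" ON
```

This only works while the drive still reports its serial number; a drive that has dropped off the bus entirely has to be found by elimination against your written bay list.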

I hope this helps!


Look at the Volumes.

Select the Volume that is Degraded.

At the bottom of your screen there are three selections... click Volume Status

You will now see a closeup of the volume and its individual hard drives listed something like ada3p2, ada5p2, ada6p2, ada4p2 etc.

Select the Degraded Drive.

At the bottom of your screen you will see two options; Edit Disk and Replace

Select Edit Disk

You should now see the Serial number of the degraded disk.

Power down your FreeNAS server and look for that disk.