How to determine which disk failed in a FreeNAS / ZFS setup
I'm building out a FreeNAS based server in a Supermicro X6DHE-XB 3U enclosure with 4G of RAM, 16 SATA hot-swap bays. It comes with 2x8 port 3Ware RAID cards, but I'm planning on just using the ZFS capabilities instead of the hardware RAID. My initial drive set will be 8x2TB HITACHI Deskstar 7K3000 HDS723020BLA642 drives.
If I were using hardware-based RAID, it would give me a red light on the drive bay where the drive failed. How does it work with ZFS when a drive fails? I don't think there is any guarantee that sda=bay1, sdb=bay2, etc., so how do you determine which drive needs to be replaced? Can ZFS report back to the SATA controller to turn on the "failed drive" light? Does it just report the drive serial number? What if the drive fails so hard it can't report its serial number? I suppose it is a good idea to write down every drive's serial number and which bay it went into before you go live. Are there any other "pre-production" tasks to make replacing drives easier in the future?
zpool status -v
should tell you which disk is online or not.
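If you want to check this from a script rather than by eye, the idea can be sketched in Python. This is only a minimal sketch: it assumes you have already captured the text of `zpool status` somehow, the function name `unhealthy_devices` is made up for illustration, and the sample pool below (`tank`, `gptid/aaaa`, `gptid/bbbb`) is invented, not real output.

```python
# States treated as "needs attention". ONLINE and AVAIL (an idle hot spare)
# are deliberately left out.
BAD_STATES = {"DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def unhealthy_devices(status_text):
    """Return (name, state) for every row of the config table in a bad state."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Rows look like: NAME STATE READ WRITE CKSUM
        if len(fields) >= 2 and fields[1] in BAD_STATES:
            rows.append((fields[0], fields[1]))
    return rows


# Invented sample text, for illustration only.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME            STATE     READ WRITE CKSUM
        tank            DEGRADED     0     0     0
          raidz1-0      DEGRADED     0     0     0
            gptid/aaaa  ONLINE       0     0     0
            gptid/bbbb  FAULTED      0     0     0

errors: No known data errors
"""

# The pool and vdev rows are reported too; keep only the disks themselves.
bad_disks = [row for row in unhealthy_devices(SAMPLE) if row[0].startswith("gptid/")]
print(bad_disks)
```

Filtering on the `gptid/` prefix assumes your pool members are labelled that way, as FreeNAS does; on other systems the disk rows may carry plain device names.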
The current version of FreeNAS (ver 9.3 at the moment) will create a gptid for each drive added to a zpool. Immediately after creation, the "zpool status" will look something like this (depending on your pool configuration)...
# zpool status
pool: myzfstest
state: ONLINE
scan: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
myzfstest                                       ONLINE       0     0     0
  raidz-0                                       ONLINE       0     0     0
    gptid/4fc2b789-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/51d38480-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/54c672cc-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/56a07638-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
  raidz2-1                                      ONLINE       0     0     0
    gptid/630e1317-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/6557b52d-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/667a1318-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/68cadf75-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
logs
  mirror-2                                      ONLINE       0     0     0
    gptid/8839f22e-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    gptid/8a6d0b14-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
cache
  gptid/8c2f3824-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
  gptid/8da9ba80-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
spares
  gptid/72f039f2-7b8a-11e4-9585-de9b81338d40    AVAIL
  gptid/750df91d-7b8a-11e4-9585-de9b81338d40    AVAIL

errors: No known data errors
Unfortunately, the web GUI doesn't show you these numbers. So, if you get an error saying that "gptid/6557b52d-7b7f-11e4-9585-de9b81338d40" is bad... how do you know which drive to pull? Figuring that part out requires some legwork at the time of install.
- When you build your system, write down the serial number of every drive and the location where that drive was inserted. On a double-sided JBOD case, for instance, you may want to note front/back, row, and column.
- When you boot up FreeNAS, in the web GUI, go to "storage>volumes/view disks". On that tab you should have a list of all your drives and their serial numbers. Note the drive name given for each serial number you had in the previous list. If you don't see the serial numbers, you will have to drop to the shell and type
smartctl -a /dev/ada0 | grep ^Serial
(replacing "/dev/ada0" with each of the drive names from the list)
- Now, at the shell, we need to match up the drive names with all the gptid numbers. So, type
glabel status
and you should get something like this...
# glabel status
CORRECT>glabel status (y|n|e|a)? yes
                                      Name  Status  Components
                             ufs/FreeNASs3     N/A  ada0s3
                             ufs/FreeNASs4     N/A  ada0s4
                            ufs/FreeNASs1a     N/A  ada0s1a
gptid/616cddb6-7b7f-11e4-9585-de9b81338d40     N/A  ada0p2
gptid/630e1317-7b7f-11e4-9585-de9b81338d40     N/A  da1p1
gptid/6557b52d-7b7f-11e4-9585-de9b81338d40     N/A  da2p1
gptid/667a1318-7b7f-11e4-9585-de9b81338d40     N/A  da3p1
gptid/68cadf75-7b7f-11e4-9585-de9b81338d40     N/A  da4p1
Now write in all the gptid numbers to associate them with the drive names and thus the serial numbers and their locations. Note: when you see something like "da3p1" that's partition one of the drive identified as da3. The list in the web GUI will only show the label "da3" for the disk.
Now, when an error comes up saying a disk with gptid number xyz has an error, you'll be able to reference your sheet and know which drive you need to pull/replace.
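If you would rather keep that sheet as data than on paper, the lookup can be sketched in Python. This is a rough sketch under assumptions: `gptid_to_disk` and `find_drive` are names I made up, the glabel rows are copied from the listing above, and the serial number and bay description in `SHEET` are placeholders you would replace with your own notes.

```python
import re


def gptid_to_disk(glabel_text):
    """Map each gptid label from 'glabel status' output to its disk name."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].startswith("gptid/"):
            continue  # header, ufs/ labels, blank lines
        label, _status, component = fields
        # "da3p1" is partition 1 of disk "da3"; drop the trailing pN.
        mapping[label] = re.sub(r"p\d+$", "", component)
    return mapping


def find_drive(label, mapping, sheet):
    """Return (disk, sheet entry) for a gptid label; entry is None if unknown."""
    disk = mapping.get(label)
    return disk, sheet.get(disk)


GLABEL_SAMPLE = """\
Name Status Components
ufs/FreeNASs3 N/A ada0s3
gptid/616cddb6-7b7f-11e4-9585-de9b81338d40 N/A ada0p2
gptid/6557b52d-7b7f-11e4-9585-de9b81338d40 N/A da2p1
"""

# Placeholder sheet: disk name -> (serial number, physical location).
SHEET = {"da2": ("PLACEHOLDER-SERIAL", "front, row 1, column 3")}

mapping = gptid_to_disk(GLABEL_SAMPLE)
print(find_drive("gptid/6557b52d-7b7f-11e4-9585-de9b81338d40", mapping, SHEET))
```

Keying the sheet by serial number instead of disk name would be more robust, since names like da2 are not guaranteed to stay attached to the same physical drive across reboots; the disk name is used here only because that is what glabel reports.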
I know this is beyond late for the original poster; but, perhaps others will find this useful.
What you need is the sas2ircu utility from LSI (now Avago). LSI maintains versions for FreeBSD, Linux and Windows. With FreeNAS you will need the FreeBSD version.
To try it you would put it in the /tmp directory and make it executable first.
Step one is to discover the ID of your SAS HBA (example):
/tmp# ./sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
         Adapter      Vendor  Device                     SubSys  SubSys
 Index   Type         ID      ID      Pci Address        Ven ID  Dev ID
 -----   ------------ ------  ------  -----------------  ------  ------
   0     SAS2008      1000h   72h     00h:04h:00h:00h    1000h   3020h
SAS2IRCU: Utility Completed Successfully.
Step two is to generate a list of all your devices that you can examine later:
/tmp# ./sas2ircu 0 display > disklist.txt
Step 3 is examining your disk list. It will look similar to:
/tmp# vi disklist.txt
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS2008
BIOS version : 7.37.00.00
Firmware version : 19.00.00.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 0
Maximum physical devices : 255
Concurrent commands supported : 3432
Slot : 4
Segment : 0
Bus : 4
Device : 0
Function : 0
RAID Support : No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0
Device is a Enclosure services device
Enclosure # : 2
Slot # : 24
SAS Address : 5003048-0-00d3-a87d
State : Standby (SBY)
Manufacturer : LSI CORP
Model Number : SAS2X36
Firmware Revision : 0717
Serial No : x36557230
GUID : N/A
Drive Type : Undetermined
Device is a Enclosure services device
Enclosure # : 3
Slot # : 0
SAS Address : 5003048-0-00ca-7bfd
State : Standby (SBY)
Manufacturer : LSI CORP
Model Number : SAS2X28
Firmware Revision : 0717
Serial No : x36557230
GUID : N/A
Drive Type : Undetermined
Device is a Hard disk
Enclosure # : 4
Slot # : 0
SAS Address : 5003048-0-00d3-a8cc
State : Ready (RDY)
Size (in MB)/(in sectors) : 1907729/3907029167
Manufacturer : ATA
Model Number : WDC WD20EARS-00M
Firmware Revision : AB51
Serial No : WDWCAZA1037887
GUID : N/A
Drive Type : Undetermined
Device is a Hard disk
Enclosure # : 4
Slot # : 1
Step 4 is identifying your failed drive. You will recognize it by the missing or damaged information reported for that drive. Note its Enclosure # and Slot #, and use them to blink the tray LED in step 5. To locate Enclosure # 4, Slot # 1:
/tmp# ./sas2ircu 0 locate 4:1 ON
To turn the LED off after replacing:
/tmp# ./sas2ircu 0 locate 4:1 OFF
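If you have many drives, steps 3 to 5 could be scripted. Here is a rough Python sketch that reads the saved disklist.txt text, finds a drive by serial number, and builds the locate command. It assumes the "Key : Value" layout shown above (other sas2ircu versions may print things differently); `parse_devices`, `locate_command` and `missing_serials` are names I made up, and "HYPOTHETICAL-SERIAL" is a placeholder.

```python
def parse_devices(display_text):
    """Split the 'Physical device information' listing into one dict per device."""
    devices = []
    current = None
    for line in display_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Device is a"):
            current = {"Kind": stripped[len("Device is a"):].strip()}
            devices.append(current)
        elif stripped.startswith("---"):
            current = None  # a separator starts a new section of the report
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices


def locate_command(devices, serial, controller=0):
    """Build the 'locate ... ON' command for the drive with this serial, or None."""
    for dev in devices:
        if dev.get("Serial No") == serial:
            return "./sas2ircu {} locate {}:{} ON".format(
                controller, dev["Enclosure #"], dev["Slot #"])
    return None


def missing_serials(devices, expected):
    """Serials from your own list that the controller no longer reports."""
    seen = {dev.get("Serial No") for dev in devices}
    return sorted(set(expected) - seen)


DISPLAY_SAMPLE = """\
Device is a Enclosure services device
Enclosure # : 3
Slot # : 0
Serial No : x36557230
Device is a Hard disk
Enclosure # : 4
Slot # : 0
State : Ready (RDY)
Serial No : WDWCAZA1037887
"""

devices = parse_devices(DISPLAY_SAMPLE)
print(locate_command(devices, "WDWCAZA1037887"))
print(missing_serials(devices, ["WDWCAZA1037887", "HYPOTHETICAL-SERIAL"]))
```

A drive that has failed hard may not show up in the listing at all, which is why `missing_serials` compares against the list of serial numbers you recorded at install time; you would still need your own notes to know which bay that serial was in.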
I hope this helps!
Look at the Volumes.
Select the Volume that is Degraded.
At the bottom of your screen there are three selections... click Volume Status
You will now see a closeup of the volume and its individual hard drives listed something like ada3p2, ada5p2, ada6p2, ada4p2 etc.
Select the Degraded Drive.
At the bottom of your screen you will see two options; Edit Disk and Replace
Select Edit Disk
You should now see the Serial number of the degraded disk.
Power down your FreeNAS server and look for that disk.