Replace ZFS pool member after a reboot reordered disk paths
I created a raidz1-0 pool with three devices. Two were added by their /dev/disk/by-id names, and somehow I decided to use /dev/sdg1 for the third one.
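For context, the pool was presumably created with something like the following (a reconstruction, not the exact original command; the by-id names are taken from the listings further down, and the point is that the third member was given as a bare /dev/sdg1 node):

# zpool create safe00 raidz1 \
      /dev/disk/by-id/ata-ST3500418AS_9VM89VGD \
      /dev/sdg1 \
      /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF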
After a reboot years later, I can't get all three devices online again. Here's the current status:
# zpool status safe00
pool: safe00
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 2h54m with 0 errors on Sun Jan 12 03:18:13 2020
config:
NAME                                          STATE     READ WRITE CKSUM
safe00                                        DEGRADED     0     0     0
  raidz1-0                                    DEGRADED     0     0     0
    ata-ST3500418AS_9VM89VGD                  ONLINE       0     0     0
    13759036004139463181                      OFFLINE      0     0     0  was /dev/sdg1
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF  ONLINE       0     0     0
errors: No known data errors
The drives in this machine are:
# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 ext4 Ubuntu LTS 8a2a3c19-580a-474d-b248-bf0822cacab6 /
├─sda2 vfat B55A-693E /boot/efi
└─sda3 swap swap 7d1cf001-07a6-4534-9624-054d70a562d5 [SWAP]
sdb zfs_member dump 11482263899067190471
├─sdb1 zfs_member dump 866164895581740988
└─sdb9 zfs_member dump 11482263899067190471
sdc
sdd
├─sdd1 zfs_member dump 866164895581740988
└─sdd9
sde zfs_member dump 866164895581740988
├─sde1 zfs_member safe00 6143939454380723991
└─sde2 zfs_member dump 866164895581740988
sdf
├─sdf1 zfs_member dump 866164895581740988
└─sdf9
sdg
├─sdg1 zfs_member safe00 6143939454380723991
└─sdg9
sdh
├─sdh1 zfs_member safe00 6143939454380723991
└─sdh9
Which is to say that safe00 should consist of the three devices sde1, sdg1 and sdh1.
And just to show the mapping between the kernel device names and their /dev/disk/by-id names:
# cd /dev/disk/by-id
# ls -la ata* | cut -b 40- | awk '{split($0, a, " "); print a[3],a[2],a[1]}' | sort -h
../../sda1 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part1
../../sda2 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part2
../../sda3 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part3
../../sda -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN
../../sdb1 -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068-part1
../../sdb9 -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068-part9
../../sdb -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068
../../sdc -> ata-SAMSUNG_HD204UI_S2H7JD1ZA21911
../../sdd1 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553-part1
../../sdd9 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553-part9
../../sdd -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553
../../sde1 -> ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
../../sde2 -> ata-ST6000VN0033-2EE110_ZAD5S9M9-part2
../../sde -> ata-ST6000VN0033-2EE110_ZAD5S9M9
../../sdf1 -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323-part1
../../sdf9 -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323-part9
../../sdf -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323
../../sdg1 -> ata-ST3500418AS_9VM89VGD-part1
../../sdg9 -> ata-ST3500418AS_9VM89VGD-part9
../../sdg -> ata-ST3500418AS_9VM89VGD
../../sdh1 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part1
../../sdh9 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part9
../../sdh -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF
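The same mapping can also be read off directly with lsblk, which avoids the cut/awk juggling above (column availability may vary slightly between lsblk versions):

# lsblk -o NAME,MODEL,SERIAL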
And the output of zdb (with minor ANNOTATION by me):
# zdb -C safe00
MOS Configuration:
        version: 5000
        name: 'safe00'
        state: 0
        txg: 22826770
        pool_guid: 6143939454380723991
        errata: 0
        hostname: 'filserver'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6143939454380723991
            children[0]:
                type: 'raidz'
                id: 0
                guid: 9801294574244764778
                nparity: 1
                metaslab_array: 33
                metaslab_shift: 33
                ashift: 12
                asize: 1500281044992
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 135921832921042063
                    path: '/dev/disk/by-id/ata-ST3500418AS_9VM89VGD-part1'
                    whole_disk: 1
                    DTL: 58
                    create_txg: 4
                children[1]:                  ### THIS CHILD USED TO BE sdg1
                    type: 'disk'
                    id: 1
                    guid: 13759036004139463181
                    path: '/dev/sdg1'
                    whole_disk: 0
                    not_present: 1            ### THIS IS sde1 NOW
                    DTL: 52
                    create_txg: 4
                    offline: 1
                children[2]:                  ### THIS CHILD IS NOW sdg1
                    type: 'disk'
                    id: 2
                    guid: 2522190573401341943
                    path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part1'
                    whole_disk: 1
                    DTL: 57
                    create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
space map refcount mismatch: expected 178 != actual 177
Summary for the pool safe00:
offline: sde1 --> ata-ST6000VN0033-2EE110_ZAD5S9M9-part1 <-- this likely was sdg1 before reboot
online: sdg1 --> ata-ST3500418AS_9VM89VGD
online: sdh1 --> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF
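To double-check that sde1 really carries the GUID of the offline vdev (13759036004139463181), reading the ZFS label on that partition directly should confirm it. A sketch; the device path and grep pattern are my assumption:

# zdb -l /dev/sde1 | grep -E 'guid|path'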
Trying to online the device that's offline:
# zpool online safe00 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
cannot online ata-ST6000VN0033-2EE110_ZAD5S9M9-part1: no such device in pool
# zpool online safe00 /dev/sde1
cannot online /dev/sde1: no such device in pool
I also tried to replace the offline device with the real one:
# zpool replace safe00 13759036004139463181 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-ST6000VN0033-2EE110_ZAD5S9M9-part1 is part of active pool 'safe00'
# zpool replace safe00 /dev/sdg1 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-ST6000VN0033-2EE110_ZAD5S9M9-part1 is part of active pool 'safe00'
So, finally I tried to online the missing device using its GUID:
# zpool online safe00 13759036004139463181
warning: device '13759036004139463181' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
This put the disk into FAULTED state, and a scrub was started.
# zpool status safe00
pool: safe00
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub in progress since Sun Feb 23 11:19:00 2020
14.3G scanned out of 1.09T at 104M/s, 3h0m to go
0 repaired, 1.29% done
config:
NAME                                          STATE     READ WRITE CKSUM
safe00                                        DEGRADED     0     0     0
  raidz1-0                                    DEGRADED     0     0     0
    ata-ST3500418AS_9VM89VGD                  ONLINE       0     0     0
    13759036004139463181                      FAULTED      0     0     0  was /dev/sdg1
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF  ONLINE       0     0     0
errors: No known data errors
What should I do to avoid this happening again? How do I change the device's "path" property (as shown by zdb) so that it doesn't rely on Linux's enumeration of disks at boot?
The most reliable method is probably to create pools using GUIDs or GPT labels, and personally I think GPT labels are the better solution, as mentioned in one of the answers to Best practice for specifying disks (vdevs) for ZFS pools in 2021. For example, a label like
data-1-sces3-3tb-Z1Y0P0DK
follows the scheme <pool>-<pool-id>-<disk-vendor-and-model-name>-<size-of-disk>-<disk-serial-number>.
Naming disks this way helps you to (see the sketch after this list):
- Easily understand the topology of your pools.
- Easily find the vendor and model names of the drives in use.
- Easily see the disk capacities.
- Easily identify and locate a bad disk in the drive cage, since the serial number printed on the drive is part of the GPT label.
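A minimal sketch of how such a label could be applied and then used when creating a pool (sgdisk comes from the gdisk package; /dev/sdX, the partition number and the pool layout here are placeholders, not values from the question):

# sgdisk --new=1:0:0 --change-name=1:data-1-sces3-3tb-Z1Y0P0DK /dev/sdX
# zpool create data raidz1 /dev/disk/by-partlabel/data-1-sces3-3tb-Z1Y0P0DK ...

udev exposes GPT partition names under /dev/disk/by-partlabel/, so the pool keeps referring to the same label no matter which /dev/sdX node the disk gets after a reboot.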
There are other persistent ways to identify disks, such as hardware IDs of various kinds, but those are not intuitive enough on their own: you can't easily find a disk just from its electronic ID, you have to map the ID to its physical location yourself.
I also found that this might help if you want to remap the disks in an existing pool (Mixed gptid and dev names in zpool status):
# zpool import -d /dev/gptid tank
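On Linux, the equivalent for this question's pool would be to export it and re-import it pointing at the persistent names (the pool name is taken from the question, the rest is the standard import procedure):

# zpool export safe00
# zpool import -d /dev/disk/by-id safe00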