Removing failing drive from LVM volume group ... and recovering partial data from an incomplete LV (with a missing PV)
In the end I solved this problem by manually editing the LVM metadata backup file /etc/lvm/backup/lvm_group1.
Here are the steps in case anyone else hits this problem:
- I physically removed the dead drive from the server
- I executed
vgreduce lvm_group1 --removemissing --force
- I removed the dead drive's physical volume entry from the backup config file
- I added another stripe on a "good" drive in place of the extents that were unreadable on the dead drive (see the sketch after this list)
- I executed
vgcfgrestore -f edited_config_file.cfg lvm_group1
- Reboot
- Voila! Drive is visible and can be mounted.
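To give an idea of what that edit involves, here is a rough sketch of the relevant part of the metadata backup file. Only the VG name comes from above; the LV name (lv_data), the PV names (pv0 for a good drive, pv1 for the dead one) and all extent numbers are made up. The point is that the segment which used to live on the missing PV gets re-pointed at free extents on a surviving PV, and pv1's entry is deleted from the physical_volumes section:

logical_volumes {
    lv_data {
        # id, status, segment_count etc. stay as they were
        segment2 {
            start_extent = 25600
            extent_count = 12800        # length of the lost stripe
            type = "striped"
            stripe_count = 1            # linear
            stripes = [
                "pv0", 51200            # was "pv1", 0 (the dead drive)
            ]
        }
    }
}

The replacement extents must actually be free on the good PV, and their contents are of course not the original data, but the LV becomes complete again and everything outside the lost stripe stays readable.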
It just took me 4 days of learning the ins and outs of LVM to solve this...
So far it looks good. No errors. Happy camping.
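For anyone checking a similar repair, the standard reporting commands are enough to confirm that every segment now references a present device before activating and mounting (the LV name is again hypothetical):

pvs
vgs lvm_group1
lvs -a -o +devices lvm_group1
vgchange -ay lvm_group1
mount /dev/lvm_group1/lv_data /mnt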
If you are ok with stopping the LVM temporarily (and with closing the underlying LUKS containers, if used), an alternative solution is to copy as much as possible of the PVs (or of the underlying LUKS containers) to the good disk with GNU ddrescue, and to remove the old disk before restarting the LVM.
While I like Sniku's LVM solution, ddrescue may be able to recover more data than pvmove.
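For reference, a typical two-pass copy of a failing PV (or of its LUKS partition) with ddrescue could look like the lines below; the device names and the mapfile path are assumptions, and the destination must be at least as large as the source:

ddrescue -f -n /dev/sdb1 /dev/sdc1 /root/sdb1.map     # pass 1: copy what reads cleanly, skip scraping
ddrescue -f -d -r3 /dev/sdb1 /dev/sdc1 /root/sdb1.map # pass 2: retry the bad areas with direct I/O

Afterwards the new disk carries a PV with the same UUID as the old one, which is exactly why the old disk must be removed (or wiped) before LVM is started again.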
(The reason for stopping the LVM is that LVM has multipath support and would balance write operations between pairs of PVs with identical UUIDs as soon as it discovers them. Furthermore, one should stop LVM and LUKS to ensure that all recently written data is visible on the underlying devices. Restarting the system and not supplying the LUKS passwords is the easiest way to ensure this.)
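If a reboot is not convenient, roughly the same effect can be achieved by deactivating everything by hand before the copy; the mount point, VG and LUKS mapping names here are assumptions:

umount /mnt/data              # unmount any filesystems on the affected LVs
vgchange -an lvm_group1       # deactivate all LVs in the volume group
cryptsetup close luks_sdb1    # close the underlying LUKS mapping, if one is used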