Proper way to deal with corrupt XFS filesystems
I recently had an XFS filesystem become corrupt due to a power failure (CentOS 7 system). The system wouldn't boot properly.
I booted from a rescue CD and tried xfs_repair, which told me to mount the partition to deal with the log.
I mounted the partition and ran ls to verify that, yes, the contents appeared to be there. I unmounted the partition, tried xfs_repair again, and got the same message.
What am I supposed to do in this situation? Is there something wrong with my rescue cd (System Rescue CD, version 4.7.1)? Is there some other procedure I should have used?
I ended up simply restoring the system from backups (it was quick and easy in this case), but I'd like to know what to do in the future.
Solution 1:
If you run xfs_repair, get the error message that suggests mounting the filesystem to replay the log, and still receive the same message after mounting, you may need to perform a forced repair (using the -L flag with xfs_repair). This option should be a last resort.
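Before reaching for -L, the mount-then-repair sequence described above can be sketched as follows. This is only an illustration: DEV and MNT are placeholders for your own device and mount point, RUN defaults to echo so the script prints the commands instead of executing them, and the -n (no-modify) flag is my addition so that xfs_repair only reports problems without writing anything.

```shell
#!/bin/sh
# Sketch: try to replay the XFS log by mounting, then check without modifying.
# DEV and MNT are placeholders; adjust them to your system.
DEV="${DEV:-/dev/mapper/centos-root}"
MNT="${MNT:-/mnt/centos-root}"
# RUN=echo prints each command (dry run). Set RUN= (empty) to really run them.
RUN="${RUN:-echo}"

replay_then_check() {
    $RUN mkdir -p "$MNT"
    # A successful mount replays the log; unmount again right away.
    $RUN mount "$DEV" "$MNT" && $RUN umount "$MNT"
    # -n is no-modify mode: report problems, change nothing.
    $RUN xfs_repair -n "$DEV"
}

replay_then_check
```

If the mount itself fails (as in the example below), the log cannot be replayed this way, and that is the situation where -L comes into play.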
For example, I'll use a case where I had a corrupt root partition on my CentOS 7 install. When attempting to mount the partition, I continually received the below error message:
mount: mount /dev/mapper/centos-root on /mnt/centos-root failed: Structure needs cleaning
Unfortunately, forcing a repair zeroes out (destroys) the log before attempting the repair, so any metadata changes still sitting in the log are lost. With this method you can end up with more corrupt data than you started with; however, we can use the appropriate xfs tools to see what kind of damage may be caused before making any permanent changes.
Using xfs_metadump and xfs_mdrestore, you can create a metadata image of the affected partition and perform the forced repair on the image rather than the partition itself. The benefit of this is the ability to see the damage that comes with a forced repair before performing it on the partition.
To do this, you'll need a decent-sized USB drive or external hard drive. Start by mounting the USB drive. Mine was located at /dev/sdb1; yours may be named differently.
mkdir -p /mnt/usb
mount /dev/sdb1 /mnt/usb
Once mounted, run xfs_metadump to copy the partition metadata to the USB drive. Again, your affected partition may be different. In this case, I had a corrupt root partition located at /dev/mapper/centos-root:
xfs_metadump /dev/mapper/centos-root /mnt/usb/centos-root.metadump
Next, restore the metadata into an image so that you can perform a repair on it and measure the damage.
xfs_mdrestore /mnt/usb/centos-root.metadump /mnt/usb/centos-root.img
I found that xfs_mdrestore is not available in rescue mode; instead, you'll need to be in the rescue mode of a live CentOS CD.
Finally, we can perform the repair on the image:
xfs_repair -L /mnt/usb/centos-root.img
After the repair has completed and you've assessed the output and potential damage, you can decide whether to perform the repair against the partition.
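One way to assess the damage beyond reading the xfs_repair output is to loop-mount the repaired image read-only and look at what ended up in lost+found. The sketch below assumes the image path used above and a hypothetical mount point /mnt/img; RUN defaults to echo, so it prints the commands rather than running them. Keep in mind that a metadump image holds metadata only, so file contents are not meaningful, and file names appear obfuscated unless the dump was taken with xfs_metadump -o.

```shell
#!/bin/sh
# Sketch: inspect the repaired metadata image without touching the real partition.
# IMG matches the path used earlier; IMG_MNT is a hypothetical mount point.
IMG="${IMG:-/mnt/usb/centos-root.img}"
IMG_MNT="${IMG_MNT:-/mnt/img}"
# RUN=echo prints each command (dry run). Set RUN= (empty) to really run them.
RUN="${RUN:-echo}"

inspect_image() {
    $RUN mkdir -p "$IMG_MNT"
    # Read-only loop mount of the image file.
    $RUN mount -o loop,ro "$IMG" "$IMG_MNT"
    # xfs_repair moves disconnected files and directories into lost+found.
    $RUN ls -l "$IMG_MNT/lost+found"
    $RUN umount "$IMG_MNT"
}

inspect_image
```

A large lost+found on the image is a hint that the same forced repair on the real partition would orphan a similar set of files.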
To run the repair against the partition, simply run:
xfs_repair -L /dev/mapper/centos-root
Don't forget to check the other partitions for corruption as well. After the repairs, reboot the system and you should be able to successfully boot.
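Checking the other partitions can be done without risk by using no-modify mode. A minimal sketch, assuming two hypothetical device names (/dev/mapper/centos-home and /dev/sda1) that you would replace with your own unmounted XFS partitions; RUN defaults to echo so nothing is executed until you clear it.

```shell
#!/bin/sh
# Sketch: run a read-only check on each XFS device passed as an argument.
# RUN=echo prints each command (dry run). Set RUN= (empty) to really run them.
RUN="${RUN:-echo}"

check_xfs_devices() {
    for dev in "$@"; do
        # -n reports problems without modifying the filesystem.
        $RUN xfs_repair -n "$dev"
    done
}

# Hypothetical device names; substitute your own.
check_xfs_devices /dev/mapper/centos-home /dev/sda1
```

When actually executed with -n, xfs_repair exits with status 1 if it detected corruption and 0 if it did not, which makes it easy to tell which partitions need attention.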
Remember that the -L flag should be used as a last resort, when there are no other possible options for repair.
I found that these online articles helped:
- https://web.archive.org/web/20140920034637/http://geekblood.com/2014/08/13/filesystem-corruption-xfs-and-rhelv7/
- https://web.archive.org/web/20160319163101/http://oss.sgi.com/archives/xfs/2015-01/msg00503.html
- http://dhoytt.com/blog/2015/07/26/xfs-filesystem-repair-gets-web-server-back/
Solution 2:
I hit this error after CentOS 7 was shut down uncleanly inside a KVM virtual machine:
# metadata corruption detected at xfs...
When I checked the log with journalctl -xe, I found an error mounting:
# /dev/mapper/root /sysroot
I solved it using:
# xfs_repair /dev/mapper/root
xfs_repair then completed its seven phases, and I rebooted using
# ./shutdown
After that, the CentOS 7 virtual machine worked well.
Regards
Note: your /dev/mapper/root may have a different name. Check your error log with journalctl -xe to find the name of the unit that failed to mount.
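If the log does not make the device name obvious, listing the block devices that carry an XFS filesystem is another way to find it. The helper below is a hypothetical sketch, not something from the original answer: it filters "NAME FSTYPE" pairs, such as those printed by lsblk -rno NAME,FSTYPE, and keeps only the XFS entries.

```shell
#!/bin/sh
# Sketch: print the names of devices whose filesystem type is xfs.
# Reads "NAME FSTYPE" pairs on stdin, one device per line.
xfs_names() {
    awk '$2 == "xfs" { print $1 }'
}

# Usage on a live system (names are printed without the /dev/ prefix):
#   lsblk -rno NAME,FSTYPE | xfs_names
```

LVM volumes show up under their device-mapper name, which corresponds to the path under /dev/mapper/ that xfs_repair expects.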