Proper way to deal with corrupt XFS filesystems

I recently had an XFS filesystem become corrupt due to a powerfail. (CentOS 7 system). The system wouldn't boot properly.

I booted from a rescue cd and tried xfs_repair, it told me to mount the partition to deal with the log.

I mounted the partition, and did an ls to verify that yes, it appears to be there. I unmounted the partition and tried xfs_repair again and got the same message.

What am I supposed to do in this situation? Is there something wrong with my rescue cd (System Rescue CD, version 4.7.1)? Is there some other procedure I should have used?

I ended up simply restoring the system from backups (it was quick and easy in this case), but I'd like to know what to do in the future.


Solution 1:

If you're attempting to run xfs_repair, getting the error message that suggests mounting the filesystem to replay the log, and after mounting still receiving the same error message, you may need to perform a forced repair (using the -L flag with xfs_repair). This option should be a last resort.

For example, I'll use a case where I had a corrupt root partition on my CentOS 7 install. When attempting to mount the partition, I continually received the below error message:

mount: mount /dev/mapper/centos-root on /mnt/centos-root failed: Structure needs cleaning

Unfortunately, forcing a repair would involve zeroing out (destroying) the log before attempting a repair. When using this method, there is a potential of ending up with more corrupt data than initially anticipated; however, we can use the appropriate xfs tools to see what kind of damage may be caused before making any permanent changes.

Using xfs_metadump and xfs_mdrestore, you can create a metadata image of the affected partition and perform the forced repair on the image rather than the partition itself. The benefits of this is the ability to see the damage that comes with a forced repair before performing it on the partition.

To do this, you'll need a decent sized USB or external hard drive. Start by mounting the USB drive - my USB was located at /dev/sdb1, yours may be named differently.

mkdir -p /mnt/usb
mount /dev/sdb1 /mnt/usb

Once mounted, run xfs_metadump to create a copy of the partition metadata to the USB - again, your affected partition may be different. In this case, I had a corrupt root partition located at /dev/mapper/centos-root:

xfs_metadump /dev/mapper/centos-root /mnt/usb/centos-root.metadump

Next, you'll want to restore the metadata in to an image so that we can perform a repair and measure the damage.

xfs_mdrestore /mnt/usb/centos-root.metadump /mnt/usb/centos-root.img

I found that in rescue mode xfs_mdrestore is not available, and instead you'll need to be in rescue mode of a live CentOS CD.

Finally, we can perform the repair on the image:

xfs_repair -L /mnt/usb/centos-root.img

After the repair has completed and you've assessed the output and potential damage, you can determine as to whether you'd like to perform the repair against the partition.

To run the repair against the partition, simply run:

xfs_repair -L /dev/mapper/centos-root

Don't forget to check the other partitions for corruption as well. After the repairs, reboot the system and you should be able to successfully boot.

Remember that the -L flag should be used as a last resort where there are no other possible options to repair.

I found that these online articles helped:

  • https://web.archive.org/web/20140920034637/http://geekblood.com/2014/08/13/filesystem-corruption-xfs-and-rhelv7/
  • https://web.archive.org/web/20160319163101/http://oss.sgi.com/archives/xfs/2015-01/msg00503.html
  • http://dhoytt.com/blog/2015/07/26/xfs-filesystem-repair-gets-web-server-back/

Solution 2:

I had this error whe centos 7 bad stop inside a kvm virtual-machine:

# metadata corruption detected at xfs...

when I use the log wiht journalctl -xe, I found an error mounting:

# /dev/mapper/root /sysroot

I solve it using:

# xfs_repair /dev/mapper/root

Then the system complete the seven phases and then y reboot using

# ./shutdown

And then the virtual machine centos 7 work well…

Regards

Note: maybe your /dev/mapper/root has an other name, please watch your error log with journalctl -xe to find the name of your unit bad mounted