Recently I saw the root filesystem of a machine in a remote datacenter get remounted read-only as a result of consistency errors.

On reboot, this error was shown:

UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)

After running fsck as suggested, and manually accepting each correction with Y, the errors were corrected and the system is fine again.

Now, I think it would be interesting if fsck were configured to run and repair everything automatically, since the only alternative in some cases (like this one) is going in person to the remote datacenter and attaching a console to the affected machine.
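For reference, the kind of knob I have in mind does seem to exist; this is a hedged sketch, assuming a Debian-style sysvinit setup (the `FSCKFIX` setting in /etc/default/rcS) or a systemd machine (the `fsck.repair` kernel command-line parameter read by systemd-fsck) — check your own distribution's documentation before relying on either:

```shell
# Debian/sysvinit: in /etc/default/rcS, let the boot-time fsck
# answer "yes" to all repair prompts instead of dropping to a shell.
FSCKFIX=yes

# systemd: the equivalent is passed on the kernel command line,
# handled by systemd-fsck rather than a config file:
#   fsck.repair=preen   (default: fix only safe, minor errors)
#   fsck.repair=yes     (attempt to fix all errors automatically)
#   fsck.repair=no      (never attempt repairs)
```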

My question is: why does fsck ask for manual intervention by default? How and when would a correction performed by such a program be unsafe? And in which cases might a sysadmin want to set a suggested correction aside for a while (to perform some other operations first) or abort it altogether?


Solution 1:

fsck definitely causes more harm than good if the underlying hardware is somehow damaged: a bad CPU, bad RAM, a dying hard drive, a disk controller gone bad... in those cases more corruption is inevitable.

If in doubt, it's a good idea to first take an image of the corrupted disk with dd_rescue or some other tool, and then see whether you can successfully fix that image. That way you still have the original data available.
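As a runnable illustration of the "copy first, verify, then repair the copy" workflow — using an ordinary file as a stand-in for the damaged device, since the real thing obviously needs the actual hardware; the filenames here are made up for the demo:

```shell
# Stand-in demo: a regular file plays the role of the damaged disk.
# In real life the source would be something like /dev/sdX, and the
# image should land on a different, healthy filesystem.
dd if=/dev/urandom of=disk.img bs=1M count=1 status=none

# Take the rescue copy BEFORE any repair attempt. On real, failing
# hardware, conv=noerror,sync keeps dd going past unreadable sectors
# (dd_rescue/ddrescue do this job better, with retry logic and a map
# of bad regions).
dd if=disk.img of=rescue.img bs=64K conv=noerror,sync status=none

# Verify the copy is byte-identical before touching anything.
cmp disk.img rescue.img && echo "copy verified"

# On a real system you would now run fsck against the image, e.g.:
#   fsck.ext4 -f rescue.img
# leaving the original disk untouched as a fallback.
```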

Solution 2:

You have seen one example where fsck worked, but I've seen more than enough damaged file systems where it did not work at all. If it ran fully automatically, you might have no chance to do things like a dd disk dump first, which in many cases is an excellent idea before attempting a repair.

It's never, ever a good idea to attempt something like that automatically.

Oh, and modern servers should have remote consoles, or at least independent rescue systems, to recover from something like this without lugging a KVM cart to the server.