Linux: Force fsck of a read-only mounted filesystem?

I'm developing for a headless embedded appliance running CentOS 6.2. The user can connect a keyboard but not a monitor, and a serial console would require opening the case, something we don't want the user to have to do. This all pretty much rules out booting from a recovery USB drive, unless all it does is blindly reimage the hard drive. I would like to provide some recovery facilities, and I have written a tool that comes up on /dev/tty1 in place of getty to provide these functions.

One such function is fsck. I have found out how to remount the root and other file systems read-only. Now that they are read-only, it should be safe to fsck them and then reboot. Unfortunately, fsck complains to me that the filesystems are mounted and refuses to do anything.

How can I force fsck to run on a read-only mounted partition?

Based on my research, this is going to have to be something obscure. "-f" just forces a check of a clean (but unmounted) partition. I need to repair a clean or unclean mounted partition. From what I read, this is something "only experts" should do, but no one has bothered to explain how the experts do it. I'm hoping someone can reveal this to me.

BTW, I've noticed that e2fsck 1.42.4 on Gentoo will let you fsck a mounted partition, even mounted read-write, but it seems only to do so if fsck is run from a terminal, so it can ask the user if they're sure they want to do something so dangerous. I'm not sure if the CentOS version does the same thing, but it appears that fsck CAN repair a mounted partition, but it flatly refuses to when not run from a terminal.

One last-resort option is for me to compile my own hacked fsck. But I'm afraid I'll mess it up in some unexpected way.

Thanks!

Note: Originally posted here.

Update: I didn't think it would matter at the time I wrote this, but in order to remount the fs read-only, I had to do this:

echo s > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

That was the only way I could find to do this. Everything else complained about the file system being busy. As far as I know, this is 'safe', but it probably remounts a bit differently from the usual approach. And this may be a reason why fsck doesn't want to repair it. It still thinks it's mounted read-write.
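One way to see what the kernel itself reports after the SysRq remount is to read the options column of /proc/mounts. The following is only a sketch: the function name mount_mode and its optional second argument (an alternate table file, handy for testing) are made up for illustration, and it does not establish which table fsck consults when it decides a filesystem is mounted.

```shell
#!/bin/sh
# Print "ro" or "rw" for a mount point, reading a table in /proc/mounts
# format: device mountpoint fstype options dump pass.
# mount_mode and its second argument are illustrative names, not a standard tool.
mount_mode() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4 }           # keep the last matching entry
        END {
            if (opts == "") exit 1       # mount point not found
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") { print "ro"; exit 0 }
            print "rw"
        }
    ' "$table"
}
```

Calling mount_mode / after the "echo u" step would show whether the kernel now lists the root filesystem as read-only. The last matching entry is used because the root mount point can appear more than once in the table (for example an initial rootfs line followed by the real device).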


You can fsck a read-only filesystem, because mounting read-only doesn't mark it as "dirty" the way read-write mounting does. There are no changes sitting in a write cache that might be only partially flushed to disk, so all the on-disk structures are consistent and safe for fsck to modify.

However, if fsck makes any changes, the kernel's filesystem driver might become confused, because things that it expected to remain constant have instead changed out from under it. This won't affect the integrity of the filesystem itself (since the driver isn't writing to it), but it may make the running system unstable. To avoid that, you should reboot if fsck made any changes to your filesystem.
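A recovery tool can make the "reboot if anything changed" decision from the exit status. This is a minimal sketch assuming e2fsck, whose man page documents the status as a bitmask (1 = errors corrected, 2 = errors corrected and the system should be rebooted, 4 = errors left uncorrected, 8 and above = operational or usage failures). The function name fsck_action and the three result words are illustrative.

```shell
#!/bin/sh
# Classify an e2fsck exit status for a filesystem that was checked while
# mounted read-only. fsck_action is a hypothetical helper name.
fsck_action() {
    status=$1
    if [ $(( $status & ~3 )) -ne 0 ]; then
        # Some bit other than 1 or 2 is set: uncorrected errors,
        # an operational error, a usage error, or cancellation.
        echo "failed"
    elif [ $(( $status & 3 )) -ne 0 ]; then
        # Bit 1 or 2: fsck changed the disk, so the running kernel's
        # view may be stale; reboot as the answer recommends.
        echo "reboot"
    else
        echo "clean"
    fi
}
```

Because the filesystem is still mounted here, the sketch treats status 1 the same as status 2 and asks for a reboot whenever anything was corrected.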


Having been on an "appliance" type project in the past, I've done a few things which partially work around this sort of problem.

One appliance had enough memory that the root filesystem ran directly from the initrd. The initrd contained just enough to run a forced fsck and then mount "/mounts/persistent" and "/mount/static"; almost all files needed after that were on one of these two filesystems.
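The check-then-mount step in such an initrd might look roughly like the sketch below. It is not the script from that project: it assumes ext-family filesystems (hence e2fsck with -f to force the check and -p to repair automatically), and the function name check_and_mount, the RUN dry-run hook, and any device names passed to it are hypothetical.

```shell
#!/bin/sh
# Force-check a device, then mount it only if fsck left it usable.
# Setting RUN=echo prints the commands instead of running them
# (RUN is a hypothetical dry-run hook for this sketch).
check_and_mount() {
    dev=$1; mnt=$2; opts=$3
    ${RUN:-} e2fsck -f -p "$dev"
    rc=$?
    # Status bits other than 1 and 2 mean uncorrected errors or an
    # operational failure, so do not mount in that case.
    if [ $(( $rc & ~3 )) -ne 0 ]; then
        echo "fsck of $dev failed with status $rc" >&2
        return 1
    fi
    ${RUN:-} mount -o "$opts" "$dev" "$mnt"
}
```

The init script would call this once per data filesystem, passing "rw" for the persistent one and "ro" for the static one, and could fall back to a reboot or a recovery prompt when it returns non-zero.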

This had the advantage that the root filesystem never needed "fixing": if anything went wrong, the appliance would reboot and the initrd came up clean, since the copy in use was not the one on disk. Updates to the initrd were simply put in place, with previous versions kept available for booting. Any files missing from the original "static" filesystem that were needed after a "firmware upgrade" (= new initrd) went into the initrd from then on. The "static" filesystem was read-only in any case. Only the persistent filesystem and the "current firmware version" needed to be backed up. I kept copies of all the firmware images before they were sent out.