Prevent data corruption on ext4/Linux drive on power loss

The write cache usually has nothing to do with the BIOS; most BIOSes have no option for changing disk cache settings. On Linux, running hdparm -W 0 /dev/sda (substitute your own device) should disable the drive's write cache.

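On many drives the setting is stored in the drive itself and survives a power cycle, although some models revert to their default, so check yours. If it does persist and you don't have hdparm available on your production systems, you should be able to disable the write cache on a different system and then move the disk over.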
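To check whether the setting really stuck after a power cycle, something along these lines may help. It is only a sketch: write_cache_state is a made-up helper name, /dev/sda is an example device, and it assumes hdparm prints a line of the form "write-caching =  1 (on)", which may differ between hdparm versions.

```shell
# Hypothetical helper: reads the output of `hdparm -W <device>` on stdin
# and prints "on", "off" or "unknown".
# Assumes a line such as " write-caching =  1 (on)" in that output.
write_cache_state() {
    state=$(sed -n 's/^[[:space:]]*write-caching[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' | head -n 1)
    case "$state" in
        1) echo on ;;
        0) echo off ;;
        *) echo unknown ;;
    esac
}
```

As root you would run hdparm -W /dev/sda | write_cache_state after power-cycling the box; anything other than "off" means the cache has to be disabled again at every boot.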

BTW: I'd second the idea of a non-writable root filesystem (so your system can boot into a kind of "recovery mode" and allow remote access even if the writable filesystem cannot be mounted for some reason). And if you can change the hardware design, consider using mtd devices with a flash-aware filesystem like jffs2 instead of IDE/SATA disks. We've been using this combination in several embedded devices (mostly VPN router solutions in the field) for several years with good results.

Update: the root of your problem seems to be that you are running an ext4 filesystem with journaling disabled - has_journal is missing from the Filesystem features list. To fix this, shut down all services, check whether anything still has files open using lsof +f -- /, and remount your root partition read-only with mount -o remount,ro /. Then enable the journal with tune2fs -O has_journal /dev/sda1 and set the "ordered" journal mode as the default mount option using tune2fs -o journal_data_ordered /dev/sda1. After this operation you will have to re-run fsck (preferably from a rescue system) and then remount root or reboot.
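The steps above could be wrapped up roughly like this. Treat it as a sketch rather than a tested procedure: has_journal and enable_journal are made-up function names, the device is whatever you pass in (/dev/sda1 in this thread), and the check assumes the "Filesystem features:" line as printed by tune2fs -l. Stopping services, the lsof check and the final fsck are deliberately left to you.

```shell
# Hypothetical helper: succeeds if the `tune2fs -l` output on stdin
# lists has_journal among the filesystem features.
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Hypothetical helper: enable the journal on the root device given as $1.
# Run as root, with all services stopped and no files open for writing.
enable_journal() {
    dev=$1
    # Root must be read-only before tune2fs changes the feature set.
    mount -o remount,ro / || return 1
    # Add the journal, then make "ordered" the default journal mode.
    tune2fs -O has_journal "$dev" || return 1
    tune2fs -o journal_data_ordered "$dev"
    # Afterwards: run fsck (preferably from a rescue system) and reboot.
}
```

A guarded call would be: tune2fs -l /dev/sda1 | has_journal || enable_journal /dev/sda1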

With these settings in place, the metadata is guaranteed to be recoverable from the journal even after a sudden power failure. The actual data is also written to disk consistently, although on the next boot you may find that the last several seconds of data before the outage are lost. If this is not acceptable, you might consider setting the default mount option to full data journaling with tune2fs -o journal_data /dev/sda1 - this puts all data written to disk into the journal as well. That obviously gives you better data consistency, but at the cost of a performance penalty and more wear on your SSD.
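To see which journal mode a filesystem currently has as its default, a small parser like the following might do. It is a sketch: default_journal_mode is a made-up name, and it assumes the "Default mount options:" line from tune2fs -l contains one of journal_data, journal_data_ordered or journal_data_writeback when a mode has been set.

```shell
# Hypothetical helper: reads `tune2fs -l <device>` output on stdin and
# prints the default journal mode: journal, ordered, writeback or unset.
default_journal_mode() {
    opts=$(sed -n 's/^Default mount options:[[:space:]]*//p')
    # Pad with spaces so each option is matched as a whole word.
    case " $opts " in
        *" journal_data_writeback "*) echo writeback ;;
        *" journal_data_ordered "*)   echo ordered ;;
        *" journal_data "*)           echo journal ;;
        *)                            echo unset ;;
    esac
}
```

Usage would be tune2fs -l /dev/sda1 | default_journal_mode; "unset" means no mode was stored in the superblock and the kernel default applies.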


The write cache suggestion is a good start, but this sounds like an architectural design flaw. On an embedded system the internal flash should probably NOT be mounted read-write except in rare circumstances. You should really be doing most of the work in a memory filesystem and syncing changes back to the flash on some user command or at a regular interval. It is really uncommon for an embedded system to use a regular filesystem (like ext4) in rw mode during normal operation. If some application requirement means you need lots of writable storage, you should consider keeping your system partition separate and designing the system so that the data partition can be fsck -y'ed as part of startup.

If you need some starting points, I would look at how people set up diskless Linux systems:

http://frank.harvard.edu/~coldwell/diskless/

and start from there. The general idea is that your system binaries and data are mounted read-only so the filesystem can't be corrupted. However, some areas have to be writable, so you usually put a memory filesystem on /tmp and /var/tmp. Even if certain things on flash need to change, you just create a script that remounts the partition read-write, commits the changes, and then goes back to read-only.
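Such a commit script could look roughly like this. It is a minimal sketch, not what any particular product ships: commit_changes and MOUNT_CMD are made-up names, the staging directory is assumed to live on a memory filesystem, and the target is assumed to be a mount point that is normally read-only.

```shell
# Hypothetical helper: copy staged files from a memory filesystem ($1)
# onto a normally read-only mount point ($2), then go back to read-only.
# MOUNT_CMD can be overridden for dry runs; it defaults to mount.
commit_changes() {
    staging_dir=$1
    target_mnt=$2
    mount_cmd=${MOUNT_CMD:-mount}

    "$mount_cmd" -o remount,rw "$target_mnt" || return 1
    # Copy everything from staging; keep the result so the remount
    # back to read-only happens even if the copy fails.
    cp -a "$staging_dir"/. "$target_mnt"/
    status=$?
    # Flush dirty data to flash before the filesystem goes read-only.
    sync
    "$mount_cmd" -o remount,ro "$target_mnt" || return 1
    return "$status"
}
```

With hypothetical paths, a call would be commit_changes /tmp/config-staging /etc/appconfig, run as root. The window in which the flash is writable is limited to the duration of the copy.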

A really great example of this is the Cyclades hardware. It runs embedded Linux, and whenever you make configuration changes you have to execute a save script, which rebundles the configs and writes them out to the flash.