How can I optimise ext4 for reliability?

Solution 1:

No. No file system can ever be assumed to be 100% reliable.

Journaling file systems minimise data loss in the event of an unexpected outage. Extents and barriers help even more, but cannot eliminate all associated problems. Personally, I've never experienced data loss because of file system corruption when using journaling file systems.

Also, journaling is enabled by default in ext4.

Here's a good overview of ext4 and its improvements: http://kernelnewbies.org/Ext4

Solution 2:

Metadata checksums, a feature added to ext4 in kernel 3.5, are intended to improve the reliability and integrity of the file system's structure.

The overall implementation is well explained at Kernel newbies:

Modern filesystems such as ZFS and Btrfs have proved that ensuring the integrity of the filesystem using checksums is a valuable feature. Ext4 has added the ability to store checksums of various metadata fields. Every time a metadata field is read, the checksum of the read data is compared with the stored checksums; if they are different, it means that the metadata is corrupted (note that this feature doesn't cover data, only the internal metadata structures, and it doesn't have "self-healing" capabilities).

Any ext4 filesystem can be upgraded to use checksums using the "tune2fs -O metadata_csum" command, or "mkfs -O metadata_csum" at creation time. Once this feature is enabled in a filesystem, older kernels with no checksum support will only be able to mount it in read-only mode.
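As a rough sketch of the commands quoted above: the helper below only inspects text, so it is safe to try out, while the commands that touch a device are left as comments. The function name has_feature and the device /dev/sdXN are placeholders, not anything from the articles. The e2fsck step reflects my understanding that the filesystem should be unmounted and freshly checked before the feature is added; check your e2fsprogs version's man page before relying on it.

```shell
#!/bin/sh
# has_feature FEATURE
# Reads `dumpe2fs -h` (or `tune2fs -l`) output on stdin and succeeds if
# FEATURE is listed as a whole word on the "Filesystem features:" line.
# (has_feature is a hypothetical helper name used for illustration.)
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are "Filesystem" and "features:".
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Check whether checksums are already on (needs root; /dev/sdXN is a placeholder):
#   dumpe2fs -h /dev/sdXN 2>/dev/null | has_feature metadata_csum && echo "enabled"
#
# Enable on an existing, unmounted filesystem after a forced check:
#   umount /dev/sdXN
#   e2fsck -Df /dev/sdXN
#   tune2fs -O metadata_csum /dev/sdXN
#
# Or enable at creation time (this destroys existing data on the device):
#   mkfs.ext4 -O metadata_csum /dev/sdXN
```

Take a backup before converting an existing filesystem, since the conversion rewrites metadata across the whole device.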

Articles such as this one at kernel.org discuss in much greater technical detail how metadata checksums can prevent corrupted metadata from damaging the file system structure.

However, the article also warns that:

The metadata checksumming code started going into mainline in Linux 3.5, and as of 3.7-rc1 it is undergoing some user testing. This code is not yet rock solid.

It is not enabled by default in Ubuntu 12.10, and it is probably best not to enable it for the moment, given the recent issues with the ext4 filesystem, as noted here.

Solution 3:

You could disable delayed allocation under ext4 with the nodelalloc mount option. That would make it significantly more likely that you would recover more data if you suffered a power outage during a write, but it would come at the cost of more fragmentation of the file system over time.
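A minimal sketch of how that might look in practice, assuming the root filesystem is the one you care about. The helper name mount_has_option and the UUID placeholder are mine, for illustration only. I believe the option can be applied with a remount, but if your kernel rejects that, setting it in /etc/fstab and rebooting should work.

```shell
#!/bin/sh
# mount_has_option MOUNTPOINT OPTION
# Reads /proc/mounts-format lines on stdin and succeeds if the entry for
# MOUNTPOINT lists OPTION among its comma-separated mount options.
# (mount_has_option is a hypothetical helper name used for illustration.)
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            found = 0                      # the last matching entry wins
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# See whether delayed allocation is already off for / :
#   mount_has_option / nodelalloc < /proc/mounts && echo "delayed allocation is off"
#
# Turn it off for the running system (needs root):
#   mount -o remount,nodelalloc /
#
# Make it permanent with an /etc/fstab entry (replace the UUID placeholder):
#   UUID=<your-uuid>  /  ext4  defaults,nodelalloc  0  1
```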