Best way to 'harden' embedded ext4 file server against unexpected loss of power?
First, a little background: my company makes an audio streaming device that is a headless, rack-mounted Linux box with a Solid State e-SATA drive attached. The drive is formatted with ext4. The users can connect to the system using Samba/CIFS to upload new audio files or access existing ones. There is also custom software for streaming out audio over the network.
This is all fine. The only problem is that the users are audio people, not computer people, and see the system as a 'black box', not as a computer. Which means that at the end of the day, they aren't going to ssh in to the box and enter "/sbin/shutdown -h"; they are just going to cut power to the rack and leave, and expect things to still work properly the next day.
Since ext4 has journalling, journal checksumming, etc, this mostly works. The only time it doesn't work is when someone uploads a new file via Samba and then cuts power to the system before the uploaded data has been fully flushed to the disk. In that case, they come in the next day and find that their new file has been truncated or is missing entirely, and are unhappy.
My question is, what is the best way to avoid this problem? Is there a way to get smbd to call "sync" at the end of every upload? (Performance on uploads isn't so important, since they only happen occasionally). Or is there a way to tell ext4 to automatically flush within a few seconds of any change to a file? (Again, performance can be sacrificed for safety here) Should I set a particular write-ordering mode, activate barriers, etc?
Mounting the filesystem with the sync option specified in fstab would probably help. I suspect someone will have a recommendation better suited for your particular application.
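As a rough illustration of the fstab suggestion: an entry along the lines of "/dev/sdb1 /mnt/audio ext4 defaults,sync 0 2" would request synchronous writes. The device name and mount point there are made up for the example. The small Python sketch below, which is only an assumption about how one might verify the result, parses text in the /proc/mounts format and reports whether a given mount point carries the sync option. The helper names are hypothetical.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match, since a later mount shadows an earlier one.
            found = fields[3].split(",")
    return found


def is_sync_mounted(mounts_text, mount_point):
    """True if mount_point is listed and its options include 'sync'."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "sync" in opts


if __name__ == "__main__":
    # "/mnt/audio" is a placeholder; substitute the real mount point.
    with open("/proc/mounts") as f:
        print(is_sync_mounted(f.read(), "/mnt/audio"))
```

Note that /proc/mounts escapes spaces in paths (as \040), so a mount point containing spaces would need to be passed in that escaped form.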
I have begun initial research on filesystems used with flash storage, as I want to custom-build a home theater PC as an appliance. You may find a different storage solution better suited for your device. Unfortunately, I have yet to find something I prefer, so I do not have a detailed recommendation there.
Edit 1
According to the smb.conf(5) manpage, Samba itself supports immediate syncing:
strict sync (S)
Many Windows applications (including the Windows 98 explorer shell) seem to confuse flushing buffer contents to disk with doing a sync to disk. Under UNIX, a sync call forces the process to be suspended until the kernel has ensured that all outstanding data in kernel disk buffers has been safely stored onto stable storage. This is very slow and should only be done rarely. Setting this parameter to no (the default) means that smbd(8) ignores the Windows applications requests for a sync call. There is only a possibility of losing data if the operating system itself that Samba is running on crashes, so there is little danger in this default setting. In addition, this fixes many performance problems that people have reported with the new Windows98 explorer shell file copies.
Default: strict sync = no
sync always (S)
This is a boolean parameter that controls whether writes will always be written to stable storage before the write call returns. If this is no then the server will be guided by the client's request in each write call (clients can set a bit indicating that a particular write should be synchronous). If this is yes then every write will be followed by a fsync() call to ensure the data is written to disk. Note that the strict sync parameter must be set to yes in order for this parameter to have any affect.
Default: sync always = no
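To show what "every write will be followed by a fsync() call" means in practice, here is a minimal Python sketch. It is not Samba's code, just an assumed illustration of the same pattern, and the function name write_synced is hypothetical.

```python
import os


def write_synced(path, chunks):
    """Write each chunk to path and call fsync() after every one.

    Returns the total number of bytes written. This mimics the
    write-then-fsync pattern the manpage describes for 'sync always';
    it is an illustration, not how smbd is implemented.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    total = 0
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop.
            while view:
                written = os.write(fd, view)
                view = view[written:]
                total += written
            # Block until the kernel reports this data as flushed to the device.
            os.fsync(fd)
    finally:
        os.close(fd)
    return total
```

Each fsync() call stalls the writer until the flush completes, which is why the manpage warns that this is slow. For occasional uploads that cost is probably acceptable.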
Right, I've been working with the same kind of problem. If you disable every kind of write caching in the system, data will be written to disk as soon as possible.
You will lose performance, but you will get better data integrity.
The gap between what is actually on the disk and what the operating system thinks is on the disk (but is really only cached in memory) will be significantly smaller.
If you can't use a UPS for the solution, or some hardware solution that gracefully shuts the machine down if power is lost from AC, then you're going to have to use hacks like this.
It may be an idea to use a much simpler filesystem for storing the media, and to boot the operating system from a ramdisk, thus avoiding the chance of corrupting the boot/root partition of the machine.
So to recap,
Mount the file system with sync. You will lose performance, but writes will no longer be cached.
Turn off hardware disk write caches. Again, you will lose performance.
This article should be of interest to you:
http://sr5tech.com/write_back_cache_experiments.htm
Since you mention that your company builds these, I'd recommend looking at a hardware angle. I've seen servers with battery backups on the drive controllers to allow cached data to survive a loss of power. What if your engineers built in a small battery to keep the system running long enough to shut down cleanly? It would not need to be a big, separate UPS; it could be internal, and set to shut down the system as soon as AC power was lost. It might add a few dollars to the cost, but it could also be a marketing point.