Best way to 'harden' an embedded ext4 file server against unexpected power loss?

First, a little background: my company makes an audio streaming device that is a headless, rack-mounted Linux box with a solid-state eSATA drive attached. The drive is formatted with ext4. Users can connect to the system using Samba/CIFS to upload new audio files or access existing ones. There is also custom software for streaming audio out over the network.

This is all fine. The only problem is that the users are audio people, not computer people, and see the system as a 'black box', not as a computer. This means that at the end of the day, they aren't going to ssh into the box and enter "/sbin/shutdown -h"; they are just going to cut power to the rack, leave, and expect everything to still work properly the next day.

Since ext4 has journalling, journal checksumming, etc., this mostly works. The only time it doesn't is when someone uploads a new file via Samba and then cuts power before the uploaded data has been fully flushed to disk. In that case, they come in the next day to find that their new file has been truncated or is missing entirely, and are unhappy.

My question is: what is the best way to avoid this problem? Is there a way to get smbd to call "sync" at the end of every upload? (Upload performance isn't very important, since uploads only happen occasionally.) Or is there a way to tell ext4 to flush automatically within a few seconds of any change to a file? (Again, performance can be sacrificed for safety here.) Should I set a particular write-ordering mode, activate barriers, etc.?


Mounting the filesystem with the sync option specified in /etc/fstab would probably help. I suspect someone will have a recommendation better suited to your particular application.
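After adding sync to the options field in fstab (for example `/dev/sdb1 /srv/audio ext4 defaults,sync 0 2`, where the device and mount point are hypothetical), it is worth confirming after a reboot that the option actually took effect. A minimal Python sketch that checks /proc/mounts, assuming its usual six-field line layout; `/srv/audio` is only an illustrative mount point:

```python
from typing import List, Optional

PROC_MOUNTS = "/proc/mounts"


def mount_options(mount_point: str, mounts_text: Optional[str] = None) -> List[str]:
    """Return the options for mount_point as listed in /proc/mounts.

    Each line has six space-separated fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    if mounts_text is None:
        with open(PROC_MOUNTS) as f:
            mounts_text = f.read()
    options: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If something is mounted over the same point, the last entry wins.
            options = fields[3].split(",")
    return options


def is_mounted_sync(mount_point: str, mounts_text: Optional[str] = None) -> bool:
    """True if mount_point is mounted with the 'sync' option."""
    return "sync" in mount_options(mount_point, mounts_text)
```

The streaming software could, for instance, log `is_mounted_sync("/srv/audio")` at startup. The check matches whole options only, so `dirsync` on its own does not count as `sync`.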

I have begun initial research on filesystems used with flash storage, as I want to custom-build a home theater PC as an appliance. You may find a different storage solution better suited to your device. Unfortunately, I have yet to find something I prefer, so I do not have a detailed recommendation there.

Edit 1

According to the smb.conf(5) manpage, Samba itself supports immediate syncing:

   strict sync (S)
          Many Windows applications (including the Windows 98
          explorer shell) seem to confuse flushing buffer
          contents to disk with doing a sync to disk. Under
          UNIX, a sync call forces the process to be suspended
          until the kernel has ensured that all outstanding
          data in kernel disk buffers has been safely stored
          onto stable storage. This is very slow and should
          only be done rarely. Setting this parameter to no
          (the default) means that smbd(8) ignores the Windows
          applications requests for a sync call. There is only
          a possibility of losing data if the operating system
          itself that Samba is running on crashes, so there is
          little danger in this default setting. In addition,
          this fixes many performance problems that people have
          reported with the new Windows98 explorer shell file
          copies.

          Default: strict sync = no

   sync always (S)
          This is a boolean parameter that controls whether
          writes will always be written to stable storage
          before the write call returns. If this is no then
          the server will be guided by the client's request
          in each write call (clients can set a bit indicating
          that a particular write should be synchronous). If
          this is yes then every write will be followed by a
          fsync() call to ensure the data is written to disk.
          Note that the strict sync parameter must be set to
          yes in order for this parameter to have any effect.

          Default: sync always = no
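With both parameters set to yes in the share's section of smb.conf (`strict sync = yes` and `sync always = yes`), smbd follows every write with an fsync(). The Python sketch below only illustrates that pattern outside Samba; the function name `durable_copy` is hypothetical. The final fsync() of the containing directory is an addition not described in the manpage: on Linux, fsync() on a file does not necessarily make its directory entry durable, which matters for newly uploaded files that would otherwise go missing entirely.

```python
import os
from typing import BinaryIO


def durable_copy(src: BinaryIO, dest_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dest_path, calling fsync() after every write.

    Mirrors the 'strict sync = yes' + 'sync always = yes' behaviour:
    a write is not considered done until it is on stable storage.
    Returns the number of bytes written.
    """
    total = 0
    fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:  # os.write() may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # flush this chunk before reading the next one
            total += len(chunk)
        os.fsync(fd)  # also covers an empty upload (new inode, no data)
    finally:
        os.close(fd)

    # fsync() on the file does not necessarily persist its directory entry;
    # an explicit fsync() on the containing directory is needed for that.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dest_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return total
```

Calling fsync() after every chunk is deliberately slow, which fits the question's point that upload performance can be sacrificed for safety.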

I've been working on the same kind of problem. If you disable all write caching in the system, data will be written to disk as soon as possible.

You will lose performance, but you will get better data integrity.

The gap between what is actually on the disk and what the operating system thinks is on the disk (but is really still cached in memory) will be much smaller.

If you can't use a UPS, or some other hardware that gracefully shuts the machine down when AC power is lost, then you're going to have to use hacks like this.

It may also be worth using a much simpler filesystem for storing the media and booting the operating system from a ramdisk, which avoids any chance of corrupting the machine's boot/root partition.

So to recap,

Mount the filesystem with sync. You will lose performance, but writes will no longer be held in the cache.

Turn off the hardware disk write caches. Again, you will lose performance.
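For a SATA drive, the usual tool for the second step is hdparm: `-W0` disables the drive's write cache, and `-W` with no value reports its current state. A hedged Python sketch, assuming hdparm is installed and that its report contains a line like ` write-caching =  1 (on)`; the device name is an assumption, and the setting may need to be reapplied at every boot, since drives can revert to their default on power-up:

```python
import re
import subprocess
from typing import Optional

# Assumed hdparm report format, e.g. " write-caching =  1 (on)".
_WRITE_CACHE_RE = re.compile(r"write-caching\s*=\s*([01])")


def parse_write_cache(hdparm_output: str) -> Optional[bool]:
    """Return True/False for on/off, or None if the line is not found."""
    match = _WRITE_CACHE_RE.search(hdparm_output)
    if match is None:
        return None
    return match.group(1) == "1"


def disable_write_cache(device: str) -> None:
    """Turn off the drive's volatile write cache (requires root)."""
    subprocess.run(["hdparm", "-W0", device], check=True)


def write_cache_enabled(device: str) -> Optional[bool]:
    """Query the drive's current write-cache setting via 'hdparm -W'."""
    result = subprocess.run(
        ["hdparm", "-W", device], check=True, capture_output=True, text=True
    )
    return parse_write_cache(result.stdout)
```

A boot script could call `disable_write_cache("/dev/sdb")` and then check that `write_cache_enabled("/dev/sdb")` returns False before starting Samba.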

This article should be of interest to you:

http://sr5tech.com/write_back_cache_experiments.htm


Since you mention that your company builds these, I'd recommend looking at a hardware angle. I've seen servers with battery backups on the drive controllers to allow cached data to survive a loss of power. What if your engineers built in a small battery to keep the system running long enough to shut down cleanly? It would not need to be a big, separate UPS; it could be internal and set to shut down the system as soon as AC power is lost. It might add a few dollars to the cost, but it could also be a marketing point.
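On the software side, such a design needs a small daemon that notices the loss of AC power and starts a clean shutdown while the battery holds the system up. A minimal Python sketch; it assumes, hypothetically, that the power circuit is exposed as a Linux power_supply device at `/sys/class/power_supply/AC/online` reading `1` on mains and `0` on battery, and it requires two consecutive "off" readings so that a brief glitch does not power the box down:

```python
import os
import subprocess
import time
from typing import Callable, Optional

# Hypothetical path: assumes the battery circuit shows up as a power_supply
# device whose 'online' attribute reads "1" on AC and "0" on battery.
AC_ONLINE_PATH = "/sys/class/power_supply/AC/online"


def read_ac_online(path: str = AC_ONLINE_PATH) -> bool:
    with open(path) as f:
        return f.read().strip() == "1"


def clean_shutdown() -> None:
    os.sync()  # flush dirty buffers first, in case the shutdown is slow
    subprocess.run(["/sbin/shutdown", "-h", "now"], check=False)


def watch_power(
    read_ac: Callable[[], bool] = read_ac_online,
    on_power_loss: Callable[[], None] = clean_shutdown,
    poll_interval: float = 0.5,
    required_misses: int = 2,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll AC status and call on_power_loss once AC has been absent for
    required_misses consecutive polls. Returns True if that happened,
    False if max_polls ran out first."""
    misses = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        if read_ac():
            misses = 0
        else:
            misses += 1
            if misses >= required_misses:
                on_power_loss()
                return True
        polls += 1
        time.sleep(poll_interval)
    return False
```

It would run as a boot-time service calling `watch_power()`; the poll interval and debounce count are guesses that would have to be tuned to how long the internal battery can actually keep the system up.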