What is a (RAID controller) BBU for?
I am wondering what the purpose of a BBU (battery backup unit) is. My first understanding was that it lets the cache write its data to the disk during a power failure. But some specifications say that a BBU can hold its data for up to 72 hours. I'd expect the data to be written to the disk within milliseconds (given that the disk still has power, too).
So shouldn't a BBU protect not just the cache but also the whole disk for a few seconds? Wouldn't that be even safer, because the cached data would be written to the disk instead of sitting in the cache waiting for power to return? After a second or so, the disk could be shut down.
It doesn't power the disks. It just keeps the data in the cache for (in this case) up to 72 hours until you bring the machine back online. When you power the machine back up, the controller writes the contents of the cache out to the disks.
All it does is protect against a power failure. If (for some reason) the machine loses power without cleanly flushing the data out to disk, the battery keeps the cache contents alive until you can restart the machine.
It is not a UPS for disks, as the disks could be in an external disk array, or even on a different power circuit. Even a UPS could fail.
It works like this:
Most operating systems offer a way to perform a so-called "synchronous write". This means that once the write call has returned, the data is guaranteed to have been committed to disk.
A synchronous write is therefore not cached. It blocks the application until it has completed. This is obviously slower than a cached write, which keeps the data in OS memory until the disk is idle enough and then writes it out.
Some critical software, such as database software, performs synchronous writes for critical data, because a half-written update after a power loss can be detrimental to database integrity.
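To make the idea of a synchronous write concrete, here is a minimal Python sketch. The helper name `durable_write` is made up for illustration; `os.fsync` is the standard call that blocks until the OS reports the data as flushed to the device. Note that it can only report what the storage stack tells it, which is exactly where a caching controller comes in.

```python
import os

def durable_write(path: str, payload: bytes) -> int:
    """Write payload, then block until the OS reports it flushed to the device.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # Blocks the caller until the kernel has handed the data to the device
        # and the device has acknowledged it.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

A database would call something like this for its commit record and only report success to the client after it returns.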
RAID controllers are notoriously slow with RAID-5 writes (every small write also means reading and updating parity), so this becomes a problem if your application software uses a lot of synchronous writes. For this reason, RAID controllers are equipped with their own caches.
What the RAID controller does is write the data to its cache and LIE to the OS, telling it that the data has been committed to disk when it is actually still in the RAID cache.
But what if power is lost while the data is still in the RAID controller's buffer? You'd have half-written and probably inconsistent data on your disks.
You may say that this behaviour defeats the purpose of a synchronous write... if it was ok to have a cached write then the app software wouldn't ask for a sync write in the first place.
The compromise is this: the RAID controller still lies to the OS that it committed the data to disk, but to protect this critical data in case of a power failure, the controller has a battery that keeps the cache alive for some time until power can be restored.
So after the power comes back and the disks spin up and initialize, the controller still has that data in its cache thanks to the battery and can finish writing your transaction to disk.
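The sequence described above (acknowledge from cache, lose power, replay on power-up) can be sketched as a toy model. Everything here is hypothetical: `ToyController` and its methods are invented for illustration and do not correspond to any real controller firmware or API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyController:
    """Hypothetical model of a write-back RAID cache, not a real API."""
    battery_ok: bool = True
    powered: bool = True
    cache: dict = field(default_factory=dict)  # block number -> pending data
    disk: dict = field(default_factory=dict)   # block number -> data on disk

    def write(self, block: int, data: bytes) -> str:
        # Acknowledge as soon as the data is in cache (the "lie" to the OS).
        self.cache[block] = data
        return "ack"

    def flush(self) -> None:
        # Destage pending blocks to disk; only possible while powered.
        if not self.powered:
            raise RuntimeError("no power: cannot write to the disks")
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self) -> None:
        self.powered = False
        if not self.battery_ok:
            self.cache.clear()  # unprotected cache contents are lost

    def power_on(self) -> None:
        self.powered = True
        self.flush()  # replay whatever the battery preserved
```

With `battery_ok=True`, a block written just before `power_loss()` ends up in `disk` after `power_on()`. With `battery_ok=False`, the write was acknowledged but never reaches the disks.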
Everyone's happy.
This is why RAID controllers usually won't let you enable write cache unless you have a functional and charged battery unit.
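As a rough sketch of that policy, assuming a controller that silently falls back to write-through when the cache is unprotected (the names `CacheMode` and `effective_mode` are invented here, and real controllers differ in how they expose and enforce this):

```python
from enum import Enum

class CacheMode(Enum):
    WRITE_BACK = "write-back"        # acknowledge from cache, destage later
    WRITE_THROUGH = "write-through"  # acknowledge only after the disks have the data

def effective_mode(requested: CacheMode,
                   battery_present: bool,
                   battery_charged: bool) -> CacheMode:
    """Downgrade write-back to write-through when the cache is not protected."""
    protected = battery_present and battery_charged
    if requested is CacheMode.WRITE_BACK and not protected:
        return CacheMode.WRITE_THROUGH
    return requested
```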
It's worth mentioning that some newer disk controllers now come with a high-speed flash cache that retains the data for far longer than the typical 72 hours. It is often quite a lot larger, too (~1GB). If you need part details, let me know.
Think of the BBU cache as adding a level of protection similar to that afforded by a journaled file system. It's there to allow transactions, simple writes in this case, to be completed if they are interrupted by a power failure. Once power drops, the controller cannot continue to write, as that would produce completely unpredictable results. Instead, it holds the data as long as it can and finishes writing it if and when power returns. What it does not do is act like a UPS for the drives.