Managing battery relearn cycles on LSI and similar RAID controllers
How do engineers deal with RAID controller battery "relearn" cycles?
As noted in What's a "battery relearn" on a LSI MegaRaid?, a relearn cycle discharges the RAID controller battery (BBWC or BBU), which disables write cache acceleration while it runs. The battery's capacity is checked and, once it has recharged, the write cache is re-enabled. This has an obvious impact on server I/O performance for the duration of the relearn cycle. I think this occurs monthly.
The performance degradation has been noted, especially on database systems:
Slow database? Check RAID battery!
Relearn about your battery
My background is in HP ProLiant servers, whose Smart Array controllers do not go through this exercise (or at least have more proactive battery life monitoring). This seems to be a terrible feature (maximum inconvenience, little gain), but I'm in an environment with many LSI controllers (on Supermicro hardware) and would like to see if a blanket policy can be applied to the relevant systems.
- What is the default schedule of the relearn cycle on an LSI controller?
- Are these relearn cycles useful?
- Should this feature be disabled?
- If you choose to leave this feature enabled in your environment, how do you handle scheduling? Do you schedule this manually or allow the controller to set its own schedule?
- Are Dell PERC controllers affected in the same manner? (LSI is the OEM)
Solution 1:
I recently read an article by one of GoDaddy's engineers about this very topic: Learning to Deal with Learning
On their hardware (Dell PERC cards), the battery learning cycle runs every 90 days, but there is no way to know exactly when it will happen, i.e. whether it will land in peak or off-peak hours.
They talked about different solutions:
- Disable battery learning outright. The problem with this option is that you no longer know the condition of your battery, or how much charge it can hold and for how long, so you risk data loss in an outage.
- Use different hardware. Some controllers have two batteries and alternate between them during learning cycles. Additionally, there are RAID controllers (such as the Dell H710) that do not need batteries but instead use NVRAM to store uncommitted data.
- Force write-back caching regardless of the status of your batteries. As with the first option, you risk data loss.
Ultimately, they set up cron jobs that run during off-peak hours and check when the next learn cycle is due; if it falls within the next 24 hours, they force it to run immediately. That way they keep the benefit of exercising the batteries without running the cycle at peak-usage times.
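The decision that such a cron job makes can be sketched in a few lines of Python. This is only an illustration of the logic described above, not GoDaddy's actual script: the function name `should_force_relearn` and the 24-hour default are my own, and how you obtain the next learn time is left out because it depends on your controller tooling (on LSI hardware it would typically be parsed from something like MegaCli's `-AdpBbuCmd -GetBbuProperties` output, whose format varies between versions).

```python
from datetime import datetime, timedelta


def should_force_relearn(next_learn, now, window=timedelta(hours=24)):
    """Return True if the scheduled learn cycle falls inside the window.

    next_learn: when the controller plans to start its next learn cycle
                (hypothetical input; obtain it from your controller's CLI).
    now:        the current time, i.e. the off-peak moment the cron job runs.
    window:     how far ahead to look; 24 hours matches the approach above.
    """
    remaining = next_learn - now
    # Force the cycle only if it is due soon. A cycle further away than the
    # window is left alone so the battery is not exercised more than needed.
    return timedelta(0) <= remaining <= window


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 3, 0)          # cron fires at 03:00, off-peak
    due_soon = datetime(2024, 1, 1, 14, 0)    # would otherwise hit peak hours
    due_later = datetime(2024, 1, 20, 14, 0)  # weeks away, nothing to do
    print(should_force_relearn(due_soon, now))
    print(should_force_relearn(due_later, now))
```

When the function returns True, the job would then ask the controller to start the learn cycle right away (with MegaCli, I believe that is `-AdpBbuCmd -BbuLearn`, but check your tool's documentation). Since the job itself only runs off-peak, the forced cycle lands off-peak too.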