How effective is LSI CacheCade SSD storage tiering?
LSI offers its CacheCade storage tiering technology, which allows SSD devices to be used as read and write caches to augment traditional RAID arrays.
Other vendors have adopted similar technologies: HP SmartArray controllers have SmartCache, Adaptec has MaxCache, and there are also a number of software-based acceleration tools (sTec EnhanceIO, Velobit, FusionIO ioTurbine, Intel CAS, Facebook flashcache).
Coming from a ZFS background, I use different types of SSDs for read caching (L2ARC) and write caching (ZIL) duties, since each workload calls for different traits: low latency and high endurance for the write cache, high capacity for the read cache.
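For reference, this is roughly what that split looks like on the ZFS side; a minimal sketch, where the pool name and device paths are made-up placeholders:

```python
#!/usr/bin/env python3
"""Minimal sketch of the ZFS split described above: dedicated SSDs for the
write cache (SLOG/ZIL) and the read cache (L2ARC). Pool name and device
paths are hypothetical placeholders."""
import subprocess

POOL = "tank"  # hypothetical pool name

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Low-latency, high-endurance SSDs as a mirrored SLOG (synchronous write cache).
run(["zpool", "add", POOL, "log", "mirror",
     "/dev/disk/by-id/nvme-slog-a", "/dev/disk/by-id/nvme-slog-b"])

# A larger, cheaper SSD as L2ARC (read cache); capacity matters more than endurance here.
run(["zpool", "add", POOL, "cache", "/dev/disk/by-id/ssd-l2arc-a"])

# Per-device activity, including the cache and log vdevs, is visible afterwards:
run(["zpool", "iostat", "-v", POOL])
```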
- Since CacheCade SSDs can be used for write and read cache, what purpose does the RAID controller's onboard NVRAM play?
- When used as a write cache, what danger is there to the CacheCade SSDs in terms of write endurance? Using consumer SSDs seems to be encouraged.
- Do writes go straight to SSD or do they hit the controller's cache first?
- How intelligent is the read caching algorithm? I understand how the ZFS ARC and L2ARC function. Is there any insight into the CacheCade tiering process?
- What metrics exist to monitor the effectiveness of the CacheCade setup? Is there a method to observe a cache hit ratio or percentage? How can you tell if it's really working?
I'm interested in opinions and feedback on the LSI solution. Any caveats? Tips?
Since CacheCade SSDs can be used for write and read cache, what purpose does the RAID controller's onboard NVRAM play?
If you leave the controller's write caching feature enabled, the NVRAM is still used first. The SSD write cache typically only comes into play for larger bursts of write data, when the NVRAM alone is not enough to keep up.
When used as a write cache, what danger is there to the CacheCade SSDs in terms of write endurance? Using consumer SSDs seems to be encouraged.
This depends on how often your writes actually spill over to the SSD write cache, i.e. whether your drives can absorb the write load quickly enough that the NVRAM doesn't fill up. In most scenarios I've seen, the write cache gets little to no action most of the time, so I wouldn't expect this to have a big impact on write endurance; most writes to the SSDs are likely to come from your read caching.
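To put very rough numbers on the endurance question, here is a back-of-the-envelope sketch; the daily write volume, spill fraction, write amplification, and TBW rating are all assumptions to replace with your own figures:

```python
# Back-of-the-envelope SSD endurance estimate for a CacheCade-style write cache.
# Every number below is an assumption; plug in your own workload and drive specs.

daily_writes_gb = 500          # total writes hitting the array per day (assumed)
spill_fraction = 0.10          # share of writes that overflow NVRAM onto the SSD cache (assumed)
write_amplification = 2.0      # internal SSD write amplification (assumed)
tbw_rating_tb = 150            # endurance rating of a typical consumer SSD (assumed)

ssd_writes_tb_per_year = daily_writes_gb * spill_fraction * write_amplification * 365 / 1000
years_to_rating = tbw_rating_tb / ssd_writes_tb_per_year

print(f"SSD cache absorbs ~{ssd_writes_tb_per_year:.1f} TB/year of writes")
print(f"-> reaches its {tbw_rating_tb} TB rating in ~{years_to_rating:.1f} years")

# Note: populating the read cache also writes to the SSD; this estimate only
# covers the write-cache side of the wear.
```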
Do writes go straight to SSD or do they hit the controller's cache first?
Answered above: the controller cache is hit first, and the SSD cache is more of a second line of defense.
How intelligent is the read caching algorithm? I understand how the ZFS ARC and L2ARC functions. Is there any insight into the CacheCade tiering process?
Sorry, I have no knowledge to contribute on that one; hopefully someone else will have some insight.
What metrics exist to monitor the effectiveness of the CacheCade setup? Is there a method to observe a cache hit ratio or percentage? How can you tell if it's working?
It doesn't look like any monitoring tools are available for this, as there are with other SAN implementations of this feature set. And since the CacheCade virtual disk doesn't get presented to the OS, you may not have any way to manually monitor activity either. Verifying effectiveness may just come down to testing it yourself, for example with a repeated benchmark like the sketch below.
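In the absence of a hit-ratio counter, about the best you can do is an indirect check: run the same random-read job a few times against a file that fits in the SSD cache and see whether the later passes speed up. A rough sketch, assuming fio is installed and that the file path, size, and runtime are adjusted for your setup:

```python
#!/usr/bin/env python3
"""Indirect check for CacheCade read caching: repeat an identical random-read
job and compare passes. If the SSD cache is doing its job, later passes should
show noticeably higher IOPS and lower latency than the first (cold) pass.
The file path, size, and runtime are placeholders."""
import subprocess

FIO_JOB = [
    "fio", "--name=cachecade-probe",
    "--filename=/data/cachecade-test.bin",  # file on the CacheCade-backed volume (adjust)
    "--rw=randread", "--bs=4k", "--size=8G",
    "--direct=1", "--ioengine=libaio", "--iodepth=32",
    "--runtime=60", "--time_based",
]

for run_no in range(1, 4):
    print(f"--- pass {run_no} ---")
    subprocess.run(FIO_JOB, check=True)
```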
Opinion/observation: in a lot of cases (when used correctly, with the read cache sized appropriately for the working data set), this feature makes things FLY. But in the end, it can be hit-and-miss.
Speaking about hardware solutions, I found no way to get an exact hit ratio or anything similar. I believe there are two reasons for that: the volume behind the controller appears as a single drive (so it is expected to "just work"), and it is hard to count "hits" meaningfully, since they are tracked per HDD sector rather than per file, so there may be some hit rate even on a nearly empty HDD, which can be confusing. Moreover, the algorithms behind the "hybridisation" are not public, so knowing the hit rate would not help much anyway. You just buy it and put it to work: low cost (compared to a pure SSD solution), nice speed improvement.
"Buy it and use it" approach is pretty good thing to consider, but the fact is noone knows for sure how to build the fastest combination: should we use several big HDD and several big cache SSD, or should we use many small HDD and several big SSD etc., and what's the difference between 100 or, say, 500 Gb or 2000Gb of SSD cache (even 500 looks overkill if volume hot data are small-sized), and should it be like 2x64Gb or 8x8Gb to have data transfer paralleled. Again, each vendor uses its own algorithm and may change it on next firmware update.
I write this mostly to say that my findings led me to a somewhat odd conclusion: if you run a general-purpose server with a general load profile, a hardware hybrid controller is fine even with relatively small SSDs; but if your workload is specific, you are better off with a software solution (which you can choose yourself, since you are the only one who knows the load profile) or with high-priced PCIe flash card storage.