ZFS Data Loss Scenarios
I'm looking toward building a largish ZFS pool (150TB+), and I'd like to hear people's experiences about data loss scenarios due to failed hardware; in particular, distinguishing between instances where just some data is lost vs. the whole filesystem (or if there even is such a distinction in ZFS).
For example: let's say a vdev is lost due to a failure like an external drive enclosure losing power, or a controller card failing. From what I've read the pool should go into a faulted mode, but if the vdev is returned, should the pool recover? Or not? And if the vdev is partially damaged, does one lose the whole pool, some files, etc.?
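For what it's worth, my (possibly naive) picture of the recovery is something like the following - I'd love to know whether these are even the right commands to reach for ('tank' is just a placeholder pool name):

```
# Check pool health once the enclosure/controller is back
zpool status -v tank

# If the devices are back but errors are still flagged, clear them
zpool clear tank

# If the pool won't import normally, attempt recovery by rolling back
# to the last consistent transaction group (recent writes are discarded)
zpool import -F tank
```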
What happens if a ZIL device fails? Or just one of several ZILs?
Truly any and all anecdotes or hypothetical scenarios backed by deep technical knowledge are appreciated!
Thanks!
Update:
We're doing this on the cheap since we are a small business (9 people or so), but we generate a fair amount of imaging data.
The data is mostly smallish files, by my count about 500k files per TB.
The data is important but not uber-critical. We are planning to use the ZFS pool to mirror a 48TB "live" data array (in use for 3 years or so), and use the rest of the storage for 'archived' data.
The pool will be shared using NFS.
The rack is supposedly on a building backup generator line, and we have two APC UPSes capable of powering the rack at full load for 5 mins or so.
Design it the right way and you'll minimize ZFS's chances of data loss. You haven't explained what you're storing on the pool, though. In my applications, it's mostly serving VMware VMDKs and exporting zvols over iSCSI. 150TB isn't a trivial amount, so I would lean on a professional for scaling advice.
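For reference, exporting one of those zvols looks roughly like this (pool and volume names are made up, and newer builds use COMSTAR rather than the legacy shareiscsi property):

```
# Create a sparse 200GB zvol to back a VMware datastore
zfs create -s -V 200G tank/vmware-lun0

# Legacy (pre-COMSTAR) iSCSI export of that zvol
zfs set shareiscsi=on tank/vmware-lun0
```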
I've never lost data with ZFS.
I have experienced everything else:
- a dozen SSD failures (some in L2ARC duty)
- multiple failed pool disks
- unpredictable SATA drive errors requiring eventual replacement with nearline SAS disks
- fallout from misconfigured deduplication efforts
- recovery of corrupted or faulted zpools from safe mode
- bad 10GbE NIC ports/cabling
- frequent OS crashes
- a lightning strike...
But through all of that, there was never an appreciable loss of data; just downtime. For the VMware VMDKs sitting on top of this storage, an fsck or reboot was often necessary following an event, but no worse than any other server crash.
As for a ZIL device loss, that depends on the design, what you're storing, and your I/O and write patterns. The ZIL devices I use are relatively small (4GB-8GB) and function like a write cache. Some people mirror their ZIL devices, but using the high-end STEC SSD devices makes mirroring cost-prohibitive, so I use single DDRDrive PCIe cards instead. Plan for battery/UPS protection and use SSDs or PCIe cards with a super-capacitor backup (similar to RAID controller BBWC and FBWC implementations).
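If you do go the mirrored-ZIL route, attaching log devices to an existing pool is a one-liner; roughly (pool and device names are placeholders):

```
# Add a mirrored log (ZIL) vdev to an existing pool
zpool add tank log mirror c1t4d0 c1t5d0

# Or, with a single PCIe log device, an unmirrored log vdev
zpool add tank log c2t0d0
```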
Most of my experience has been on the Solaris/OpenSolaris and NexentaStor side of things. I know people use ZFS on FreeBSD, but I'm not sure how far behind it is on zpool versions and other features. For pure storage deployments, I'd recommend going the NexentaStor route (and talking to an experienced partner), as it's a purpose-built storage OS and there are more critical deployments running on Solaris derivatives than on FreeBSD.
I accidentally overwrote both ZILs on the last version of OpenSolaris, which caused the entire pool to be irrevocably lost. (A really bad mistake on my part! I didn't understand that losing the ZIL would mean losing the pool. Fortunately I recovered from backup, with downtime.)
Since version 151a, though (I don't know offhand what zpool version that corresponds to), this problem has been fixed, and I can testify that it works.
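If memory serves, the fix is the ability to import a pool whose separate log device has gone missing; something along these lines (pool name is a placeholder, and I may be off on exactly which build introduced it):

```
# Import a pool even though its separate log (ZIL) device is missing;
# any transactions that existed only in the lost log are discarded
zpool import -m tank
```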
Other than that, I've lost ZERO data on a 20TB server - including through several further cases of user error, multiple power failures, disk mis-management, misconfigurations, and numerous failed disks. Even though the management and configuration interfaces on Solaris change frequently and maddeningly from version to version and present a significant, ever-shifting skills target, it is still the best option for ZFS.
Not only have I not lost data on ZFS (after my terrible mistake), but it constantly protects me. I no longer experience data corruption - which had plagued me for the previous 20 years on any number of servers and workstations, with what I do. Silent (or just "pretty quiet") data corruption has killed me numerous times, when the data rolls off the backup rotation but has in fact become corrupt on disk, or in other scenarios where the backups backed up the corrupt versions. This has been a far bigger problem than losing data in a big way all at once, which is almost always backed up anyway. For this reason, I just love ZFS and can't comprehend why checksumming and automatic healing haven't been standard features in file systems for a decade. (Granted, truly life-or-death systems usually have other ways of ensuring integrity, but still - enterprise data integrity is important too!)
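You can watch that protection at work by scrubbing regularly; roughly (pool name is a placeholder):

```
# Walk every block, verify checksums, and repair from redundancy
zpool scrub tank

# The CKSUM column shows blocks that were found corrupt and healed
zpool status -v tank
```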
Word to the wise: if you don't want to descend into ACL hell, don't use the CIFS server built into ZFS. Use Samba. (You said you use NFS, though.)
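By way of example, a bare-bones Samba export of a dataset is only a few lines, versus untangling NFSv4-style ACLs on the in-kernel server (dataset and share names are just examples):

```
# Keep the in-kernel CIFS server out of the picture
zfs set sharesmb=off tank/imaging

# ...and let Samba export the dataset instead (smb.conf excerpt):
#   [imaging]
#      path = /tank/imaging
#      read only = no
```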
I disagree with the SAS vs. SATA argument, at least the suggestion that SAS is always preferred over SATA for ZFS. I don't know if that comment was referring to platter rotation speed, presumed reliability, interface speed, or some other attribute. (Or maybe just "they cost more and are generally not used by consumers, therefore they are superior.") A recently released industry survey (still in the news, I'm sure) revealed that SATA actually outlives SAS on average, at least at the survey's significant sample size. (That shocked me, for sure.) I can't recall if that was "enterprise" versions of SATA, or consumer, or what speeds, but in my considerable experience, enterprise and consumer models fail at statistically indistinguishable rates. (There is the problem of consumer drives taking too long to time out on failure, which is definitely important in the enterprise, but that hasn't bitten me, and I think it is more relevant to hardware controllers that could take the entire volume off-line in such cases. That's not a SAS vs. SATA issue, though, and ZFS has never failed me over it. As a result of that experience, I now use a mix of 1/3 enterprise and 2/3 consumer SATA drives.)

Furthermore, I've seen no significant performance hit with this mix of SATA when configured properly (e.g. a stripe of three-way mirrors, sketched below), but then again I have a low IOPS demand, so depending on how large your shop is and your typical use-cases, YMMV. I've also noticed that per-disk built-in cache size matters more for latency than platter rotational speed, in my use-cases.
In other words, it's an envelope with multiple parameters: cost, throughput, IOPS, type of data, number of users, administrative bandwidth, and common use-cases. To say that SAS is always the right solution is to disregard a large universe of permutations of those factors.
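For concreteness, the "stripe of three-way mirrors" layout I mentioned above is simply something like this (disk names are placeholders):

```
# A pool striped across two three-way mirrors: any two disks in a
# mirror can fail without data loss, at the cost of 1/3 usable capacity
zpool create tank \
  mirror c1t0d0 c1t1d0 c1t2d0 \
  mirror c1t3d0 c1t4d0 c1t5d0
```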
But either way, ZFS absolutely rocks.