hardware recommendations for a DIY storage system based on ZFS [closed]

Looking to make use of an old server lying idle as a proof of concept... here are the specs: Dell PE 2900, 2x Xeon 5110, 12 GB RAM, 8x 300 GB 15K drives, PERC 5/i with 256 MB cache.

What additional hardware would be needed on the server and hosts? A gigabit Ethernet card and a gigabit switch?

There are 4 ESX servers which may connect to this storage server (iSCSI or NFS).

What software is recommended? OpenSolaris? Nexenta Community Edition? FreeNAS?

I'd appreciate any links to guides and tutorials.

Maruti


Solution 1:

Install a recent development build of OpenSolaris (b134).

If you want performance, create 4 mirrored vdevs with those eight disks you have.

For even better performance, use two mirrored SLC SSDs as a log device and an additional SSD as cache.
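A minimal sketch of that layout, assuming a pool named tank and placeholder device names (substitute your own from the format command):

    # Four mirrored vdevs striped together from the eight disks:
    zpool create tank \
        mirror c0t0d0 c0t1d0 \
        mirror c0t2d0 c0t3d0 \
        mirror c0t4d0 c0t5d0 \
        mirror c0t6d0 c0t7d0

    # Two mirrored SLC SSDs as the log (ZIL) device, one more SSD as cache (L2ARC):
    zpool add tank log mirror c1t0d0 c1t1d0
    zpool add tank cache c1t2d0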

Solution 2:

For ZFS, there are a number of factors which contribute to the overall cost, performance, and your satisfaction with the system you've built.

SUPPORTABILITY If you need to be able to call someone when you have problems, don't DIY; buy a Sun 7000 Unified Storage appliance. They're a little pricey, but you get what you pay for: high-quality hardware with recent OpenSolaris code in an appliance form... oh, and Analytics to die for. It's the only way you can buy OpenSolaris support from Oracle. If you've got relatively deep pockets, talk to your Oracle rep; it might be worth it. (It was for me at work.)

SOFTWARE Since Solaris 10 doesn't have the cool cutting-edge ZFS features (dedup, non-mirrored ZIL, COMSTAR iSCSI/Fibre Channel target, etc.), you're gonna want to run something based on the OpenSolaris bits. Since OpenSolaris itself is dead and there isn't a full distribution around Illumos yet, consider Nexenta. It's basically the OpenSolaris kernel + a Debian userland (apt). Nexenta Core Platform is free for unlimited use, but if you're willing to pay for support, consider NexentaStor, although I'm not a fan of $$-per-TB pricing (perpetual licenses start at $800 + $75/TB).

MIRRORED vs RAIDZ1/RAIDZ2 Basically a struggle between IOPS and capacity given the same number of drives. With big disks (1-2 TB), if you decide mirroring is too expensive, definitely go with double parity (RAIDZ2), as rebuild times with multi-TB arrays can easily be longer than a day. (More: ZFS: Mirror vs. RAID-Z.) Don't forget redundancy != backups.
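To make the tradeoff concrete, here's a sketch of both layouts with the same eight disks (placeholder device names):

    # ~4 disks of usable capacity, best IOPS: four striped mirrors
    zpool create fastpool mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 \
        mirror c0t4d0 c0t5d0 mirror c0t6d0 c0t7d0

    # ~6 disks of usable capacity, survives any two disk failures, fewer IOPS:
    zpool create bigpool raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 \
        c0t4d0 c0t5d0 c0t6d0 c0t7d0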

DRIVES I recommend you think about breaking your storage out from your server enclosure. SuperMicro makes some nice cases, but inevitably you're going to want more storage than fits in your case, so why not start with a decent SAS enclosure and buy another when you expand? I'd buy 7200 RPM SATA drives over 10k-15k SAS drives; more mirrored spindles will out-perform fast, expensive disks with ZFS for the same $$.

MEMORY Buy lots of RAM. 12-16 GB minimum; double or triple that if you want to consider dedup.
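If you want to see how much of that RAM the ARC is actually using, kstat exposes the counters (a quick check on stock OpenSolaris/Nexenta):

    # Current ARC size and its configured ceiling, in bytes:
    kstat -m zfs -n arcstats | egrep 'size|c_max'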

SSDs If you're using iSCSI or NFS for virtual machine storage, definitely buy a high-end device for the ZIL to speed up synchronous writes (see: my answer to a previous question). Buy one or more decent MLC SSDs for L2ARC to act as a secondary read cache; if you're doing dedup, you'll want L2ARC SSDs big enough to fit your deduplication tables.
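Once log/cache devices are attached, it's worth verifying they're actually absorbing I/O; zpool iostat breaks them out per device (assuming a pool named tank):

    # Per-vdev I/O every 5 seconds; log and cache devices get their own rows:
    zpool iostat -v tank 5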

PROVISIONING ZFS makes thin provisioning of a filesystem as simple as creating a directory in most environments. zfs create -V 40g pool/fsname, then zfs set shareiscsi=on pool/fsname, and you're done. Cloning an existing volume is similarly easy with a snapshot: zfs snapshot pool/fsname@snapname; zfs clone pool/fsname@snapname pool/newfsname. These operations are quick (0-5 seconds).
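Spelled out as commands, with one hedge: -s makes the volume sparse (truly thin), while plain -V reserves the space up front. shareiscsi is the legacy iSCSI path; under COMSTAR you'd carve a LUN with sbdadm/stmfadm instead.

    # Thin-provisioned 40 GB volume (-s = sparse, no reservation):
    zfs create -s -V 40g pool/fsname
    zfs set shareiscsi=on pool/fsname    # legacy; COMSTAR uses sbdadm create-lu

    # Clone an existing volume via snapshot (near-instant, space-efficient):
    zfs snapshot pool/fsname@snapname
    zfs clone pool/fsname@snapname pool/newfsname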

Update 7/10/2010 to reflect recommendations for how to use your hardware:

Since the Perc6 doesn't support passing the disks through directly as just a bunch of disks (discussion), you'll have to create 8 single-disk RAID 0 arrays. Use two as a mirrored pair and install your root volume there. Use the remaining six as a striped set of 3 mirrored pairs (think RAID10) after the first boot by running:

    zpool create poolName mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0

(substitute your disk ids by looking at the output of the format command). Note: since the PERC may renumber drives if a failed disk (and thus its associated RAID 0 set) is missing after a reboot, you should note drive serials/cXtXdX/slots and document/label accordingly. Hopefully you won't ever need it, but having that info makes it less painful should you ever have to migrate the disks or, God forbid, perform recovery.
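For that labeling exercise, Solaris will map device names to serials for you:

    # Vendor/product/serial for every disk, to match cXtXdX names to physical slots:
    iostat -En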

Before the Oracle acquisition I would've definitely recommended OpenSolaris over Nexenta Core Platform, but now I'd definitely lean towards Nexenta CP. They are basically the only folks continuing regular updates since OpenSolaris b134 was released in March 2010. Migrating your ZFS pool between them is possible, but depends only on the ZFS on-disk version, which you can specify at pool creation time (discussion, see 3rd msg). I've never used FreeNAS or EON, so I can't comment on them.

As for NFS vs COMSTAR iSCSI, you should test both over gigabit with jumbo frames. AFAIK, OpenSolaris/Nexenta don't support hardware TOE on NICs, but if you've got TOE-enabled NICs on the VMware side, they will reduce CPU overhead for iSCSI. You can test with direct-cabled crossovers, but for multi-host you'll want a gigabit switch that supports jumbo frames (optimally an iSCSI-optimized VLAN on a Layer 3 switch). If you've got a Fibre Channel card, test COMSTAR Fibre Channel targets too.
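On the OpenSolaris/Nexenta side, jumbo frames are set per-link with dladm (the link name below is a placeholder; dladm show-link lists yours, and the switch and ESX vmkernel ports must match the MTU):

    # Bump MTU to 9000 on the storage-facing NIC, then verify:
    dladm set-linkprop -p mtu=9000 e1000g0
    dladm show-linkprop -p mtu e1000g0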

To leverage the hybrid storage capabilities of ZFS (HDD + SSD), I'd simulate your usage without a dedicated ZIL device and see if performance is good enough (striped/mirrored 15k SAS disks might be enough). If not, with one or more NON-PRODUCTION VMs set up, temporarily disable the ZIL and measure performance again. If your performance is much better, the ZIL is the bottleneck for your setup and a dedicated ZIL device would be worth the money. The DDRdrive X1 ($2000, $1500 .edu) is designed for ZIL use and occupies a PCIe x1 slot instead of a drive bay. Alternatively, you could consider replacing your mirrored boot disks with two non-redundant 2.5-inch SATA SSDs: a supercapacitor-backed SSD dedicated to ZIL use (Vertex 2 Pro 32GB, $435) and a decent MLC SSD (like the Intel X25-M 80GB, $230) split with one small partition for root and the rest for L2ARC. More RAM is well used by the ZFS ARC, but 12GB should be enough to start.
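For the ZIL on/off test, newer ZFS bits expose a per-dataset sync property; b134-era builds only had the global zil_disable tunable. Either way, non-production data only:

    # CAUTION: non-production datasets only.
    # Newer builds (post-b134) have a per-dataset property:
    zfs set sync=disabled tank/testvm    # run your benchmark...
    zfs set sync=standard tank/testvm    # ...then put it back

    # b134-era bits: add 'set zfs:zil_disable = 1' to /etc/system,
    # reboot, test, then remove it and reboot again.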

I'll leave suggestions for benchmarking tools to another question (heavily dependent on your storage->VM path, guest OSes, and workload), but DTrace probes can yield a lot of useful data despite the learning curve (this is where the Sun 7000 Series Analytics shines). Two final notes: update your PERC6 firmware & BIOS before starting, and if you get an SSD for L2ARC, it can take hours to get hot, so don't just bench it cold.
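If you do want to dip a toe into DTrace, one classic one-liner attributes block I/O to processes:

    # Count block-level I/Os by process name; Ctrl-C prints the totals:
    dtrace -n 'io:::start { @[execname] = count(); }'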

Solution 3:

Try this recipe from Sun... ahh... Oracle:

http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html