Horror stories with Sun ZFS?

My company is considering purchasing two 7320s in a cluster. Does anybody have anything good or bad to say about the platform?


Solution 1:

The 7000 series gave us lots of problems in the beginning. Failover did not work, remote replication failed on jobs larger than 1TB, and the web interface locked up a lot. This was on a 7410 cluster with the 2009.x firmware versions. We even took it out of production and made the case to Sun at the time that we were seriously considering dumping it and never looking back.

Enter the 2010.Q3 firmware release and a more proactive/responsive Support team. We discovered our heads had mismatched hardware firmware versions (not the Fishworks firmware), and that was causing the failovers to fail. With the 2010.Q3 upgrade, remote replication is much more robust (I suggest you do it by dataset rather than by project, as it increases the parallelism). The web interface has never locked up on us again (it used to stay locked for as long as a background task was running). There was also a problem with removing snapshots that still exists today in OpenSolaris and was fixed with 2010.Q3: the appliance doesn't stop responding anymore (but our OpenSolaris boxes do). I was told by engineering that this required a major overhaul of how ZFS treats I/O operations, plus a new scheduling class.

Performance-wise, the appliances have never been a problem. As stated above, our main problems were with the software, and they were fixed. Communication with Oracle from then on has just been to request new features and add-ons. If you need an appliance (as opposed to managing the whole thing from the command line), I think it's fair to consider it as an option. The web interface with the Analytics capability is the main selling point (it's like DTrace for Dummies, very easy to get the information quickly).

When buying these boxes we analyzed the price/performance of EMC and NetApp at the time, and the ZFS appliances were much cheaper while providing an environment that we know well (ZFS).

My suggestion is to ask Oracle to let you trial something (at their labs if you will) and throw your workload at the boxes to see how they respond. Make it a contract clause that specifies exactly what you are expecting.

So that's our horror story with it. It seems over now (for whatever we need at the moment) but nothing is perfect. Although we still don't consider it our main choice of storage (we try to have many and we're cheap as hell), it's not the no-no it was before.

Solution 2:

I'd say you could do better with a pre-configured or certified NexentaStor Enterprise system, given the shaky relationships caused by the Oracle purchase. Nexenta seems to have all of the momentum since Oracle killed the OpenSolaris solution.

I'd at least compare the price points of the 7320s to something like the PogoLinux Storage Director Z2.

I own a Sun x4540 "Thumper/Thor" running OpenSolaris that's been in production since 3/2009, and while I haven't had any ZFS failures, the Sun hardware has fallen below my expectations: poor support for LED indicators/monitoring/alerts, an uncertain support arrangement once Oracle came into the picture, and expense. The hardware is still in good shape, and I've probably replaced 3-4 disks in that time. The OS is stable enough for its purpose (VMware storage), but I've taken a different approach for new ZFS builds...

I'm building new ZFS-based filers for SAN storage to back VMware and my firm's storage environment. The platform is attractive because of inline compression, deduplication and the ability to present iSCSI storage as well as NFS. I am using Nexenta Enterprise and Community Edition, depending on the application. For this, I'm using HP ProLiant DL180 G6 storage nodes (essentially the same as LeftHand's storage nodes) and outfitting them with 24GB-48GB RAM, LSI 9211 SAS controllers to replace the Smart Array RAID controllers, and a mix of solid-state (cache), 15k RPM and low-speed 7.2k RPM SAS disks. The NICs are 10-gigabit. We're an HP shop and I know the hardware and support requirements, and Nexenta works well with the hardware (drive LEDs, HP agents, etc.). Using this solution, I'm at $5500-$8000 per storage node, depending on drive type.

Also see the popular Anandtech article for general platform information.

http://www.anandtech.com/show/3963/zfs-building-testing-and-benchmarking