zfs for Hadoop cloud instead of ext4 [closed]
Solution 1:
From the white paper from Adurant:
The benefits of this configuration include:
- Reduced Hadoop cluster overhead by reducing the replication factor to 2x
- Reduced storage (disk space) requirements by reducing the replication factor to 2x
- Increased the number of copies of data to 4x via the ZFS Storage Appliance
- Added data compression via the ZFS Storage Appliance o Further reducing storage space requirements even in a mirrored pool configuration
- Added read and write caching via the ZFS Storage Appliance decreasing I/O response times
- Added data protection (RAID 1) with no added overhead to the Hadoop cluster
- Added fault tolerance via the ZFS Storage Appliance’s clustered heads
And the results:
The findings of the Hadoop ZFS Proof of Concept testing clearly indicate that the ZFS Storage Appliance is more than able to handle current Hadoop workloads. Data processing was CPU bound, memory utilization was nominal, I/O utilization was nominal, and data was compressed by a minimum of 3.5x.
Of course, things like compression efficiency depend largely on your data, and performance is not only dependent on design, but also on the actual hardware. The document also gives a rundown of the setup. You could replicate it in a smaller way with less nodes and a portion of your real data and run your own benchmarks.