What's the best way to explain storage issues to developers and other users?

When server storage gets low, developers all start to moan: "I can get a 1 TB drive at Walmart for 100 bucks, what's the problem?"

How can the complexities of storage be explained to developers so that they understand why a 1 TB drive from Walmart just won't work?

P.S. I'm a developer and want to know too :)


Solution 1:

Some home truths about storage, or why is enterprise storage so f-ing expensive?

Consumer hard drives offer large volumes of space so that even the most discerning user of *cough* streaming media *cough* can buy enough to store a collection of several terabytes. In fact, disk capacity has been growing faster than the transistor counts on silicon for a couple of decades now.

'Enterprise' storage is a somewhat more complex issue as the data has performance and integrity requirements that dictate a somewhat more heavyweight approach. The data must have some guarantee of availability in the event of hardware failures and it may have to be shared with a large number of users, which will generate many more read/write requests than a single user.

The technical solutions to this problem can be many, many times more expensive per gigabyte than consumer storage solutions. They also require physical maintenance; backups must be taken and often stored off-site so that a fire does not destroy the data. This process adds ongoing costs.

Performance

On your 1TB consumer or even enterprise near-line drive you have just one head assembly, with all the heads moving together on a single actuator. The disk rotates at 7200 RPM, or 120 revolutions per second. This means that you can get at most around 120 random-access I/O operations per second in theory* and somewhat less in practice. Thus, copying a large file on a single 1TB volume is relatively slow.

On a disk array with 14x 72GB disks, you have 14 head assemblies over disks spinning at (say) 15,000 RPM, or approximately 250 revolutions per second. This gives you a theoretical maximum of 3,500 random I/O operations per second* (again, somewhat less in practice). All other things being equal, a file copy will be many, many times faster.

* You could get more than one random access per revolution if the geometry of the reads allowed the drive to move the heads and read a sector that happened to be available within the same revolution. If the disk accesses are widely dispersed you will probably average less than one. Where a disk array is formatted in a striped layout (see below) you will get a maximum of one stripe read per revolution of the disk in most circumstances and (depending on the RAID controller) possibly less than one on average.
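To make that arithmetic concrete, here is a minimal back-of-envelope sketch in Python, assuming the simplification from the footnote above (at most one random I/O per disk revolution); real drives and controllers will land somewhat below these ceilings.

    # Rough random-IOPS ceiling: at most one random I/O per disk revolution
    # (the simplification discussed in the footnote above).
    def rough_random_iops(rpm, disks=1):
        revolutions_per_second = rpm / 60.0
        return revolutions_per_second * disks

    print(rough_random_iops(7200))             # single 7200 RPM drive -> ~120 IOPS
    print(rough_random_iops(15000, disks=14))  # 14x 15,000 RPM array  -> ~3,500 IOPS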

The 7200 RPM 1TB drive will probably be reasonably quick on sequential I/O. Disk arrays formatted in a striped scheme (RAID-0, RAID-5, RAID-10 etc.) can typically read at most one stripe per revolution of the disk. With a 64K stripe we can read 64K x 250 = 16MB or so of data per second off a 15,000 RPM disk. This gives a sequential throughput of around 220MB per second on an array of 14 disks, which is not that much faster on paper than the 150MB/sec or so quoted for a modern 1TB SATA disk.

For video streaming (for example), an array of 4 SATA disks in a RAID-0 with a large stripe size (some RAID controllers will support stripe sizes up to 1MB) has quite a lot of sequential throughput. This example could theoretically stream about 480MB/sec, which is comfortably enough to do real-time uncompressed HD video editing. Thus, owners of Mac Pros and similar hardware can do HD video compositing tasks that would have required a machine with a direct-attach fibre array just a few years ago.
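The same napkin maths reproduces the sequential figures above. This is a minimal sketch assuming the one-stripe-per-revolution ceiling described earlier; controller caches and read-ahead can and do beat it in practice.

    # Rough sequential-throughput ceiling: one stripe per revolution per disk
    # (the simplification used in the text above).
    def rough_sequential_mb_per_sec(rpm, stripe_kb, disks):
        revolutions_per_second = rpm / 60.0
        per_disk_mb_per_sec = revolutions_per_second * stripe_kb / 1024.0
        return per_disk_mb_per_sec * disks

    # 14x 15,000 RPM disks, 64K stripe   -> ~220 MB/sec (the array figure above)
    print(rough_sequential_mb_per_sec(15000, 64, 14))
    # 4x 7200 RPM SATA disks, 1MB stripe -> ~480 MB/sec (the HD video example)
    print(rough_sequential_mb_per_sec(7200, 1024, 4))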

The real benefit of a disk array is on database work which is characterised by large numbers of small, scattered I/O requests. On this type of workload performance is constrained by the physical latency of bits of metal in the disk going round-and-round and back-and-forth. This metric is known as IOPS (I/O operations per second). The more physical disks you have - regardless of capacity - the more IOPS you can theoretically do. More IOPS means more transactions per second.

Data integrity

Additionally, most RAID configurations give you some data redundancy - which requires more than one physical disk by definition. The combination of a storage scheme with such redundancy and a larger number of drives gives a system the ability to reliably serve a large transactional workload.
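To illustrate what that redundancy costs in raw capacity, here is a minimal sketch using the standard textbook formulas for a few common RAID levels (identical disks, no hot spares assumed); nothing here is specific to any particular array.

    # Usable capacity for common RAID levels (standard formulas, simplified:
    # identical disks, no hot spares, RAID-10 built from mirrored pairs).
    def usable_capacity_gb(level, disks, disk_gb):
        if level == "RAID-0":    # striping only, no redundancy
            return disks * disk_gb
        if level == "RAID-1":    # mirrored pair
            return disk_gb
        if level == "RAID-5":    # one disk's worth of capacity spent on parity
            return (disks - 1) * disk_gb
        if level == "RAID-10":   # half the disks hold mirror copies
            return (disks // 2) * disk_gb
        raise ValueError("unknown RAID level")

    # 14x 72GB disks: roughly 1TB raw, but less once redundancy is paid for.
    for level in ("RAID-0", "RAID-5", "RAID-10"):
        print(level, usable_capacity_gb(level, 14, 72), "GB usable")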

The infrastructure for disk arrays (and SANs in the more extreme case) is not exactly a mass-market item. In addition it is one of the bits that really, really cannot fail. This combination of standard of build and smaller market volumes doesn't come cheap.

Total storage cost including backup

In practice, the largest cost for maintaining 1TB of data is likely to be backup and recovery. A tape drive and 34 sets of SDLT or Ultrium tapes for a full grandfather cycle of backup and recovery will probably cost more than a 1TB disk array did. Add the costs of off-site storage and the salary of even a single tape-monkey and suddenly your 1TB of data isn't quite so cheap.
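To see how the tape count and cost stack up, here is a minimal sketch; the rotation sizes and prices below are purely illustrative assumptions, not the breakdown behind the "34 sets" mentioned above.

    # Illustrative grandfather-father-son tape rotation.
    # Tape counts and prices are assumptions for the sketch only,
    # not the actual figures behind the "34 sets" above.
    daily_tapes   = 6    # Mon-Sat, reused each week
    weekly_tapes  = 4    # one per week of the month
    monthly_tapes = 12   # one per month, kept for a year
    yearly_tapes  = 2    # long-term, stored off-site

    total_tapes = daily_tapes + weekly_tapes + monthly_tapes + yearly_tapes
    cost_per_tape   = 40    # assumed price of one SDLT/Ultrium cartridge
    tape_drive_cost = 2000  # assumed price of the drive itself

    print(total_tapes, "tapes, hardware cost",
          total_tapes * cost_per_tape + tape_drive_cost,
          "- before off-site storage and staff time")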

The cost of the disks is often a fair way down the hierarchy of dominant storage costs. At one bank I had occasion to work for, SAN storage was costed at £900/GB for a development system and £5,000/GB for a disk on a production server. Even at enterprise vendor prices the physical cost of the disks was only a tiny fraction of that. Another organisation I am aware of has a (relatively) modestly configured IBM Shark SAN that cost somewhere in excess of £1 million. Just the physical storage on it is charged out at around £9/gigabyte, or about £9,000 for space equivalent to your 1TB consumer HDD.

Solution 2:

Just say: "Yeah, and I can get a Java programmer offshore for $5/hour."

Solution 3:

Maybe ask them a few questions about their Walmart drive:

  • what is its mean time to failure?
  • what happens if it fails catastrophically?
  • how often is it backed up?
  • how much storage will 12 months of backups require?
  • how can it be backed up off site?
  • how could it be restored? (in whole? a single file? a couple of directories?)
  • how much does it cost to store the backups?
  • how will he guarantee that the backups are kept safe? secure?
  • what insurance does he have to cover the loss of vital data?

... Compare these answers with a drive that's running as part of a RAID 5 array in a well-managed datacentre.

(Disclosure: I'm a developer too - I'm just guessing!)