popular mass storage options

There are three types of enterprise storage:

  • DAS - Direct Attached Storage

This is storage that is either internal to the machine or is an external array of disks connected to the bus of the computer. Examples would include anything from an internal hard drive to an external USB drive to a 12 bay SCSI RAID array. The defining factor is that the storage is on the local bus.

  • NAS - Network Attached Storage

This is storage that is available over a network (usually TCP/IP), using higher level protocols. If your desktop has an NFS mount, you can consider that data to be on a NAS server. NAS is a relative idea. The actual storage could be internal hard drives on another server, or it could be a network appliance that does nothing but provide storage. Either way, from the perspective of the client, it's network addressable. The protocols that are typically counted as NAS are NFS, CIFS/Samba, and FTP, though there are probably others.

  • SAN - Storage Area Network

Network attached storage uses a TCP/IP network to transport data using higher level protocols. Storage Area Networks use lower level protocols to present block devices, and typically they use their own network fabric to do it, though not necessarily. There are two very common SAN fabrics: iSCSI and Fibre Channel. The traffic is sent to the fabric via a Host Bus Adapter (HBA), which you can think of as a SAN network card.

iSCSI typically runs on its own IP network, and is capable of using standard network adapters and switches, which makes it relatively cheap. iSCSI HBAs are a lot like normal network cards, although many of them perform TCP offloading to conserve processor use on the server. Access control is provided by permitting or denying specific IP addresses access to disk resources. As with any other IP-based network, ACLs, firewalls, and routing can be used as traffic control devices, although this is frowned upon by many due to the latency that it causes. Speeds range from 1Gb/s up to 10Gb/s for brand new, highly expensive network gear.
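To make the IP-based access control idea a bit more concrete, here is a minimal Python sketch of an allow-list check. It is not how any particular iSCSI target implements it; the subnet, the host address, and the names `ALLOWED_INITIATORS` and `initiator_allowed` are all made up for illustration.

```python
import ipaddress

# Hypothetical allow-list of initiators (clients) permitted to reach a target.
# These addresses are assumptions for the example, not real recommendations.
ALLOWED_INITIATORS = [
    ipaddress.ip_network("10.10.0.0/24"),   # assumed dedicated storage subnet
    ipaddress.ip_network("10.10.1.15/32"),  # assumed single backup host
]


def initiator_allowed(addr: str) -> bool:
    """Return True if the initiator address falls inside any allowed network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ALLOWED_INITIATORS)


if __name__ == "__main__":
    for candidate in ("10.10.0.42", "10.10.1.15", "192.168.1.7"):
        verdict = "permit" if initiator_allowed(candidate) else "deny"
        print(f"{candidate}: {verdict}")
```

Real targets usually combine something like this with other mechanisms, so treat it as a picture of the concept only.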

Fibre Channel utilizes a completely separate mesh from existing IP-based networks. It requires FC HBAs, which are either native fiber transceivers or copper SFP connections. FC switches are available to provide additional network segments. Machines are addressed using World Wide Names or IDs (WWNs/WWIDs), and access control is provided using these addresses by the switches and by the SAN storage itself. I'm not sure of the current maximum FC speed, but I know that 4Gb/s is available, and I suspect 8Gb/s is too, for enough money. If not, it will be soon.

The typical SAN storage is a large array of disks in an expensive, highly redundant enclosure. Using embedded software on the array, the SAN administrator uses the available disks to create slices, identified as Logical Unit Numbers (LUNs), which are presented to the specified servers as raw SCSI devices.
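As a rough mental model of "presented to the specified servers", you can picture the array keeping a table of which server addresses may see which LUN. The sketch below is only that mental model in Python; the table `LUN_MASKING`, the function `visible_luns`, and the WWN values are invented for the example and don't come from any vendor's software.

```python
from typing import Dict, List, Set

# Hypothetical masking table: LUN number -> WWNs of the servers allowed to see it.
# The WWN values below are made up.
LUN_MASKING: Dict[int, Set[str]] = {
    0: {"10:00:00:00:c9:aa:bb:01"},
    1: {"10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02"},
}


def visible_luns(wwn: str) -> List[int]:
    """Return the sorted LUN numbers that a server with this WWN would be shown."""
    normalized = wwn.lower()  # compare case-insensitively
    return sorted(lun for lun, hosts in LUN_MASKING.items() if normalized in hosts)


if __name__ == "__main__":
    print(visible_luns("10:00:00:00:C9:AA:BB:02"))
    print(visible_luns("10:00:00:00:c9:aa:bb:99"))
```

A server whose address isn't in the table simply sees no devices at all.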

On the server itself, these devices are treated identically to any local storage. Partitions are created, filesystems are put in place, and data is written normally.

iSCSI and FC aren't the only SAN fabrics, but they are the most common. Also available is the ultra fast (40-50Gb/s), ultra expensive (> $500 for an HBA) InfiniBand fabric, as well as the ultra cheap (off the shelf NICs), relatively slow ATA-over-Ethernet, which is sort of a layer-2 equivalent of iSCSI.
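To put those link speeds in perspective, here's a back-of-the-envelope Python sketch of how long it would take to move a given amount of data at line rate. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and ignores protocol overhead and disk speed entirely, so real transfers will be slower. The function name `transfer_hours` and the 10 TB figure are just for illustration.

```python
def transfer_hours(terabytes: float, gigabits_per_second: float) -> float:
    """Idealized hours to move `terabytes` over a link of the given speed."""
    bits = terabytes * 1e12 * 8                   # decimal terabytes -> bits
    seconds = bits / (gigabits_per_second * 1e9)  # line rate, no overhead
    return seconds / 3600


if __name__ == "__main__":
    links = [
        ("1Gb/s iSCSI", 1),
        ("4Gb/s FC", 4),
        ("10Gb/s iSCSI", 10),
        ("40Gb/s InfiniBand", 40),
    ]
    for name, speed in links:
        print(f"{name}: {transfer_hours(10, speed):.1f} hours for 10 TB")
```

At 1Gb/s, 10 TB works out to a bit over 22 hours even in this best case.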

So now that the preliminary information is out of the way, let's answer your question...

You want tens of terabytes of data, and 5 minutes ago, you were unfamiliar with SANs. This is an interesting predicament. Assuming that you really DO need tens of terabytes of data, you're going to be shelling out significant amounts of money. Way more than you probably think.

You can walk into any Best Buy and even with their obscenely overpriced crap, walk out with 18 TB for something like $1500. Storage is cheap, right?

The thing is, storage might be cheap, but enterprise storage is damned expensive. You're not paying for the storage itself; you're paying for the reliability of the enclosure, for the software that runs on it, and for the enterprise support that comes with the device. I have a baby setup: EMC AX4/5 SAN storage with 12 1TB disks, and enterprise support from EMC. This gives me somewhere around 8TB of usable space (because you lose a certain amount to RAID, depending on which RAID level you use) for something like $9k, I think? And that's before the extras, because hey, if you're spending that much, you've got to get the dual controllers, and it's FC, so you've got to buy the switch(es) so you can have multipath.
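If you want to see how much raw capacity RAID eats, here is a simple Python sketch of the textbook arithmetic for a few common levels. It is idealized: it ignores filesystem and formatting overhead, any space the array reserves for itself, and the TB versus TiB difference. The function `usable_tb` and its `hot_spares` parameter are my own naming for the example.

```python
def usable_tb(disks: int, disk_tb: float, level: str, hot_spares: int = 0) -> float:
    """Idealized usable capacity in TB for one RAID group of identical disks."""
    n = disks - hot_spares          # hot spares hold no data
    if level == "raid5":
        data_disks = n - 1          # one disk's worth of parity
    elif level == "raid6":
        data_disks = n - 2          # two disks' worth of parity
    elif level == "raid10":
        data_disks = n // 2         # every disk is mirrored
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if data_disks < 1:
        raise ValueError("not enough disks for this RAID level")
    return data_disks * disk_tb


if __name__ == "__main__":
    for level in ("raid5", "raid6", "raid10"):
        print(f"12 x 1TB, {level}: {usable_tb(12, 1.0, level)} TB usable")
```

For 12 1TB disks this gives 11, 10, and 6 TB respectively. The real-world figure on a given array can land somewhere else, presumably because of hot spares, reserved system space, and how the group is split up, though that part is a guess.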

If you really do need to have storage like this, and you really are this new to enterprise storage, I HIGHLY recommend that you spend a fraction of the money you'll end up shelling out and go get training. Otherwise, you're bound to not get the right thing, and it'll take you twice as long to do it right, because the first time, you'll do it wrong. It's not you, it's just that there is a large amount of tacit knowledge involved with this, and it takes time to get it.

Good luck.

EDIT: Ah, sorry for the dire warnings, homestead. Just wanted to make sure that you knew the significance of the task at hand :-)

Also, I did end up using this as a blog entry:
http://www.standalone-sysadmin.com/blog/2009/12/introduction-to-enterprise-storage/