With 200TB of needed storage, is a SAN a good idea?
My employer has what I consider to be a lot of data. We currently have about 10TB of data on our NAS, which is roughly at capacity, and we would have more data on it if we didn't constantly move data onto external hard drives and DVDs to free up space. Because we need to use all of that data, and we don't have the capacity to store any more, our backups are in shambles.
By my calculations, in order to store all of the data we have, plus our archives and backups, we'll need about 200TB of storage; this accounts for a bit of expected growth over the next few years as well.
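For reference, here's roughly the kind of estimate I'm doing. The individual figures below are illustrative placeholders rather than our exact numbers:

```python
# Rough capacity estimate (illustrative numbers, not exact figures).
primary_tb = 10          # data currently on the NAS
offloaded_tb = 15        # data pushed off to external drives/DVDs (assumed)
archive_tb = 20          # long-term archives (assumed)
backup_copies = 1        # keep one full backup copy of everything (assumed policy)
annual_growth = 0.30     # ~30% growth per year (assumed)
years = 3

live = primary_tb + offloaded_tb + archive_tb
needed_now = live * (1 + backup_copies)
needed_future = needed_now * (1 + annual_growth) ** years
print(f"Now: {needed_now:.0f} TB, in {years} years: {needed_future:.0f} TB")
```

With those assumptions the number lands around 200TB, which is where my figure comes from.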
My question is: with 200TB of storage space needed, should we be looking to set up a SAN, should I instead be looking at a large NAS like a NetApp appliance, or something else entirely? At what point do data storage requirements become big enough to warrant the effort involved in setting up and administering a SAN?
If you can afford it, a centralized storage system is ideal for safely and reliably serving your company's data.
That said, check your math: while you can back servers and computers up to a NAS, it's almost never the best choice, and the same goes for archives. Archives and backups are best put on something else, like a tape library or a deduplicated disk pool. Either way, you need something along those lines to back up the primary data stored on the NAS.
For 200TB, yes, you want an array. However, you're thinking in the wrong direction if you're trying to square off SAN vs. NAS. They're different tools for different jobs.
A NAS is network accessible, typically delivering file storage over NFS or CIFS, and sometimes iSCSI (and occasionally FCoE).
A SAN typically means a Fibre Channel network providing block storage (SCSI) to hosts.
Fundamentally, a SAN is a high-speed, low-latency network dedicated to storage.
In either case, though, the array you're talking to is still pretty similar: it's a set of disks configured into RAID sets. (Which configuration you use depends on budget and performance requirements.)
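To make that concrete, here's a rough sketch of how the RAID layout changes the raw disk count needed to reach 200TB usable. The drive size, group size and layouts below are assumptions for illustration, not a recommendation:

```python
# Usable capacity per RAID group for two common layouts (assumed 4 TB drives).
DRIVE_TB = 4

def usable_tb(drives, layout):
    """Usable TB for one RAID group of `drives` disks."""
    if layout == "raid6":      # two parity drives per group
        return (drives - 2) * DRIVE_TB
    if layout == "raid10":     # mirrored pairs, half the raw capacity
        return (drives // 2) * DRIVE_TB
    raise ValueError(layout)

target_tb = 200
for layout in ("raid6", "raid10"):
    group = 12  # disks per RAID group (assumed)
    per_group = usable_tb(group, layout)
    groups = -(-target_tb // per_group)  # ceiling division
    print(f"{layout}: {groups} groups x {group} drives = {groups * group} disks "
          f"for ~{groups * per_group} TB usable")
```

The point isn't the exact numbers, just that the RAID choice roughly doubles or halves the spindle count for the same usable space, which feeds straight back into the budget question.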
So the real question is: how much performance and concurrent access do you need? NFS and CIFS over Ethernet are fine for moderate performance loads (especially over a 10G network), but there's always going to be more protocol overhead than with Fibre Channel. However, they're also quite good for sharing storage between multiple hosts/users for concurrent access.
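As a back-of-the-envelope illustration of why the link speed (and, to a lesser degree, protocol overhead) matters, consider how long a 10TB full copy takes on different links. The efficiency figures below are rough assumptions, not benchmarks:

```python
# How long a 10 TB full copy takes on different links (illustrative efficiencies).
DATA_TB = 10
links = {
    "1 GbE NFS/CIFS":  (1,  0.6),   # (gigabits/s, assumed protocol efficiency)
    "10 GbE NFS/CIFS": (10, 0.7),
    "8 Gb FC":         (8,  0.9),
    "16 Gb FC":        (16, 0.9),
}

for name, (gbit, eff) in links.items():
    effective_gb_per_s = gbit * eff / 8          # gigabytes per second
    hours = DATA_TB * 1000 / effective_gb_per_s / 3600
    print(f"{name:16s} ~{hours:5.1f} hours")
```

On gigabit Ethernet that copy is an all-day job; on 10GbE or FC it's a few hours, which is why the network matters as much as the array itself.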
A SAN has more cost overhead: at least a pair of HBAs per server, plus ports on a fabric switch. But it tends to be faster.
Separate from this is the difference between backup/archive disk profiles and front-end storage: front-end storage needs good peak performance; backup doesn't, really.
If you have a tight budget, a bit of free time and some geek skills, you can also consider using one of the distributed file systems like MooseFS or GlusterFS (more options here: http://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems). I played a bit with MooseFS at my previous job and it served well as file storage. It won't be as fast as e.g. NetApp, but it's much, much cheaper. And you won't be limited to a specific hardware vendor, because you can connect almost anything to the DFS to expand the storage (servers, desktops, laptops with any kind of disks). Of course, network bandwidth will be your bottleneck in this case.