What qualities to look for in a NAS for short term storage?

Background

I work at a research department that handles biomedical data, and we are currently considering revising our IT structure. We have several instruments, connected to network-isolated computers, that generate GBs of data on a daily basis. The data is moved around the network and processed at intermediate steps before it is transferred to the national data storage service for universities.

What we need to improve is the intermediate step, where the data is stored short term (~3 months) so that researchers can access it without having to fetch it from a remote data center. As it is, the intermediate server is used for a number of different purposes and usually runs out of space. We intend to buy a NAS dedicated to short-term storage of instrument data. I was given the responsibility of coming up with alternatives.

I started off by charting out what we need, which led to the following list of our requirements:

  • at least 8TB space: this should not really be an issue with modern setups
  • Gb bandwidth: same as above
  • rack-mount: so that the NAS will physically be close to the other servers we have
  • expandable: in case our data volume increases in the near future (I assume it will)
  • minimal maintenance: we don't have the liberty (economically or bureaucratically) to have full-time system admins. As it is, the most tech-savvy scientists help out with server maintenance. None of us are IT professionals...
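To sanity-check the 8 TB figure, the capacity requirement can be estimated by multiplying the daily data volume by the retention window. This is only a rough sketch: the 50 GB/day rate and the 30% headroom are placeholder assumptions, not measurements from our instruments, and `required_capacity_tb` is a name made up for the example.

```python
def required_capacity_tb(daily_gb: float, retention_days: int, headroom: float = 0.3) -> float:
    """Return the capacity in TB needed to hold `retention_days` of data plus headroom."""
    raw_gb = daily_gb * retention_days
    return raw_gb * (1 + headroom) / 1000  # decimal TB, the way drive vendors count


if __name__ == "__main__":
    # Hypothetical inputs: 50 GB/day kept for roughly 3 months (90 days).
    print(f"{required_capacity_tb(50, 90):.2f} TB")
```

Plugging in the real daily volume would show how quickly the 8 TB minimum is outgrown if the data rate increases.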

Question(s)

I started reading up on storage systems, and the list of most common questions on meta was a great resource. I also found two similar questions asking about storage in a research environment:

  • Scalable (> 24 TB) NAS for research department
  • networked storage for a research group, 10-100 TB

However both questions seem to focus on long-term storage, and also focus on individual appliances, whereas I am mostly interested in figuring out what features/specs/qualities are valuable in this context.

Based on prior knowledge and recent reading, I figure there are a couple of aspects which could be of importance when choosing a NAS in our case:

  • support for SAS drives - is it really crucial? I understand that SAS drives are of higher quality generally, but assuming that there is redundancy in the array, what's the big deal if a SATA disk dies?

  • Link aggregation - I have to say I am not well-read on the network layers and the devices that go along with them, but my limited understanding of link aggregation is that with multiple network cards a NAS can theoretically double/triple its bandwidth, and that the multiple links can likewise be used for error correction (at least according to Synology). I would appreciate any additional information that might help me make sense of this and distinguish the reality from marketing talk.

  • Multiple networks - it would make sense for us to be able to have the NAS available in two different VLANs that do not see one another, due to the isolation criteria we have on some computers. If the NAS has two ethernet ports, is it as simple as connecting it to two different networks and being done with it?

  • Hot-swap etc - there seem to be a number of different versions of this aspect. My understanding is that hot-swap refers to an extra disk connected to the NAS which is written to first when one disk fails. Is this correct? If so is hot-swap a cool feature to have, or a must even though the array is running single/double redundancy?

  • Another version of "hot-swap" (I am not sure what it's called) allows for replacement of disks while the server is on-line, so it's sort of a hot-replacement (Drobo offers something like this). Is it a common feature, or something specific to Drobo? Are there similar technologies available? Is there a "catch" that I might not be aware of? Otherwise I think it's pretty interesting, since it allows for online expansion of the storage space.

The features above are the ones I have been pondering. I would really appreciate some insight into these, and possibly into others I might have missed.


Purchase a ZFS-based appliance. Anything using NexentaStor would be a good start, but you sound like you'd also want/need vendor support.

Something like a PogoStorage StorageDirector would work.
Another nice canned option is the DataON NexentaStor line.

Both of these vendors can profile and tailor a solution specific to your storage and performance needs. This is not an uncommon request, so speaking with a vendor with knowledge about your field would be helpful.

Why ZFS?

  • Excellent scalability in capacity and performance.
  • Intelligent caching. This comes into play with your specific application. Working sets of data can rise to a faster tier of storage (SSD). This is the ZFS L2ARC cache.
  • If you're mounting via NFS or CIFS, write acceleration is possible. This is provided by ZFS ZIL devices.
  • It's incredibly resilient.

All of the checkmarks are hit:

  • SAS drives. Yes. They're important and more stable than SATA-based solutions.
  • Link aggregation. Sure. I prefer 10GbE from the storage unit to the switch if there will be lots of consumers. 10GbE end-to-end is even better, depending on your anticipated workload.
  • Multiple networks are certainly possible. Consider trunking from the storage array to the switch.
  • Hot-swap drives. This is a given. It allows you to swap disks while the system is running. However, at that capacity level, you may also want a hot-spare drive, which will be called into action immediately if a drive fails.
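To see what redundancy and a hot spare cost in space, here is a minimal sketch of the usable-capacity arithmetic. It assumes equal-sized drives and ignores filesystem overhead; the function name `usable_tb` and the example drive counts are made up for illustration.

```python
def usable_tb(n_drives: int, drive_tb: float, parity: int = 2, hot_spares: int = 1) -> float:
    """Usable capacity when `parity` drives' worth of space and `hot_spares` drives hold no data."""
    data_drives = n_drives - parity - hot_spares
    if data_drives < 1:
        raise ValueError("not enough drives left for data")
    return data_drives * drive_tb


if __name__ == "__main__":
    # Hypothetical 8-bay unit with 2 TB drives, double parity and one hot spare.
    print(usable_tb(8, 2.0), "TB usable")
```

With these placeholder numbers, three of the eight bays go to protection, which is worth factoring in when comparing chassis sizes against the 8 TB requirement.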

Nearly any NAS (other than the smaller consumer models) or SAN supports online expansion, which means if you need more space you can just insert more drives and your existing volume can grow to include the new drives.

SAN and NAS devices with multiple ports can be used for link aggregation or for access from multiple networks. However, both of these features will vary from product to product. Many will specifically list link aggregation as supported but I doubt you will find a product that lists access from multiple networks. That's not a commonly requested feature.

You seem to have confused hot swap and hot spare.

  • Hot swap drives allow you to replace a drive while powered on.
  • A hot spare (also known as an online spare, or simply a spare) is a drive that is physically inserted but does not have any data on it. If another drive in the array fails, the array will rebuild the data from the failed drive onto the hot spare.
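The difference between the two terms can be shown with a toy model. This is only a sketch of the behaviour described above, not how any particular controller works; the class `Array`, its fields, and the drive names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Array:
    """Toy model of a redundant array: drive names only, no real data."""
    active: List[str]
    spares: List[str] = field(default_factory=list)
    awaiting_replacement: bool = False

    def fail(self, drive: str) -> None:
        """Drop a failed drive; a hot spare, if present, takes its place automatically."""
        self.active.remove(drive)
        if self.spares:
            self.active.append(self.spares.pop(0))  # rebuild onto the spare, no human needed
        else:
            self.awaiting_replacement = True  # nothing to rebuild onto until a disk is inserted

    def hot_swap(self, new_drive: str) -> None:
        """Insert a disk while the system keeps running (no shutdown)."""
        if self.awaiting_replacement:
            self.active.append(new_drive)  # rebuild onto the freshly inserted disk
            self.awaiting_replacement = False
        else:
            self.spares.append(new_drive)  # array is whole, so the new disk becomes a spare
```

In this model the hot spare decides what happens at the moment of failure, while hot swap decides whether replacing the dead disk later requires downtime.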

SATA drives can be OK, provided you use ones that are recommended by the SAN/NAS manufacturer. At a minimum, the SATA drive should be enterprise grade and rated for use in RAID arrays. These types of drives have special firmware that make them more suitable for use in arrays. Using standard consumer drives (especially those not certified by the SAN/NAS manufacturer) frequently results in strange failure scenarios. In terms of overall reliability and stability, this is definitely a case of you get what you pay for.


However, shopping questions are off-topic, so specific recommendations for products will not be forthcoming.

I suggest that you contact a few SAN/NAS manufacturers (Synology, Drobo, HP, Dell, EMC) and describe your needs. They will suggest products and you can choose between them.

Some things you will need to find out or decide on before you do this:

  • What kind of throughput do you need? Just because you have a Gigabit Ethernet port on your server doesn't mean a) that your server can actually move data at Gigabit speeds on and off the disk, and b) that you are using 1 Gigabit of bandwidth.
    • The answer to this question will determine the performance needed from the SAN/NAS controller, the number of network interfaces, and the protection strategy of the disks (i.e., RAID level).
  • How much space are you using?
  • How much space do you WANT to have?
  • How much do you want to be able to grow easily?
  • How do you intend to back up this data?

  1. Support for SAS drives is good if you need the speed associated with them. They cost more than SATA, but not as much as SSD, and they sit in the middle of the two in terms of speed. If speed isn't an issue, support is nice, but it isn't worth paying significantly more for.

  2. Link aggregation is exactly what you mentioned: the ability to combine two real NICs into a single, faster one. I've never heard of it being used for error correction, but maybe I just haven't been looking in the right places.

  3. Multiple networks usually just involves connecting and configuring the network cards.

  4. The first definition of hot-swap you offer sounds more like a hot standby/spare, where you have a disk installed that can be used in case another disk in use fails. The second is what I've always understood hot-swap to mean: the ability to change disks without having to shut down the machine while replacing them.
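On point 2, one caveat worth knowing: with the hash-based distribution commonly used for aggregated links (e.g. LACP, IEEE 802.3ad), each conversation is usually pinned to one physical link, so the extra bandwidth shows up across many clients rather than for a single transfer. The sketch below illustrates the idea only; the CRC32 hash over address strings is a stand-in, not any vendor's actual policy, and the IP addresses are made up.

```python
import zlib


def pick_link(src: str, dst: str, n_links: int) -> int:
    """Map a (source, destination) pair to one physical link index, hash-policy style."""
    return zlib.crc32(f"{src}->{dst}".encode()) % n_links


if __name__ == "__main__":
    nas = "10.0.0.100"  # hypothetical NAS address
    clients = [f"10.0.0.{i}" for i in range(1, 9)]  # hypothetical clients
    for client in clients:
        # The same client always lands on the same link, so one client
        # never gets more than a single link's worth of bandwidth.
        print(client, "-> link", pick_link(client, nas, 2))
```

So two 1 Gb links help when several researchers pull data at once, but a single workstation copying one large file would still be limited to roughly one link's speed.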

As for further considerations, ewwhite's answer is probably going to be more useful in the long run ;)