When choosing a SATA magnetic hard drive (not an SSD) for high performance in both random and linear access, which should be the primary factor: rotational speed (RPM) or cache size?

For example, would a 10K RPM drive with 16MB of cache perform better than a 7200 RPM drive with 32MB of cache?


Solution 1:

The short answer is yes. Your total hard drive latency is the [seek latency] + [rotation latency].
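To put numbers on that formula: on average, the platter has to turn half a revolution before the wanted sector arrives under the head. The sketch below computes that, plus a simple access-time estimate. The function names and the seek times in the example are hypothetical placeholders, not figures for any specific drive, and transfer time is ignored.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the platter turns half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def avg_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Access time modeled as seek latency + average rotational latency (transfer ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


if __name__ == "__main__":
    # Hypothetical seek times; check the datasheets of the drives you actually compare.
    for rpm, seek_ms in [(7200, 8.5), (10_000, 4.2)]:
        print(f"{rpm:>6} RPM: rotational {avg_rotational_latency_ms(rpm):.2f} ms, "
              f"access ~{avg_access_time_ms(seek_ms, rpm):.2f} ms")
```

Under these assumptions, 7200 RPM works out to about 4.17 ms of average rotational latency and 10K RPM to 3.00 ms.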

The 10K RPM drive will have a lower rotational latency because it spins faster, and it will also be able to read data off the platters faster. The larger cache helps with writes, and it also helps reads. A cache works like a buffer: when the drive reads data from the disk, it keeps recently accessed data, and data in the near vicinity, for quicker access. These are called temporal and spatial locality. A larger cache is useful if your access pattern is such that you read the same file a lot, or the data you read is stored close together.
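The locality idea can be illustrated with a toy model: an LRU block cache that, on a miss, also prefetches the next few blocks (read-ahead). This is a sketch under stated assumptions; `ReadAheadCache`, `simulate`, the 256-block capacity and the 15-block read-ahead are all hypothetical, and real drive firmware uses its own, undocumented caching policies.

```python
import random
from collections import OrderedDict


class ReadAheadCache:
    """Toy LRU block cache that also prefetches the next `readahead` blocks on a miss."""

    def __init__(self, capacity_blocks: int, readahead: int) -> None:
        self.capacity = capacity_blocks
        self.readahead = readahead
        self.blocks = OrderedDict()  # block number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, block: int) -> None:
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block: int) -> None:
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # temporal locality: keep recent blocks
            return
        self.misses += 1
        # Spatial locality: fetch the requested block plus its neighbours.
        for b in range(block, block + self.readahead + 1):
            self._insert(b)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(pattern: str, n_reads: int = 10_000, disk_blocks: int = 1_000_000) -> float:
    """Hit rate for a 'sequential' scan or uniformly random reads (hypothetical sizes)."""
    cache = ReadAheadCache(capacity_blocks=256, readahead=15)
    rng = random.Random(42)
    for i in range(n_reads):
        block = i if pattern == "sequential" else rng.randrange(disk_blocks)
        cache.read(block)
    return cache.hit_rate()


if __name__ == "__main__":
    print("sequential:", simulate("sequential"))
    print("random:    ", simulate("random"))
```

In this model a sequential scan misses only once per 16 blocks (a hit rate of 0.9375), while uniformly random reads over a large disk almost never hit, which is why the cache helps clustered access far more than scattered access.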

Wikipedia has a decent page on disk caches.

Solution 2:

That's a very difficult question to answer, and the result will be affected by other factors such as NCQ (Native Command Queuing) support.
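NCQ matters because it lets the drive accept several outstanding requests and choose the order in which to serve them. The sketch below models only head movement: it compares serving a queue in arrival order with a LOOK-style elevator order (sweep toward higher tracks up to the furthest pending request, then reverse). The track numbers are hypothetical, and real NCQ firmware also weighs rotational position, which this ignores.

```python
from typing import List


def head_travel(start: int, order: List[int]) -> int:
    """Total number of tracks the head crosses when serving requests in `order`."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel


def elevator_order(start: int, queue: List[int]) -> List[int]:
    """LOOK-style ordering: serve tracks >= start going up, then the rest going down."""
    up = sorted(t for t in queue if t >= start)
    down = sorted((t for t in queue if t < start), reverse=True)
    return up + down


if __name__ == "__main__":
    # Hypothetical queue of pending requests (track numbers), head starting at track 50.
    queue = [95, 10, 60, 5, 80, 40]
    start = 50
    print("arrival-order travel:", head_travel(start, queue))
    print("reordered travel:    ", head_travel(start, elevator_order(start, queue)))
```

For this example queue, arrival order moves the head across 350 tracks, while the reordered queue needs 135.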

I think a good rule of thumb is: for lots of small, random accesses, go for RPM; for linear access, go for cache.

Solution 3:

It depends on the likelihood of cache hits. If you have a small amount of data (8/16/32MB) spread across your disk that you're always reading from and writing to, you'll get a very high cache hit percentage, so the bigger the cache the better. Of course, your OS may be able to cache much more than that, and in faster memory too. If the likelihood of cache hits is low, i.e. your data set is many times larger than your disk cache, then I'd go for as low a random seek time as possible given the data set size.
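The dependence on cache hit likelihood can be checked with a small simulation: uniformly random reads over a working set, served by an LRU cache. This is an assumption-laden sketch (real workloads are rarely uniform, and `lru_hit_rate` and the block counts are hypothetical). Under that model, once the working set is larger than the cache, the steady-state hit rate is roughly cache size divided by working-set size.

```python
import random
from collections import OrderedDict


def lru_hit_rate(cache_blocks: int, working_set_blocks: int,
                 n_reads: int = 50_000, seed: int = 1) -> float:
    """Hit rate of an LRU cache under uniformly random reads over a working set."""
    rng = random.Random(seed)
    cache = OrderedDict()  # block number -> None, least recently used first
    hits = 0
    for _ in range(n_reads):
        block = rng.randrange(working_set_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / n_reads


if __name__ == "__main__":
    cache_blocks = 512  # hypothetical, e.g. a 32MB cache split into 64KB blocks
    for ratio in (0.5, 2, 10, 100):
        ws = int(cache_blocks * ratio)
        print(f"working set = {ratio}x cache: hit rate {lru_hit_rate(cache_blocks, ws):.3f}")
```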

Either way, just get a mirrored pair of Velociraptors if you need 270GB or less, or a pair of Seagate Barracuda 7200.11s if you need more. We could dance around it all day, but these will sort you out :)

Solution 4:

Ultimately it depends on the data you're using. Cache improves the performance of things that get accessed again and again, like files on a web server. It also improves write performance, since data only needs to be written to the cache, which then gets spooled to disk when the disk is available (or the cache is full). Higher RPMs improve "random" access (by cutting rotational latency, for instance), which is what a database needs.
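The write-side benefit can be sketched as a write-back buffer: a write counts as complete once it is in cache, and dirty blocks are spooled to disk when the cache fills or on an explicit flush. `WriteBackCache` is a hypothetical toy; real drives can also flush on idle or on a host FLUSH CACHE command, and write caching can be disabled.

```python
from typing import Dict, List, Tuple


class WriteBackCache:
    """Toy write-back cache: writes complete in cache and reach disk only on flush."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.dirty: Dict[int, bytes] = {}
        self.disk_writes: List[Tuple[int, bytes]] = []  # stands in for the platter

    def write(self, block: int, data: bytes) -> None:
        # Overwriting a dirty block just replaces the cached copy (no extra disk write).
        self.dirty[block] = data
        if len(self.dirty) >= self.capacity:
            self.flush()  # cache full: spool everything to disk

    def flush(self) -> None:
        # Write dirty blocks in ascending block order to keep head movement short.
        for block in sorted(self.dirty):
            self.disk_writes.append((block, self.dirty[block]))
        self.dirty.clear()


if __name__ == "__main__":
    cache = WriteBackCache(capacity_blocks=4)
    for i in range(10):
        cache.write(7, f"v{i}".encode())  # ten rewrites of the same block
    cache.flush()
    print(cache.disk_writes)
```

Rewriting the same block ten times before a flush costs only one disk write in this model, since later writes overwrite the cached copy. The flip side is that data still sitting in a volatile cache is lost if power fails before the flush.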

I would go with a higher RPM, all things being equal.

Solution 5:

Don't forget that areal density is also a factor in performance. All else being equal, you will realize faster data transfer rates as the areal density of a drive platter increases. Rotational speeds are primarily related to access times, but once the data is located, the areal density becomes a factor in the throughput of reading the data.

So, also go big on capacity. Big capacity drives tend to be the ones with higher areal density per platter.
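The throughput side of this argument can be written as a formula: every revolution, one full track passes under the head, so the media transfer rate is roughly data per track times revolutions per second. Only the along-track (linear) component of areal density raises data per track. The MB-per-track values below are hypothetical; real drives use zoned recording, so outer tracks hold more data and read faster than inner ones.

```python
def sequential_mb_per_s(mb_per_track: float, rpm: float) -> float:
    """Media transfer rate: one full track passes under the head every revolution."""
    revolutions_per_second = rpm / 60.0
    return mb_per_track * revolutions_per_second


if __name__ == "__main__":
    # Hypothetical platters: a denser 7200 RPM one vs. a less dense 10K RPM one.
    for label, mb_per_track, rpm in [("dense 7200 RPM", 1.5, 7200),
                                     ("sparse 10K RPM", 1.0, 10_000)]:
        print(f"{label}: ~{sequential_mb_per_s(mb_per_track, rpm):.0f} MB/s")
```

With these made-up figures, a 7200 RPM platter holding 1.5 MB per track (180 MB/s) would outpace a 10K RPM platter holding 1.0 MB per track (about 167 MB/s) on sequential reads.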