To improve SQL performance, why not just add lots of RAM rather than buying faster hard disks?

People keep telling me that in order to improve an SQL server's performance, I should buy the fastest hard disks possible and use RAID 5, etc.

So I was thinking: instead of spending all that money on RAID 5 and super-duper fast hard disks (which aren't cheap, by the way), why not just get tons of RAM? We know that an SQL server caches the database in memory, and memory is way faster than any hard disk.

Why not stuff in like 100 GB of RAM on a server? Then just use regular SCSI hard disks in RAID 1. Wouldn't that be a lot cheaper and faster?


Solution 1:

Your analysis is fine -- to a point -- in that it absolutely will make things faster. You still have to account for a couple of other issues though:

  1. Not everyone can afford enough memory; when you have multiple terabytes of data, you have to put it on disk at some point. If you don't have much data, anything is fast enough.

  2. Write performance for your database is still going to be constrained by the disks, because the database must keep its promise that committed data was actually stored.
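To see why writes stay disk-bound no matter how much RAM you have, here is a minimal Python sketch of a durable append. The filename and record format are invented for illustration; the point is the `os.fsync` call, which is the step whose latency is set by the disk, not by memory:

```python
import os
import tempfile

def durable_append(path, record):
    """Append a record and force it to stable storage before returning.

    The os.fsync call is what ties commit latency to disk speed:
    until it returns, the bytes may still sit in the volatile OS
    page cache, and a power loss would discard them.
    """
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()              # push Python's buffer down to the OS
        os.fsync(f.fileno())   # force the OS to write to the device

log = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log, b"INSERT id=1")
durable_append(log, b"INSERT id=2")
with open(log, "rb") as f:
    print(f.read().count(b"\n"))  # 2 records safely on disk
```

A database commit works the same way: however fast the in-memory work is, the transaction cannot be acknowledged until the log record has hit stable storage.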

If you have a small data set, or don't need to persist it on disk, there is nothing wrong with your idea. Tools like VoltDB are working to reduce the overhead that assumptions baked into older RDBMS implementations impose on pure in-memory performance.

(As an aside, people telling you to use RAID-5 for database performance are probably not great folks to listen to on the subject, since it is almost never the best choice - it has good read performance, but bad write performance, and writes are almost always the production constraint - because you can throw RAM into caching to solve most read-side performance issues.)
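The RAID-5 write penalty mentioned above can be made concrete with some back-of-the-envelope arithmetic: a small random write costs roughly 2 physical I/Os on RAID 10 (one to each mirror half) but roughly 4 on RAID 5 (read data, read parity, write data, write parity). A sketch, where the 180 IOPS per spindle is an assumed figure (ballpark for a 15k-rpm drive):

```python
def random_write_iops(disks, per_disk_iops, penalty):
    """Rough effective small-random-write IOPS for an array.

    penalty = physical I/Os generated per logical write:
      RAID 10 -> 2 (write both mirror halves)
      RAID 5  -> 4 (read data, read parity, write data, write parity)
    """
    return disks * per_disk_iops // penalty

PER_DISK = 180  # assumed IOPS for one 15k-rpm spindle

print(random_write_iops(8, PER_DISK, penalty=2))  # RAID 10: 720
print(random_write_iops(8, PER_DISK, penalty=4))  # RAID 5:  360
```

Same eight disks, half the write throughput on RAID 5 - which is why it keeps being recommended off read benchmarks and keeps disappointing in write-heavy production.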

Solution 2:

Short version: consider the working set size. Long version: how big is your data? If it fits in the memory of a modern server, then yes, you're absolutely right. Unfortunately, the biggest Xeon can only address 2 TB of RAM right now, and that's not that big a dataset any more. If you can't buy a machine big enough to hold your entire working set in RAM, you're forced to solve problems with your brain, not your wallet.
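The check this answer suggests is simple arithmetic. A hedged sketch - the sizes below are made-up inputs, and the 25% headroom factor is an assumption (the OS, query workspace, and connections need RAM too):

```python
def fits_in_ram(working_set_gb, server_ram_gb, headroom=0.25):
    """True if the working set fits in RAM after reserving headroom
    for the OS, query workspace memory, and connection overhead.

    headroom=0.25 is an assumed safety margin, not a vendor figure.
    """
    usable = server_ram_gb * (1 - headroom)
    return working_set_gb <= usable

# Hypothetical numbers: hot indexes plus hot rows vs. a 2 TB box.
print(fits_in_ram(working_set_gb=1400, server_ram_gb=2048))  # True
print(fits_in_ram(working_set_gb=1700, server_ram_gb=2048))  # False
```

Note the input is the *working set* (the data you actually touch), not the raw database size - which is why, as another answer here points out, a 350 GB database can run happily on far less RAM than 350 GB.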

Solution 3:

If you want speed:

  • Increase RAM so at least frequently used indexes can entirely fit into RAM (for example, on a system I work on, 32GB RAM is plenty for a 350GB database, because indexes are what you need in RAM, not raw data)
  • Use RAID 10 with any disks (faster disks are better)
  • Avoid RAID 5
  • Split mdf, ldf and tempdb onto discrete spindle sets (example: tempdb on its own RAID 1 set, ldf on its own RAID 1 or RAID 10 spindle set, mdf on a RAID 10 set with at least 4 total disks)

Follow those steps, and SQL Server will fly.

Then if you want, add more RAM... but do the above first, and you may well find you are done.

Solution 4:

RAM is the new disk, disk is the new tape.

The quote is from http://www.tbray.org/ongoing/When/200x/2006/05/24/On-Grids - and note that post is six years old. Yes, there are database systems that try (and try hard) to keep the entire dataset in RAM, and would rather shard across multiple machines than touch the disk, because disk is orders of magnitude slower anyway. You still need to write the dataset out to disk, but as the motto above suggests, that's more akin to a background backup task than an online operation. These databases achieve durability through append-only logs (I'm thinking of MongoDB and Redis, but there are plenty more).