How many IOPS do I need? My workload bottleneck is storage

Solution 1:

If you know you're storage bound, then benchmarks on your server won't definitively tell you how much you need; they can only tell you how fast you can go while subject to the limited storage. To get the answer you're actually looking for, you need to isolate, where possible, the different ways you can be storage-throttled and test them independently.

IOPS is of course the easy limit that everyone talks about, because disks are bad at seeking and databases like to seek. These days, with cache and SSD, small-block random-seek reads are a lot easier to serve than they used to be. A small tier of SSD and a large cache will probably ensure that if IOPS (for small-block "seek"-type IO) really is your bottleneck, you won't be subject to it any more. Be careful about these benchmarks, though: you'll read all kinds of unrealistic numbers as people measure the number of IOs they can do straight to unmirrored cache. That's not going to help your Linux server.

Another type of storage limit is bandwidth, or throughput. This one is hard to isolate, but if you know how much data you're trying to read or write and you know how long it takes now, pick a new time target, and that gives you your new number. For example: if you observe your application spending 4 hours on a large backup, and at the end of it it has moved 9 TB, that tells you your current throughput limit: about 650 MB/s. If you want to move 18 TB in the same window, you need about 1300 MB/s. For the most part, Ethernet, Fibre Channel, and SAS can all be configured to go faster than the storage hardware behind them; the storage's ability to keep that transport layer full is usually the real bottleneck. Look at the number of front-end ports, and at benchmark numbers taken with cache mirroring turned on (to ensure there's no bottleneck between controllers mirroring cached writes).
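The arithmetic above can be sketched in a few lines. This is a hypothetical helper (name and units are illustrative); it assumes binary units (1 TB = 1024 * 1024 MB), which is what makes the 9 TB / 4 h example come out near 650 MB/s:

```python
# Hypothetical sketch: derive a sustained-throughput target from a
# known data volume and time window, as in the backup example above.

def required_throughput_mb_s(data_tb: float, window_hours: float) -> float:
    """MB/s needed to move data_tb within window_hours.
    Assumes binary units: 1 TB = 1024 * 1024 MB."""
    megabytes = data_tb * 1024 * 1024
    seconds = window_hours * 3600
    return megabytes / seconds

# The example from the text: 9 TB in 4 hours, then doubling the data.
print(round(required_throughput_mb_s(9, 4)))   # ~655 MB/s
print(round(required_throughput_mb_s(18, 4)))  # ~1311 MB/s
```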

Lastly, you can be limited by bad storage configuration in terms of SCSI queues. This is not especially common; the symptom is being unable to push your storage hardware as fast as it should go. If you are seeing 500 ms latency on writes from the host, but your storage reports 3 ms with 100% cache hits, the issue can be insufficient SCSI queue depth on the target. Essentially, the SCSI initiator is waiting up to 500 ms for a slot to free up in its queue before it can issue the request. Ask your storage vendor for their best practices on host queue depth settings and fan-out ratio here.
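Little's law gives a quick sanity check on that relationship: achievable IOPS is bounded by queue depth divided by per-IO latency. A minimal sketch, with illustrative numbers (the depth-32 queue is an assumption, not from the text):

```python
# Hypothetical sketch: Little's law applied to a SCSI queue.
# Achievable IOPS <= outstanding I/O slots / per-IO completion time.

def max_iops(queue_depth: int, latency_s: float) -> float:
    """Upper bound on IOPS through a queue of the given depth when
    each I/O takes latency_s seconds to complete."""
    return queue_depth / latency_s

# A depth-32 queue at the array's reported 3 ms per I/O:
print(max_iops(32, 0.003))  # ~10666.7 IOPS
# The same queue when the host is effectively waiting 500 ms per I/O:
print(max_iops(32, 0.5))    # 64.0 IOPS
```

The point of the sketch: the same hardware that can deliver five figures of IOPS at array-side latency collapses to double digits when the initiator spends its time waiting for queue slots.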

I hope this helps, I know it's not as simple an answer as you were hoping for.

Solution 2:

The iostat command will show you the information you want. Just run:

iostat 1

The output will be something like this:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              42.00       128.00        84.00        128         84

The tps column is transfers (I/O requests) per second, which corresponds to IOPS.

The trailing 1 makes iostat print updated figures every second.

You usually need to have the sysstat package installed on your Linux distribution to have iostat available.
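If you want to consume those figures programmatically rather than eyeball them, you can parse the device table. This is a hedged sketch against the sample output above; real `iostat 1` output also includes CPU lines and varies by sysstat version, so the column layout here is an assumption:

```python
# Hypothetical sketch: extract tps and throughput from a captured
# iostat device table. Column layout assumed to match the sample
# shown above; real output differs across sysstat versions.

SAMPLE = """Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              42.00       128.00        84.00        128         84
"""

def parse_iostat(text: str) -> dict:
    """Map device name -> (tps, kB_read/s, kB_wrtn/s)."""
    stats = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        stats[fields[0]] = tuple(float(f) for f in fields[1:4])
    return stats

print(parse_iostat(SAMPLE)["sda"])  # (42.0, 128.0, 84.0)
```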

Solution 3:

If you can vary the load on the application from 1 TPS to well past the point of bottlenecking, you can build a model of the relationship of TPS and I/O operation rate and bandwidth.

Let's say:

  1 TPS causes   6 IOs and   2 KB of transfer, per second
 10 TPS causes  15 IOs and  11 KB
100 TPS causes 105 IOs and 101 KB
  but
200 TPS causes 106 IOs and 102 KB
300 TPS causes 106 IOs and 102 KB

1) Then you have a bottleneck at 100 TPS offered, plus

2) there is an overhead of 5 IOs and 1 KB, after which each transaction uses 1 IO and 1 KB of transfer

Now:

  1. is the limit of your existing device,
  2. is your budget, which you use to figure out how much to provision for each TPS you want to handle
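The budget in point 2 turns directly into a provisioning formula. A minimal sketch (function names and the 500 TPS target are illustrative; the overhead and per-transaction costs are the figures stated above):

```python
# Hypothetical sketch: size storage from a fixed overhead plus a
# per-transaction cost, using the budget from the worked example
# (5 IOs + 1 IO per transaction; 1 KB + 1 KB per transaction).

def provision(target_tps: float, overhead: float, per_tx: float) -> float:
    """Steady-state demand (per second) for target_tps, given a fixed
    overhead and a per-transaction cost in the same unit."""
    return overhead + per_tx * target_tps

# Sizing for a hypothetical 500 TPS target under this budget:
print(provision(500, overhead=5, per_tx=1))  # 505.0 IOs/sec
print(provision(500, overhead=1, per_tx=1))  # 501.0 KB/sec
```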

If a device says it's good for 10,000 IOPS and 100 KB/s, only the latter is meaningful to you. If it says it's good for 100 IOPS and 10,000 KB/s, only the former is. Sometimes a system will bottleneck on IOPS initially, and on bandwidth in large configurations.

To measure this, do lots of individual tests, with repetitions, and plot the results on a graph: your eyes are better at pictures than at tables of numbers.

The throughput graph should start out as a slope, something like /, then abruptly level off and go horizontal, or sometimes bend back down again. If you plot response time instead, it will look like _/. The bends will line up at around the bottleneck load.

And yes, it will be a scatterplot of dots approximating those curves, not nice straight lines (;-))
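If you'd rather have a number than a picture, the knee can be estimated from the same measurements: find the load where marginal throughput stops tracking offered load. A rough sketch with illustrative data shaped like the example above (the 0.5 cutoff is an arbitrary assumption, not a rule):

```python
# Hypothetical sketch: estimate the knee of a load/throughput curve by
# finding where the marginal slope collapses relative to the initial one.

def find_knee(points, cutoff=0.5):
    """points: (offered_tps, achieved_ios) pairs, sorted by load.
    Returns the load at which the segment slope first drops below
    cutoff * the initial slope."""
    (t0, y0), (t1, y1) = points[0], points[1]
    base_slope = (y1 - y0) / (t1 - t0)
    for (ta, ya), (tb, yb) in zip(points, points[1:]):
        if (yb - ya) / (tb - ta) < cutoff * base_slope:
            return ta
    return points[-1][0]  # no knee observed within the tested range

# Illustrative measurements in the spirit of the worked example:
data = [(1, 6), (10, 15), (100, 105), (200, 106), (300, 106)]
print(find_knee(data))  # 100
```

With noisy real-world measurements you'd want repetitions and some smoothing first, which is exactly why the graph is the better everyday tool.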

--dave