Benchmarking Azure's Premium Storage P30 Disks

We're running performance tests on two new Standard DS13 (8 Core, 56 GB) VMs (both using the latest/default Windows 2012 R2 image) backed by Premium Storage and have hit a wall with step 1 in testing the local SSD performance.

We understand 25% of the 400GB local SSD for these VMs is made available as temporary storage and the other 75% is used for Premium Storage caching: http://azure.microsoft.com/blog/2014/12/11/new-premium-storage-backed-virtual-machines/

On the remaining 25%, we expect to see performance along these lines:

http://www.brentozar.com/archive/2014/09/azure-really-60-faster/
http://azure.microsoft.com/blog/2014/10/06/d-series-performance-expectations/

... but CrystalDiskMark shows it crawling along:

               Sequential Read :     4.097 MB/s
              Sequential Write :     4.096 MB/s
             Random Read 512KB :     4.112 MB/s
            Random Write 512KB :     4.112 MB/s
        Random Read 4KB (QD=1) :     2.057 MB/s [   502.3 IOPS]
       Random Write 4KB (QD=1) :     2.057 MB/s [   502.2 IOPS]
       Random Read 4KB (QD=32) :     2.048 MB/s [   500.0 IOPS]
      Random Write 4KB (QD=32) :     2.047 MB/s [   499.7 IOPS]

  Test : 50 MB [D: 7.2% (8.1/112.0 GB)] (x5)
  Date : 2015/02/14 15:35:41
    OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)
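Every row above tops out around 2 or 4 MB/s, which is what you get if throughput is simply IOPS × block size under a fixed IOPS budget. The sketch below assumes (our reading of the "500 IOPS @8KB" figure from the answer, not documented Azure behaviour) that the cap charges each request in 8KB units, so a 4KB IO costs one unit and a 512KB IO costs 64. Units follow what the CrystalDiskMark numbers above imply: 1KB = 1024 bytes, 1 MB = 1,000,000 bytes.

```python
import math

KB = 1024        # CrystalDiskMark-style: 1KB = 1024 bytes
MB = 1_000_000   # ...and MB/s reported in decimal megabytes

def throttled_mb_per_s(block_kb, cap_iops=500, unit_kb=8):
    """Throughput if each IO is charged ceil(block / unit) slots of an IOPS budget.

    The 8KB charging unit is an assumption, not documented Azure behaviour.
    """
    slots_per_io = max(1, math.ceil(block_kb / unit_kb))  # 4KB still costs one slot
    ios_per_s = cap_iops / slots_per_io                   # completed requests per second
    return ios_per_s * block_kb * KB / MB

for block_kb in (4, 8, 64, 512):
    print(f"{block_kb:>4} KB: {throttled_mb_per_s(block_kb):.3f} MB/s")
```

Under that assumption every block of 8KB or more lands on 4.096 MB/s and 4KB lands on 2.048 MB/s, which lines up closely with the sequential, 512KB and 4KB rows above.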

The OS disk performs better, but nowhere near the 150 MB/s you'd expect for a P20 disk (assuming that's what is allocated for the default 127GB OS disk).

Expecting:

http://azure.microsoft.com/en-us/documentation/articles/storage-premium-storage-preview-portal/

Seeing:

           Sequential Read :    66.031 MB/s
          Sequential Write :    63.034 MB/s
         Random Read 512KB :    65.861 MB/s
        Random Write 512KB :    63.580 MB/s
    Random Read 4KB (QD=1) :     2.097 MB/s [   511.9 IOPS]
   Random Write 4KB (QD=1) :     2.047 MB/s [   499.7 IOPS]
   Random Read 4KB (QD=32) :     2.086 MB/s [   509.3 IOPS]
  Random Write 4KB (QD=32) :     2.078 MB/s [   507.4 IOPS]

  Test : 50 MB [C: 12.9% (16.4/127.0 GB)] (x5)
  Date : 2015/02/14 15:46:35
    OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)

And the performance of the P30 disk (with ReadOnly cache) isn't much better:

           Sequential Read :   204.567 MB/s
          Sequential Write :    39.677 MB/s
         Random Read 512KB :   204.549 MB/s
        Random Write 512KB :    34.865 MB/s
    Random Read 4KB (QD=1) :    20.951 MB/s [  5114.9 IOPS]
   Random Write 4KB (QD=1) :     1.666 MB/s [   406.7 IOPS]
   Random Read 4KB (QD=32) :    20.893 MB/s [  5100.9 IOPS]
  Random Write 4KB (QD=32) :    20.944 MB/s [  5113.4 IOPS]

  Test : 50 MB [E: 0.0% (0.2/1023.0 GB)] (x5)
  Date : 2015/02/14 15:22:59
    OS : Windows Server 2012 R2 Datacenter (Full installation) [6.3 Build 9600] (x64)
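For a disk with both an IOPS cap and a bandwidth cap, the most you can expect at a given block size is the smaller of (IOPS cap × block size) and the bandwidth cap. A minimal sketch using the P30 figures quoted in this thread (5,000 IOPS, 200 MB/s) and the same unit convention as the CrystalDiskMark output:

```python
KB = 1024
MB = 1_000_000

def expected_mb_per_s(block_kb, iops_cap=5000, bw_cap_mb=200.0):
    """Return (MB/s ceiling, which cap binds) for a given block size."""
    iops_bound = iops_cap * block_kb * KB / MB   # MB/s if only the IOPS cap applied
    if iops_bound < bw_cap_mb:
        return iops_bound, "IOPS"
    return bw_cap_mb, "bandwidth"

for block_kb in (4, 64, 512):
    mbps, cap = expected_mb_per_s(block_kb)
    print(f"{block_kb:>4} KB: {mbps:7.2f} MB/s ({cap}-bound)")
```

This predicts about 20.5 MB/s at 4KB (IOPS-bound) and the 200 MB/s bandwidth cap at 512KB, which matches the read rows. The 512KB write row (34.9 MB/s) is well under both caps, so the caps alone don't explain it; the answer below points at queue depth.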

For comparison, here is our current CloudDrive with host caching, deployed on D13 VMs (note the 4KB random read performance):

           Sequential Read :   136.711 MB/s
          Sequential Write :    10.210 MB/s
         Random Read 512KB :   190.744 MB/s
        Random Write 512KB :     9.063 MB/s
    Random Read 4KB (QD=1) :    10.813 MB/s [  2639.8 IOPS]
   Random Write 4KB (QD=1) :     0.508 MB/s [   107.5 IOPS]
   Random Read 4KB (QD=32) :   106.533 MB/s [ 26009.1 IOPS]
  Random Write 4KB (QD=32) :     9.363 MB/s [  2286.0 IOPS]

  Test : 50 MB [F: 4.1% (24.9/600.0 GB)] (x5)
  Date : 2015/02/14 20:25:01
    OS : Windows Server 2012 Datacenter (Full installation) [6.2 Build 9200] (x64)
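The QD=1 vs QD=32 rows can be read through Little's law (requests in flight = IOPS × average latency), assuming the benchmark keeps exactly QD requests outstanding for the whole run. A small sketch that backs out the implied per-IO latency from the 4KB random read rows of the CloudDrive and P30 runs:

```python
def implied_latency_ms(queue_depth, iops):
    """Little's law: average latency = requests in flight / throughput."""
    return queue_depth / iops * 1000.0

# 4KB random read rows copied from the CrystalDiskMark runs above
runs = {
    "CloudDrive QD=1":  (1, 2639.8),
    "CloudDrive QD=32": (32, 26009.1),
    "P30 QD=1":         (1, 5114.9),
    "P30 QD=32":        (32, 5100.9),
}
for name, (qd, iops) in runs.items():
    print(f"{name:<17} {implied_latency_ms(qd, iops):6.3f} ms")
```

The CloudDrive gets roughly 10x more IOPS from the deeper queue at about 1.2 ms per IO, while the P30 stays pinned near 5,100 IOPS, so the extra queue depth only shows up as waiting time (about 6.3 ms per IO). That is what you'd expect if the 5,000 IOPS cap is the limit.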

And this is what SQLIO reports for the local SSD:

C:\Program Files (x86)\SQLIO>sqlio -dD
sqlio v1.5.SG
1 thread reading for 30 secs from file D:testfile.dat
        using 2KB IOs over 128KB stripes with 64 IOs per run
size of file D:testfile.dat needs to be: 8388608 bytes
current file size:      0 bytes
need to expand by:      8388608 bytes
expanding D:testfile.dat ... done.
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   499.38
MBs/sec:     0.97

And for the P30:

C:\Program Files (x86)\SQLIO>sqlio -dE
sqlio v1.5.SG
1 thread reading for 30 secs from file E:testfile.dat
        using 2KB IOs over 128KB stripes with 64 IOs per run
size of file E:testfile.dat needs to be: 8388608 bytes
current file size:      0 bytes
need to expand by:      8388608 bytes
expanding E:testfile.dat ... done.
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:  5103.03
MBs/sec:     9.96
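Both SQLIO runs above used one thread, 2KB reads and an 8 MB test file, so they mostly measure the IOPS cap (about 499 × 2KB ≈ 1 MB/s and 5,103 × 2KB ≈ 10 MB/s). Probing throughput means varying the block size and the number of outstanding IOs. Below is a hypothetical helper that prints a small write sweep; the switches are standard SQLIO options, but the sweep values are illustrative only, and for read tests a file larger than 8 MB is probably advisable so the ReadOnly cache doesn't serve everything.

```python
from itertools import product

def sqlio_command(drive, kind="W", threads=1, outstanding=32,
                  block_kb=64, pattern="sequential", seconds=60):
    """Build one SQLIO command line (hypothetical helper, values illustrative).

    -k kind (R/W), -t threads, -o outstanding IOs per thread,
    -b block size in KB, -f access pattern, -s duration in seconds,
    -BN no buffering, -LS latency stats, -d drive letter (default testfile.dat).
    Total queue depth is threads x outstanding.
    """
    return (f"sqlio -k{kind} -t{threads} -o{outstanding} -b{block_kb} "
            f"-f{pattern} -s{seconds} -BN -LS -d{drive}")

# Sweep block size (outer) and outstanding IOs (inner) against the P30 on E:
sweep = [sqlio_command("E", block_kb=b, outstanding=o)
         for b, o in product((8, 64, 256), (1, 8, 32))]
for cmd in sweep:
    print(cmd)
```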

The 5,000 IOPS advertised for the P30 is holding up, but what about the 200 MB/s throughput per disk?

NOTE: Attempts to create the P30 data disk with the ReadWrite cache policy result in:

Update-AzureVm : BadRequest: The disk cache setting ReadWrite is not supported for DataVirtualHardDisk.

Any guidance would be appreciated:

  • Why is the local SSD storage throttled at 500 IOPS and 1-4 MB/s throughput?
  • How do we achieve 200 MB/s on P30 writes, as we already see with reads? What test should we run?
  • MS: can you publish I/O benchmarks that we can run to validate max limits?

Solution 1:

To answer your questions:

  1. Local storage is throttled to 500 IOPS @8KB. Those limits were a mistake and will be raised substantially soon.
  2. To hit 200 MB/sec on writes you need to (a) use a block size of at least 40KB (otherwise you run into the 5,000 IOPS limit first), and (b) use a queue depth of at least 25 (that is for a 40KB block; as the block size goes up, you can use a smaller queue depth).
  3. We agree, it would be nice if we published benchmarks that you can use to validate the limits. If we do, it probably won't be until we move out of preview.

David Berg - Microsoft Azure Performance Team
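The arithmetic behind point 2 of the answer, sketched under an explicit assumption: the minimum block size is the bandwidth cap divided by the IOPS cap, and the queue depth follows from Little's law (in-flight IOs = IOPS × latency). The 5 ms per-IO latency is an assumed value, picked because it makes the quoted queue depth of 25 come out; it is not a published Azure figure.

```python
def min_block_kb(bw_mb_per_s=200.0, iops_cap=5000):
    """Smallest block (decimal KB) that reaches the bandwidth cap before the IOPS cap."""
    return bw_mb_per_s * 1000 / iops_cap          # 200 MB/s / 5,000 IOPS = 40 KB

def min_queue_depth(block_kb, bw_mb_per_s=200.0, latency_ms=5.0):
    """Little's law; latency_ms is an ASSUMED constant per-IO latency."""
    iops_needed = bw_mb_per_s * 1000 / block_kb   # IOs per second to hit the target
    return iops_needed * latency_ms / 1000        # requests that must be in flight

print(min_block_kb())        # 40.0
print(min_queue_depth(40))   # 25.0
print(min_queue_depth(128))  # 7.8125
```

Because the required IOPS drops as the block grows, the queue depth needed at a fixed latency drops proportionally, which is the "larger block, smaller queue depth" point in the answer.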