Why would a Linux VM in vSphere ESXi 5.5 show dramatically increased disk I/O latency?
I'm stumped and I hope someone else will recognize the symptoms of this problem.
Hardware: new Dell T110 II, dual-core Pentium G850 2.9 GHz, onboard SATA controller, one new 500 GB 7200 RPM cabled hard drive inside the box, other drives inside but not yet mounted. No RAID.
Software: fresh CentOS 6.5 virtual machine under VMware ESXi 5.5.0 (build 1746018) + vSphere Client, with 2.5 GB RAM allocated. The disk is laid out the way CentOS offered to set it up, namely as a volume inside an LVM volume group, except that I skipped the separate /home and simply have / and /boot. CentOS is patched up, ESXi is patched up, and the latest VMware Tools are installed in the VM. There are no users on the system, no services running, and no files on the disk but the OS installation. I'm interacting with the VM via the VM virtual console in vSphere Client.
Before going further, I wanted to check that I configured things more or less reasonably. I ran the following command as root in a shell on the VM:
for i in 1 2 3 4 5 6 7 8 9 10; do
    dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync
done
That is, it just repeats the dd command 10 times, printing the transfer rate each time. The results are disturbing. It starts off well:
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 20.451 s, 105 MB/s
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 20.4202 s, 105 MB/s
...
but after 7-8 of these, it then prints
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 82.9779 s, 25.9 MB/s
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 84.0396 s, 25.6 MB/s
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 103.42 s, 20.8 MB/s
If I wait a significant amount of time, say 30-45 minutes, and run it again, it goes back to 105 MB/s, and after several rounds (sometimes a few, sometimes 10+), it drops to ~20-25 MB/s again.
Based on a preliminary search for possible causes, in particular VMware KB 2011861, I changed the Linux I/O scheduler to noop instead of the default. cat /sys/block/sda/queue/scheduler shows that it is in effect. However, I cannot see that it has made any difference in this behavior.
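For completeness, here is a minimal sketch of how the scheduler change was applied, assuming /dev/sda and the standard CentOS 6 GRUB config (this reflects my reading of the KB article; exact paths may vary by setup):

cat /sys/block/sda/queue/scheduler          # the bracketed entry is the active scheduler
echo noop > /sys/block/sda/queue/scheduler  # switch at runtime
# For persistence across reboots, add elevator=noop to the kernel line in /boot/grub/grub.conf.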
Plotting the disk latency in vSphere's interface shows periods of high disk latency hitting 1.2-1.5 seconds during the times that dd reports the low throughput. (And yes, things get pretty unresponsive while that's happening.)
What could be causing this?
I'm comfortable that it is not due to the disk failing, because I also had configured two other disks as an additional volume in the same system. At first I thought I did something wrong with that volume, but after commenting the volume out from /etc/fstab and rebooting, and trying the tests on / as shown above, it became clear that the problem is elsewhere. It is probably an ESXi configuration problem, but I'm not very experienced with ESXi. It's probably something stupid, but after trying to figure this out for many hours over multiple days, I can't find the problem, so I hope someone can point me in the right direction.
(P.S.: yes, I know this hardware combo won't win any speed awards as a server, and I have reasons for using this low-end hardware and running a single VM, but I think that's beside the point for this question [unless it's actually a hardware problem].)
ADDENDUM #1: Reading other answers such as this one made me try adding oflag=direct to dd. However, it makes no difference in the pattern of results: initially the numbers are higher for many rounds, then they drop to 20-25 MB/s. (The initial absolute numbers are in the 50 MB/s range.)
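For concreteness, the loop above with oflag=direct added would look roughly like this (a sketch of my reconstruction, not a verbatim copy of what I ran):

for i in 1 2 3 4 5 6 7 8 9 10; do
    dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync oflag=direct
done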
ADDENDUM #2: Adding sync ; echo 3 > /proc/sys/vm/drop_caches into the loop does not make a difference at all.
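In other words, the loop became roughly this (again a sketch):

for i in 1 2 3 4 5 6 7 8 9 10; do
    sync ; echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync oflag=direct
done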
ADDENDUM #3: To take out further variables, I now run dd such that the file it creates is larger than the amount of RAM on the system. The new command is dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct. Initial throughput numbers with this version of the command are ~50 MB/s. They drop to 20-25 MB/s when things go south.
ADDENDUM #4: Here is the output of iostat -d -m -x 1 running in another terminal window while performance is "good" and then again when it's "bad". (While this is going on, I'm running dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct.) First, when things are "good", it shows this:

[iostat output while performance is "good" not reproduced here]

When things go "bad", iostat -d -m -x 1 shows this:

[iostat output while performance is "bad" not reproduced here]
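For reference, this is the monitoring command, with a gloss (mine, not taken from the output above) on the columns most relevant to latency:

iostat -d -m -x 1    # watch await (ms per request), avgqu-sz (queue depth) and %util in particular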
ADDENDUM #5: At the suggestion of @ewwhite, I tried using tuned with different profiles and also tried iozone. In this addendum, I report the results of experimenting with whether different tuned profiles had any effect on the dd behavior described above. I tried changing the profile to virtual-guest, latency-performance, and throughput-performance, keeping everything else the same, rebooting after each change, and then each time running dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct (the per-profile procedure is sketched below). It did not affect the behavior: just as before, things start off fine and many repeated runs of dd show the same performance, but then at some point after 10-40 runs, performance drops by half. Next, I used iozone. Those results are more extensive, so I'm putting them in as addendum #6 below.
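The per-profile procedure was essentially this (a sketch, assuming tuned is already installed):

tuned-adm profile virtual-guest     # then latency-performance, then throughput-performance
tuned-adm active                    # confirm the profile is in effect
reboot
# after the reboot:
dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct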
ADDENDUM #6: At the suggestion of @ewwhite, I installed and used iozone to test performance. I ran it under different tuned profiles, and used a very large maximum file size (4 GB) parameter to iozone. (The VM has 2.5 GB of RAM allocated, and the host has 4 GB total.) These test runs took quite some time. FWIW, the raw data files are available at the links below. In all cases, the command used to produce the files was iozone -g 4G -Rab filename.
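For example, a single run under one tuned profile went roughly like this (the output filename here is illustrative; the actual result files are at the links below):

tuned-adm profile latency-performance
iozone -g 4G -Rab iozone-latency-performance.xls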
- Profile latency-performance:
  - raw results: http://cl.ly/0o043W442W2r
  - Excel (OSX version) spreadsheet with plots: http://cl.ly/2M3r0U2z3b22
- Profile enterprise-storage:
  - raw results: http://cl.ly/333U002p2R1n
  - Excel (OSX version) spreadsheet with plots: http://cl.ly/3j0T2B1l0P46
The following is my summary.
In some cases I rebooted after a previous run, and in other cases I didn't and simply ran iozone again after changing the profile with tuned. This did not seem to make an obvious difference to the overall results.
Different tuned profiles did not seem (to my admittedly inexpert eyes) to affect the broad behavior reported by iozone, though the profiles did affect certain details. First, unsurprisingly, some profiles changed the threshold at which performance dropped off for writing very large files: plotting the iozone results, you can see a sheer cliff at 0.5 GB for profile latency-performance, but this drop manifests itself at 1 GB under profile enterprise-storage. Second, although all profiles exhibit weird variability for combinations of small file sizes and small record sizes, the precise pattern of variability differed between profiles. In other words, in the plots shown below, the craggy pattern on the left side exists for all profiles, but the locations of the pits and their depths differ between profiles. (However, I did not repeat runs of the same profile to see whether the pattern of variability changes noticeably between runs of iozone under the same profile, so it is possible that what looks like differences between profiles is really just random variability.)
The following are surface plots of the different iozone tests for the tuned profile latency-performance. The descriptions of the tests are copied from the iozone documentation.
Read test: This test measures the performance of reading an existing file.
Write test: This test measures the performance of writing a new file.
Random read: This test measures the performance of reading a file with accesses being made to random locations within the file.
Random write: This test measures the performance of writing a file with accesses being made to random locations within the file.
Fread: This test measures the performance of reading a file using the library function fread(). This is a library routine that performs buffered & blocked read operations. The buffer is within the user’s address space. If an application were to read in very small size transfers then the buffered & blocked I/O functionality of fread() can enhance the performance of the application by reducing the number of actual operating system calls and increasing the size of the transfers when operating system calls are made.
Fwrite: This test measures the performance of writing a file using the library function fwrite(). This is a library routine that performs buffered write operations. The buffer is within the user’s address space. If an application were to write in very small size transfers then the buffered & blocked I/O functionality of fwrite() can enhance the performance of the application by reducing the number of actual operating system calls and increasing the size of the transfers when operating system calls are made. This test is writing a new file so again the overhead of the metadata is included in the measurement.
Finally, during the time that iozone was doing its thing, I also examined the performance graphs for the VM in vSphere 5's client interface. I switched back and forth between the real-time plots of the virtual disk and the datastore. The available plotting parameters for the datastore were greater than for the virtual disk, and the datastore performance plots seemed to mirror what the disk and virtual disk plots were doing, so here I enclose only a snapshot of the datastore graph taken after iozone finished (under tuned profile latency-performance). The colors are a little hard to read, but what is perhaps most notable are the sharp vertical spikes in read latency (e.g., at 4:25, then again slightly after 4:30, and again between 4:50 and 4:55). Note: the plot is unreadable when embedded here, so I've also uploaded it to http://cl.ly/image/0w2m1z2T1z2b
I must admit, I don't know what to make of all this. I especially don't understand the weird pothole profiles in the small-record/small-file-size regions of the iozone plots.
Solution 1:
Can you give the exact ESXi build number? Please try the tests again with a purpose-built disk performance analysis tool like fio or iozone to get a real baseline. Using dd is not really productive for this.

In general, the default I/O scheduler in EL6 isn't that great. You should look at shifting to the deadline or noop I/O elevators, or better yet, installing the tuned framework.
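For example, something along these lines gives a more controlled sequential-write baseline than dd (the parameters here are illustrative, not prescriptive):

fio --name=seqwrite --filename=/test.img --size=4G --bs=16k --rw=write --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based --group_reporting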
Try yum install tuned tuned-utils and tuned-adm profile virtual-guest, then test again.
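Put together (with a verification step added for good measure):

yum install tuned tuned-utils
tuned-adm profile virtual-guest
tuned-adm active    # confirm the profile took effect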
Solution 2:
I ran into the same problem and noticed very slow drive performance within virtual machines. I'm using ESXi 5.5 on a Seagate ST33000650NS.
By following this KB article I changed Disk.DiskMaxIOSize to my disk's block size, in my case 4096.
The VMware note on this is very nice, since you can just test it:
Note: You can make this change without rebooting the ESX/ESXi host or putting the ESX/ESXi host in maintenance mode.
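If you prefer the ESXi shell over the vSphere Client, the setting can also be checked and changed with esxcli (an alternative route; the KB article describes the GUI way):

esxcli system settings advanced list -o /Disk/DiskMaxIOSize          # show the current value
esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 4096   # set it to the disk's block size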
I know this question is very old, but mhucka put so much energy and information into his post that I had to answer.
Edit #1: After using 4096 for a day, I switched back to the old value, 32767. Testing the I/O, everything still seems to be stable. My guess is that running ESXi on a normal HDD with Disk.DiskMaxIOSize set to 32767 will work fine for some hours or maybe days. Maybe it takes some load from the VMs to gradually reduce the performance.

I'll try to investigate and come back later...