ESXi host: acceptable memory overcommitment

This will largely depend on your virtual machines and their memory usage. ESXi employs a number of techniques that allow it to overcommit memory for guests (a rough numerical sketch of their relative weight follows the list):

1. Memory compression cache

Memory pages that have been inactive for a while are compressed instead of being swapped to disk or ballooned, and are decompressed and served on request. The compression cache has a configurable upper limit, which is set to 10% of the guest's assigned memory by default, and according to this VMware white paper you can roughly estimate a 6% performance decrease when the compression cache is in use in real-world scenarios.

2. Page sharing

Virtual memory pages of different guests that are found to carry identical content are mapped to the same physical memory page. This is an asynchronous operation that regularly frees duplicate memory pages.

3. Memory ballooning

A kernel-level driver in the guest, supplied with the VMware Tools, claims memory in the guest's non-paged memory pool and reports it as free to the hypervisor. This way, the memory is effectively "stolen" from the guest for a while, inducing guest-level swapping if the guest really needs that memory.

4. Swapping

If everything else fails and more memory is still needed, ESXi swaps guest memory pages to disk. The location of the swap file is configurable; by default it is placed in the same directory as the guest's configuration files.
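
To put rough numbers on these techniques for a single guest, here is a quick back-of-the-envelope sketch in Python. Only the 10% compression-cache cap is the ESXi default mentioned above; the sharing and ballooning percentages are purely illustrative assumptions you would have to replace with figures from your own environment:

# Back-of-the-envelope model of how much of a guest's assigned memory each
# reclamation technique could cover. Only the 10% compression-cache cap is an
# ESXi default; the sharing and ballooning percentages are illustrative
# assumptions, not measured values.
def reclamation_budget_mb(assigned_mb, sharing_pct=10.0, balloon_pct=25.0):
    return {
        "compression cache (10% cap)": assigned_mb * 0.10,
        "page sharing (assumed)": assigned_mb * sharing_pct / 100,
        "ballooning target (assumed)": assigned_mb * balloon_pct / 100,
        # Anything beyond these has to come from hypervisor-level swapping.
    }

if __name__ == "__main__":
    for technique, mb in reclamation_budget_mb(8192).items():
        print(f"{technique:30s} {mb:8.0f} MB")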


For my typical loads, I have found that page compression and page sharing yield around 10% in memory savings over the memory overhead incurred by ESXi, without notable performance degradation. Ballooning will always work as long as it is configured to (you can effectively turn it off by reserving the entire memory amount for the guest), but it is only marginally better than swapping: it helps where your guests would otherwise have dynamically claimed large amounts of memory for caching, but if the guests are memory-starved already, it cannot do magic and will incur disk I/O through thrashing just as hypervisor-level swapping would.

All summed up: if you could overcommit your guests by just about 10% and they would continue to run without in-guest swapping and the accompanying performance degradation, you would likely be fine with your 40% overcommitment as well. If not, you definitely would not be.
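
The arithmetic is simple enough to sanity-check with a few lines of Python. The ~10% expected savings below is just the rough figure from my own workloads, so treat it as an assumption rather than a rule:

# Check whether a given overcommitment level could plausibly be absorbed by
# page sharing and compression alone (i.e. without ballooning or swapping).
# The 10% expected savings is the rough figure from my workloads, not a
# guaranteed number.
def overcommit_check(host_physical_mb, total_guest_mb, expected_savings_pct=10.0):
    overcommit_pct = (total_guest_mb - host_physical_mb) / host_physical_mb * 100
    return overcommit_pct, overcommit_pct <= expected_savings_pct

if __name__ == "__main__":
    # Hypothetical host: 32 GB physical RAM, 45 GB assigned to guests (~40% overcommitment).
    pct, fits = overcommit_check(32 * 1024, 45 * 1024)
    print(f"overcommitment: {pct:.0f}% -> "
          f"{'sharing/compression may cover it' if fits else 'expect ballooning and/or swapping'}")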

The memory screen of esxtop (just press m after starting esxtop from the SSH console) shows real-time memory statistics in more detail than the graphs you would get in the vSphere Client, so it is worth a look:

 1:54:52pm up 34 days  8:39, 214 worlds; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM  /MB: 32766   total:  1031     vmk, 29568 other,   2166 free
VMKMEM/MB: 32103 managed:  1926 minfree, 13525 rsvd,  18577 ursvd,  high state
NUMA  /MB:  8123 (  767),  8157 ( 2425),  8157 (  186),  7835 (  128)
PSHARE/MB:  2162  shared,   139  common:  2023 saving
SWAP  /MB:     0    curr,     0 rclmtgt:                 0.00 r/s,   0.00 w/s
ZIP   /MB:    17  zipped,    10   saved
MEMCTL/MB:   295    curr,   292  target, 14289 max
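
If you want to keep an eye on the reclaimed amounts without staring at esxtop, the relevant fields are easy to pull out of that screen. A minimal sketch, assuming the field layout shown in the sample above (other ESXi builds may format these lines differently):

import re

# Extract the currently reclaimed amounts (in MB) from an esxtop memory screen
# formatted like the sample above.
def reclaimed_mb(mem_screen):
    patterns = {
        "page sharing saving": r"PSHARE/MB:.*?(\d+)\s+saving",
        "swapped out":         r"SWAP\s*/MB:\s*(\d+)\s+curr",
        "compression saved":   r"ZIP\s*/MB:.*?(\d+)\s+saved",
        "ballooned":           r"MEMCTL/MB:\s*(\d+)\s+curr",
    }
    return {name: int(m.group(1))
            for name, pattern in patterns.items()
            if (m := re.search(pattern, mem_screen))}

if __name__ == "__main__":
    sample = (
        "PSHARE/MB:  2162  shared,   139  common:  2023 saving\n"
        "SWAP  /MB:     0    curr,     0 rclmtgt:                 0.00 r/s,   0.00 w/s\n"
        "ZIP   /MB:    17  zipped,    10   saved\n"
        "MEMCTL/MB:   295    curr,   292  target, 14289 max\n"
    )
    stats = reclaimed_mb(sample)
    for name, mb in stats.items():
        print(f"{name:22s} {mb:6d} MB")
    print(f"{'total reclaimed':22s} {sum(stats.values()):6d} MB")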