Why do VirtualBox guest kernels run in ring 1 instead of ring 3?

Solution 1:

I got some very helpful answers from the folks at #vbox-dev on freenode as well as other online resources.

  1. Running the guest kernel in ring 1 doesn't improve performance. As mentioned in the VirtualBox documentation, guest user space runs in ring 3 and guest kernel space runs in ring 1. This allows the guest kernel space to be protected from the guest user space through paging (see slide 19). The following explains how paging is used to achieve this protection.

    https://manybutfinite.com/post/cpu-rings-privilege-and-protection/

    Each memory page is a block of bytes described by a page table entry containing two fields related to protection: a supervisor flag and a read/write flag. The supervisor flag is the primary x86 memory protection mechanism used by kernels. When it is on, the page cannot be accessed from ring 3. While the read/write flag isn't as important for enforcing privilege, it's still useful.

  2. The good news is that guests cannot execute privileged instructions, since only ring 0 can do so. The bad news is that on a 64-bit system, ring 1 potentially has access to the host's memory pages. This is because in 64-bit mode, segment limits no longer apply, since segmentation has been mostly replaced with paging. Unfortunately, paging does not distinguish between privilege levels 0-2 when it comes to memory isolation. This issue is known as ring compression (see slide 19).

    https://cseweb.ucsd.edu/~jfisherogden/hardwareVirt.pdf

    Ring Compression

    To provide isolation among virtual machines, the VMM runs in ring 0 and the virtual machines run either in ring 1 (the 0/1/3 model) or ring 3 (the 0/3/3 model). While the 0/1/3 model is simpler, it can not be used when running in 64 bit mode on a CPU that supports the 64 bit extensions to the x86 architecture (AMD64 and EM64T).

    To protect the VMM from guest OSes, either paging or segment limits can be used. However, segment limits are not supported in 64 bit mode and paging on the x86 does not distinguish between rings 0, 1, and 2. This results in ring compression, where a guest OS must run in ring 3, unprotected from user applications.

    The above paragraph suggests that on 64-bit systems, because segment limits were dropped, both the guest kernel and guest user space must run in ring 3 (the 0/3/3 model) in order to protect the host from the guest. However, slide 37 suggests that it could be possible to keep the 0/1/3 model and prevent ring 1 from accessing the host through very complex Binary Translation (BT). Perhaps this is the strategy that VirtualBox implements?
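To make the paging argument in both points concrete, here is a minimal sketch of the check paging performs. It is a simplified model, not VirtualBox code: the bit positions of the present, read/write and user/supervisor flags are the architectural ones, but the function names and the two example pages are hypothetical, and it assumes CR0.WP = 0 and ignores segmentation, NX and SMEP/SMAP.

```python
# Low bits of an x86 page table entry (architectural bit positions).
PTE_PRESENT = 1 << 0   # P: page is mapped
PTE_WRITABLE = 1 << 1  # R/W: writes allowed when set
PTE_USER = 1 << 2      # U/S: set = user page, clear = supervisor-only page


def is_supervisor(cpl: int) -> bool:
    """Paging collapses the four rings into two modes: rings 0-2 vs ring 3."""
    return cpl < 3


def paging_allows(pte: int, cpl: int, write: bool = False) -> bool:
    """Simplified paging check (assumes CR0.WP = 0; ignores NX, SMEP/SMAP)."""
    if not pte & PTE_PRESENT:
        return False
    if is_supervisor(cpl):
        return True  # supervisor-mode accesses pass in this simplified model
    if not pte & PTE_USER:
        return False  # ring 3 may not touch supervisor-only pages
    if write and not pte & PTE_WRITABLE:
        return False  # ring 3 may not write to read-only pages
    return True


# Hypothetical pages, for illustration only.
vmm_page = PTE_PRESENT | PTE_WRITABLE                    # supervisor-only (host/VMM)
guest_user_page = PTE_PRESENT | PTE_WRITABLE | PTE_USER  # user-accessible

for ring in range(4):
    print(f"ring {ring}: VMM page accessible = {paging_allows(vmm_page, ring)}")
# ring 0: VMM page accessible = True
# ring 1: VMM page accessible = True
# ring 2: VMM page accessible = True
# ring 3: VMM page accessible = False
```

In this model a ring 1 guest kernel is protected from ring 3 guest code, but paging alone gives the same answer for ring 1 as for ring 0. Without segment limits, that is the ring compression problem described above.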

It's important to remember that this whole discussion only pertains to full software virtualization and is therefore very much outdated, since very few CPUs lack hardware virtualization support. As someone from #vbox-dev pointed out:

software virtualization is a dying species, though. so few CPUs left without hardware virtualization support. At some point we'll have to make a tough decision - keeping code alive costs time and money.