Why do x86 CPUs only use 2 out of 4 rings?

There are two primary reasons, plus a third that is specific to Windows NT.

The first reason is that, although the x86 CPUs do offer four rings of memory protection, the granularity of protection offered thereby is only at the per-segment level. That is, each segment can be set to a specific ring ("privilege level") from 0 to 3, along with other protections like write-disabled. But there are not that many segment descriptors available. Most operating systems would like to have a much finer granularity of memory protection. Like... for individual pages.

So, enter protection based on page table entries (PTEs). Most if not all modern x86 operating systems more or less ignore the segmenting mechanism (as much as they can, anyway) and rely on PTE-based protection. This is specified by flag bits, which are the lower 12 bits in each PTE - plus bit 63 on CPUs that support no-execute. There is one PTE for each page, which is normally 4 KB.
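To make the flag layout concrete, here is a minimal sketch that decodes the flag bits of a 64-bit PTE for a 4 KB page. The bit positions follow the standard x86 layout; the names `PTE_FLAGS`, `decode_pte_flags`, and `frame_address`, and the sample PTE value, are made up for illustration and are not taken from any particular OS.

```python
# Flag bits in a 64-bit x86 page table entry that maps a 4 KB page.
# Bits 9-11 are left out: the hardware ignores them and the OS may use them.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",           # User/Supervisor bit: set means user mode may access
    3: "write_through",
    4: "cache_disable",
    5: "accessed",
    6: "dirty",
    7: "pat",
    8: "global",
    63: "no_execute",    # only meaningful on CPUs that support no-execute
}

# Bits 12-51 hold the physical address of the 4 KB page frame.
FRAME_MASK = 0x000F_FFFF_FFFF_F000


def decode_pte_flags(pte):
    """Return the set of flag names whose bits are set in the PTE."""
    return {name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1}


def frame_address(pte):
    """Return the physical address of the page frame the PTE points to."""
    return pte & FRAME_MASK


if __name__ == "__main__":
    sample = 0x8000_0000_1234_5067  # hypothetical PTE value
    print(sorted(decode_pte_flags(sample)))
    print(hex(frame_address(sample)))
```

Note that nothing in the flag bits carries a ring number: there is simply no field wide enough to hold one.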

One of those flag bits is called the "privileged" bit (Intel's manuals call it the User/Supervisor, or U/S, bit). This bit controls whether or not the processor has to be in one of the "privileged" levels to access the page. The "privileged" levels are PL 0, 1, and 2. But it's just one bit, so at the page-by-page protection level the number of "modes" available as far as memory protection is concerned is just two: A page can be accessible from non-privileged mode, or not. Hence just two rings.
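A deliberately simplified sketch of that check shows how the four privilege levels collapse into two classes at the page level. It models only the present bit and the user/supervisor bit; write protection, no-execute, and newer features such as SMEP/SMAP are ignored, and the function names are invented for this example.

```python
PTE_PRESENT = 1 << 0
PTE_USER = 1 << 2  # the single bit that separates user from supervisor pages


def is_supervisor(cpl):
    """Privilege levels 0, 1 and 2 all count as supervisor for paging."""
    return cpl < 3


def page_accessible(pte, cpl):
    """Simplified model: may code running at privilege level cpl touch this page?"""
    if not pte & PTE_PRESENT:
        return False
    return bool(pte & PTE_USER) or is_supervisor(cpl)


if __name__ == "__main__":
    kernel_page = PTE_PRESENT             # user bit clear
    user_page = PTE_PRESENT | PTE_USER    # user bit set
    for cpl in range(4):
        print(cpl, page_accessible(kernel_page, cpl), page_accessible(user_page, cpl))
```

Rings 0, 1, and 2 get identical answers for every page, so code placed in ring 1 or ring 2 gains no page-level isolation from ring 0.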

To have four possible rings for each page, they would have to have two protection bits in each page table entry, to encode one of four possible ring numbers (just as do the segment descriptors). They don't.

The second reason is the goal of OS portability. It's not just about x86; Unix taught us that an OS could be relatively portable to multiple processor architectures, and that that was a good thing. And some processors support only two rings. By not depending on multiple rings in the architecture the OS implementers made the OSs more portable.

There is a third reason that is specific to Windows NT development. NT's designers (David Cutler and his team, whom Microsoft hired away from DEC Western Region Labs) had had extensive previous experience on VMS; in fact, Cutler and a few of the others were among VMS's original designers. And the VAX processor for which VMS was designed (and vice versa) does have four rings. VMS uses four rings. (In fact the VAX has four protection bits in the PTE, allowing combinations like "read-only from user mode, but writeable from ring 2 and inner." But I digress.)

But the components that ran in VMS's rings 1 and 2 (Record Management Services and the CLI, respectively) were left out of the NT design. Ring 2 in VMS really wasn't about OS security but rather about preserving the user's CLI environment from one program to the next, and Windows NT just didn't have that concept; the CLI runs as an ordinary process. As for VMS's ring 1, the RMS code in ring 1 had to call into ring 0 fairly often, and ring transitions are expensive. It turned out to be far more efficient to just go to ring 0 and be done with it rather than have a lot of ring 0 transitions within the ring 1 code. (Again - not that NT has anything like RMS anyway.)

But why are they there, then? As for why x86 implemented four rings while OSs didn't use them - you're talking about OSs of far more recent design than x86. A lot of the "system programming" features of x86 were designed long before NT or true Unix-ish kernels were implemented on it, and they didn't really know what the OSs would use. (It wasn't until we got paging on x86 - which didn't show up until the 80386 - that we could implement true Unix-ish or VMS-like kernels without rethinking memory management from the ground up.)

Not only do modern x86 OSs largely ignore segmenting (they just set up the CS, DS, and SS segments with base address 0 and size of 4 GB; the FS and GS segments are sometimes used to point to key OS data structures), they also largely ignore things like "task state segments". The TSS mechanism was clearly designed for thread context switching but it turns out to have too many side effects, so modern x86 OSs do it "by hand". The only time x86 NT changes hardware tasks, for example, is for some truly exceptional conditions, like a double fault exception.
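For the curious, here is a rough sketch of what such a "flat" segment looks like when packed into the 8-byte descriptor format used in 32-bit protected mode. It also shows the two-bit privilege field that segment descriptors have and PTEs lack. The field layout is the standard one; the helper names are invented for this example, and the sketch only handles ordinary code and data segments.

```python
TYPE_CODE = 0xA  # execute/read code segment
TYPE_DATA = 0x2  # read/write data segment


def encode_descriptor(base, limit, dpl, seg_type, granular_4k=True):
    """Pack a 32-bit code or data segment descriptor into a 64-bit integer."""
    # Access byte: present (0x80), 2-bit DPL, code/data marker (0x10), 4-bit type.
    access = 0x80 | ((dpl & 0x3) << 5) | 0x10 | (seg_type & 0xF)
    # Flags nibble: D/B = 1 (32-bit segment), G = 1 means the limit counts 4 KB units.
    flags = 0x4 | (0x8 if granular_4k else 0x0)
    d = limit & 0xFFFF                    # limit bits 15:0
    d |= (base & 0xFFFFFF) << 16          # base bits 23:0
    d |= access << 40
    d |= ((limit >> 16) & 0xF) << 48      # limit bits 19:16
    d |= flags << 52
    d |= ((base >> 24) & 0xFF) << 56      # base bits 31:24
    return d


def descriptor_dpl(d):
    """Extract the two-bit descriptor privilege level (ring number)."""
    return (d >> 45) & 0x3


def descriptor_size(d):
    """Return the segment size in bytes implied by the limit and G bit."""
    limit = (d & 0xFFFF) | (((d >> 48) & 0xF) << 16)
    unit = 4096 if (d >> 55) & 1 else 1
    return (limit + 1) * unit


if __name__ == "__main__":
    for dpl, seg_type in [(0, TYPE_CODE), (0, TYPE_DATA), (3, TYPE_CODE), (3, TYPE_DATA)]:
        d = encode_descriptor(0, 0xFFFFF, dpl, seg_type)
        print(hex(d), "ring", descriptor_dpl(d), "size", descriptor_size(d))
```

With base 0 and a limit of 0xFFFFF in 4 KB units, every such segment spans the whole 4 GB address space, so the only thing distinguishing the kernel's segments from the user's is the ring number, and all of the real protection work falls to the page tables.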

Re x64, a lot of these disused features were left out. (To their credit, AMD actually talked to OS kernel teams and asked what they needed from x86, what they didn't need or didn't want, and what they'd like added.) Segments on x64 exist only in what might be called vestigial form, hardware task switching doesn't exist, etc. And OSs continue to use just two rings.