The difference between Call Gate, Interrupt Gate, Trap Gate?

I am studying Intel Protected Mode. I found that Call Gate, Interrupt Gate, Trap Gate are almost the same. In fact, besides that Call Gate has the fields for parameter counter, and that these 3 gates have different type fields, they are identical in all other fields.

As to their functions, they are all used to transfer code control into some procedure within some code segment.

I am wondering, since these 3 gates all contain the information needed for the call across privilege boundaries. Why do we need 3 kinds of them? Isn't 1 just good enough?

Thanks for your time and response.

Update 1

A related question: When to use Interrupt Gate or Trap Gate?

Update 2

Today I came up with this thought:

Different purpose, different gates, and with different CPU behavior details carried out. Such as IF flag handling.

A gate (call, interrupt, task or trap) is used to transfer control of execution across segments. Privilege level checking is done differently depending on the type of destination and instruction used.

A call gate uses the CALL and JMP instructions. Call gates transfer control from lower privilege code to higher privilege code. The gate DPL is used to determine what privilege levels have access to the gate. Call gates are (or have been, probably) gradually abandoned in favour of the SYSENTER/SYSEXIT mechanism, which is faster.

Task gates are used for hardware multitasking support. A hardware task switch can occur voluntarily (CALL/JMP to a task gate descriptor), or through an interrupt or an IRET when the NT flag is set. It works the same way with interrupt or trap gates. Task gates are not used, to the best of my knowledge, as kernels usually want extra work done when task switching.

Interrupt & trap gates, together with task gates, are known as the Interrupt Descriptor Table. They work the same as call gates, except the transfer of parameters, from one privilege stack to another. One difference is that interrupt gates clear the IF bit in EFLAGS, while trap gates do not. This makes them ideal for serving hardware interrupts. Traps are widely used in hardware-assisted virtualization.

For more information, see the Intel Architecture Manuals on the processors that interest you.

Update

To answer the comment:

There are many reasons to distinguish interrupts from traps. One is the difference in scope: interrupt gates point to kernel space (after all, it's the kernel who manages the hardware) while traps are called in userspace. Interrupt handlers are called in response to hardware events, while traps are executed in response to an CPU instruction.

For a simple (but impractical) example to better understand why interrupt and trap gates treat EFLAGS differently, consider what would happen in case we were writing an interrupt handler for hardware events on a uniprocessor system and we couldn't clear the IF bit while we were serving one. It would be possible for a second interrupt to arrive while we were busy serving the first. Then our interrupt handler would be called by the processor at some random point during our IH execution. This could lead to data corruption, deadlocking, or other bad magic. Practically, interrupt disabling is one of the mechanisms to ensure that a series of kernel statements is treated like a critical section.

The above example is assuming maskable interrupts, though. You wouldn't want to ignore NMIs, anyway.

It's largely irrelevant today, too. Today there's practically no distinction between fast and slow interrupt handlers (search for "Fast and Slow Handlers"), interrupt handlers can execute in nested fashion, SMP processors make it mandatory to couple local interrupt disabling with spin locks, and so forth.

Now, trap gates are indeed used to service software interrupts, exceptions, etc. A page fault or division by zero exception in your processor is probably handled through a trap gate. The simplest example of using trap gates to control program execution is the INT 3 instruction, which is used to implement breakpoints in debuggers. When doing virtualization, what happens is that the hypervisor runs in ring 0, and the guest kernel usually in ring 1 - where privileged code would fail with general exception fault. Witchel and Rosenblum developed binary translation, which is basically rewriting instructions to simulate their effects. Critical instructions are discovered and replaced with traps. Then when the trap executes, control is yielded to the VMM/hypervisor, which is responsible for emulating the critical instructions in ring 0.

With hardware-assisted virtualization, the trap-and-emulate technique has been somewhat limited in its use (since it's quite expensive, especially when it's dynamic) but the practice of binary translation is still widely used.

For more information, I'd suggest you check out:

Linux Device Drivers, Third Edition (available online)
For binary translation, QEMU is an excellent start.
Regarding trap-and-emulate, check out a comparison between software/hardware techniques.

Hope this helps!

Architecture and Design

From the point of view of protection, the x86 architecture is based on hierarchical rings, according to which all execution space delivered by processor is divided into four hierarchical protection domains, each of which have its own level of privileges assigned. This design assumes that most of the time code will be executed in the least privileged domain and sometimes services from the more privileged security domain will be requested and this services will preempt the less privileged activities onto the stack and then restore it in such a way that whole preemption will be invisible for the less privileged code.

The design of hierarchical protection domains state that the control can't be passed arbitrary between different security domains.

A gate is a feature of x86 architecture for control transfer from less privileged code segments to more privileged ones, but not vice versa. Furthermore, the point in the less privileged segment from where control will be passed can be arbitrary, but point in the more privileged segment to where control will be passed is strictly specified. Backward control passing to the less privileged segment is allowed only by means of IRET instruction. In this regards Intel Software developer manual claims:

Code modules in lower privilege segments can only access modules operating at higher privilege segments by means of a tightly controlled and protected interface called a gate. Attempts to access higher privilege segments without going through a protection gate and without having sufficient access rights causes a general-protection exception (#GP) to be generated.

In other words, a gate is a privileged domain entry point with required access rights and a target address. In that way all gates are similar and used for almost same purposes, and all gate descriptors contain DPL field, that used by processor to control access rights. But note, the processor checks the DPL of the gate only if the source of the call was a software CALL, JMP, or INT instruction, and bypasses this check when the source of the call is a hardware.

Types of Gates

Despite the fact that all gates are similar, they have some differences because originally Intel engineers thought that different gates would be used for different purposes.

Task Gate

A Task Gate can be stored only in IDT and GDT and called by an INT instruction. It is very special type of gate that differs significantly from the others.

Initially, Intel engineers thought that they would revolutionize multitasking by providing CPU based feature for task switching. They introduced TSS (Task State Segment) that hold registers state of the task and can be used for hardware task switching. There are two ways for triggering hardware task switching: by using TSS itself and by using Task Gate. To make hardware task switch you can use the CALL or JMP instructions. If I correctly understand, the main reason for the task gate introduction was to have the ability to trigger hardware task switches in reply to the interrupt arrival, because a hardware task switch can't be triggered by a JMP to the TSS selector.

In reality, no one uses it nor the hardware context switching. In practice, this feature isn't optimal from the performance point of view and it isn't convenient to use. For example, taking into the account that TSS can be stored only in GDT and length of GDT can't be more then 8192, we can't have more then 8k tasks from the hardware point of view.

Trap Gate

A Trap Gate can be stored only in IDT and called by an INT instruction. It can be considered as a basic type of gate. It just passes control to the particular address specified in the trap gate descriptor in the more privileged segment and nothing more. Trap gates actively used for different purposes, that may include:

system call implementation (for example Linux use INT 0x80 and Windows use INT 0x2E for this purposes)
exception handling implementation (we haven't any reason to disable interrupts in the case of exception).
interrupt handling implementation on machines with APIC (we can control kernel stack better).

Interrupt Gate

An Interrupt Gate can be stored only in IDT and called by an INT instruction. It is the same as trap gate, but in addition interrupt gate call additionally prohibits future interrupt acceptance by automatic clearing of the IF flag in the EFLAGS register.

Interrupt gates used actively for interrupt handling implementation, especially on PIC based machines. The reason is a requirement to control stack depth. PIC doesn't have the interrupt sources priorities feature. Due to this by default, PIC disable only the interrupt that already on handling in processor. But another interrupts still can arrive in the middle and preempt the interrupt handling. So there is can be 15 interrupt handlers on the kernel stack in the same moment. As result kernel developers forced either to increase kernel stack size significantly that leads to the memory penalty or be ready to faced with sporadic kernel stack overflow. Interrupt Gate can provide guarantee that only one handler can be on the kernel stack in the same time.

Call Gate

A Call Gate can be stored in GDL and LDT and called by CALL and JMP instructions. Similar to trap gate, but in addition can pass number of parameters from the user mode task stack to kernel mode task stack. Number of parameters passed is specified in the call gate descriptor.

Call gates were never popular. There are few reasons for that:

They can be replaced by trap gates (Occam's Razor).
They not portable a lot. Other processors have no such features which means that support of call gates for system calls is a burden when porting the operating system as those calls must be rewriten.
They're not too flexible, due to the fact that amount of the parameters that can be passed between stacks is limited.
They're not optimal from the performance point of view.

At the end of 1990s Intel and AMD introduced additional instructions for system calls: SYSENTER/SYSEXIT (Intel) and SYSCALL/SYSRET (AMD). In contrast to the call gates, the new instructions provide a performance benefits and have found adoption.

Summary

I disagree with Michael Foukarakis. Sorry, but there are no any differences between interrupts and traps except affecting IF flag.

In theory, each type of gate can serve as interface pointing to a segment with any level of privileges. On practice, in modern operating system in use only interrupt and trap gates, that is used in IDT for system calls, interrupts and exception handling and due to this all they serve as kernel entry point.
Any type of gate (including interrupt, trap and task) can be invoked in software by using an INT instruction. The only feature that can prohibit user mode code access to particular gate is DPL. For example, when operating system builds IDT, regardless of the types of the particular gates, kernel setup DPL of the gates that will be used for hardware event handling to 0 and according to this access to this gates will be allowed only from the kernel space (that runs at most privileged domain), but when it setup gate for the system call, it set DPL to 3 to allow access to that gate from any code. In result, user mode task is able to make system call using gate with DPL = 3, but will catch General Protection Fault on attempt to call keyboard interrupt handler, for example.
Any type of gate in IDT can be invoked by hardware. People use interrupt gates for this hardware events handling only in cases when they want to achive some synchronization. For example to be sure that kernel stack overflow is impossible. For example, I have successfull experience of trap gates usage for hardware interrupt handling on the APIC based system.
Similar way, gate of any type in IDT can be called in software. The reason for the using trap gates for system call and exceptions is simple. No any reasons to disable interrupts. Interrupt disabling is a bad thing, because it increases interrupt handling latencies and increase probability of interrupt lost. Due to this no one won't disable them without any serious reason on the hands.
Interrupt handler usually written in strict reentrant style. In this way interrupt handlers usually share no data and can transparently preempt each other. Even when we need to mutually exclude concurrent access to the data in interrupt handler we can protect only the access to the shared data by using cli and sti instructions. No any reason to consider an whole interrupt handler as a critical section. No any reason to use interrupt gates, except desire to prevent possible kernel stack overflow on the PIC based systems.

The trap gates is a default solution for kernel interfacing. Interrupt gate can be used instead of trap gate if there is some serious reason for that.

An Interrupt gate is special because the IF flag is automatically cleared. A Call gate is special because it doesn't get activated through an interrupt vector. A task gate is special because it automatically saves the processor state. Four distinct behaviors, having four names for them is convenient.