How do Ctrl+C and Ctrl+V work?

In Windows, the clipboard API and storage buffer are provided by the OS at kernel level. (The clipboard belongs to the "window station" kernel object.)

  • Ctrl+C tells the program to store the "copied" data using the Win32 API function SetClipboardData() (after calling OpenClipboard() and EmptyClipboard()). In user32 this in turn calls NtUserSetClipboardData() in the Native API.

    Normally, the copied data is immediately stored in the OS-managed clipboard buffer and no longer depends on the source program. The program can provide several different formats – e.g. text copied from MS Word can come in HTML, RTF, and plaintext formats simultaneously.

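    (However, the program may instead store NULL data and defer the conversion until a paste is actually requested. This is called "delayed rendering", and at paste time Windows sends the program a WM_RENDERFORMAT message. If the program exits without rendering, the data is lost. Well-behaved programs avoid this by handling WM_RENDERALLFORMATS, which is sent to the clipboard owner before it is destroyed. I'm not sure how common this method is.)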

  • Ctrl+V tells the program to choose the desired format and retrieve it using GetClipboardData(). Some programs, e.g. WordPad or Paint, have a "Paste as" feature that lets you choose the preferred format (e.g. if you copied HTML but don't want formatting).

  • See also "NT Debugging: How the clipboard works" blog post.
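To make the difference between immediate storage and delayed rendering concrete, here is a toy model in Python. It does not call the real Win32 API. The class ToyWinClipboard and its methods are hypothetical stand-ins that loosely mirror the EmptyClipboard() / SetClipboardData() / GetClipboardData() sequence and the WM_RENDERFORMAT callback described above.

```python
from typing import Callable, Dict, List, Optional

class ToyWinClipboard:
    """Hypothetical model of an OS-owned clipboard; it does NOT call Win32."""

    def __init__(self) -> None:
        self._data: Dict[str, Optional[bytes]] = {}  # format -> bytes, None = delayed
        self._render: Optional[Callable[[str], bytes]] = None  # owner's renderer

    def empty(self, render: Optional[Callable[[str], bytes]] = None) -> None:
        # Roughly EmptyClipboard(): drop old contents and record the new owner.
        self._data.clear()
        self._render = render

    def set_data(self, fmt: str, data: Optional[bytes]) -> None:
        # Roughly SetClipboardData(): real bytes are copied into OS storage;
        # None means delayed rendering (produce the data on first request).
        self._data[fmt] = None if data is None else bytes(data)

    def formats(self) -> List[str]:
        # Roughly EnumClipboardFormats(): the receiver picks one of these.
        return list(self._data)

    def get_data(self, fmt: str) -> Optional[bytes]:
        # Roughly GetClipboardData().
        if fmt not in self._data:
            return None
        if self._data[fmt] is None:
            if self._render is None:
                return None  # owner is gone and never rendered: data lost
            # Real Windows sends WM_RENDERFORMAT to the owner at this point.
            self._data[fmt] = self._render(fmt)
        return self._data[fmt]

    def owner_exited(self) -> None:
        # Owner quits without rendering its delayed formats (a well-behaved
        # program would render them on WM_RENDERALLFORMATS first).
        self._render = None
```

In this model, a format stored with real bytes survives owner_exited(). A delayed format that was never requested before the owner left comes back as None.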

In Linux, there is no system-wide clipboard. Instead, it is provided by whichever graphical environment you're using (X11, Wayland, or none at all):

In X11 (not necessarily limited to Linux), clipboard transfer is deferred: the storage is provided by the 'source' program. The exchange is done via X11 events according to the ICCCM conventions, so the clipboard is scoped to a single X server:

  • Ctrl+C tells the source program to reserve the "copied" data in its own memory, and to claim the X11 selection called 'CLIPBOARD'. This is done using XSetSelectionOwner(), and the ownership is tracked by the X server (Xorg).

    If you had copied something previously, the previous selection owner is informed about this via a SelectionClear event and discards the now-unneeded data.

    If you close the program, the copied data is lost. (Clipboard managers can be used to avoid this by watching and duplicating the current selection.)

  • Ctrl+V tells the destination program to request the 'CLIPBOARD' selection in its preferred type using XConvertSelection(). (The current owner can be looked up with XGetSelectionOwner(), but the X server routes the request itself.) The server forwards the request to the source program as a SelectionRequest event. The source converts the data on demand to the requested type, stores the result in a property on the destination's window, and replies with a SelectionNotify event. (There is also a special target, TARGETS, which returns a list of the available types.)

  • See this link for a practical example.

(Note: When you "copy" text by selecting it and paste it using middle-click, the mechanism is the same but the 'PRIMARY' selection is used instead. This is where the term 'X11 selection' comes from.)
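The ownership dance can be modelled in a few lines of Python. This is a toy, not Xlib. ToyXServer and ToyClient are hypothetical names, and the asynchronous SelectionClear / SelectionRequest / SelectionNotify events of the real protocol are collapsed into plain method calls. The sketch still shows the key points: the data stays inside the copying client, conversion happens only when asked, a new owner evicts the old one, and PRIMARY and CLIPBOARD are independent.

```python
from typing import Callable, Dict, Optional

class ToyXServer:
    """Hypothetical model of X11 selection ownership (not Xlib)."""

    def __init__(self) -> None:
        self.owners: Dict[str, "ToyClient"] = {}  # selection atom -> owner

    def set_selection_owner(self, selection: str, client: "ToyClient") -> None:
        # Like XSetSelectionOwner(): the previous owner gets a SelectionClear.
        previous = self.owners.get(selection)
        self.owners[selection] = client
        if previous is not None and previous is not client:
            previous.on_selection_clear(selection)

    def get_selection_owner(self, selection: str) -> Optional["ToyClient"]:
        # Like XGetSelectionOwner().
        return self.owners.get(selection)

    def convert_selection(self, selection: str, target: str) -> Optional[bytes]:
        # Like XConvertSelection(): the server forwards a SelectionRequest to the
        # owner. The real reply arrives later via a window property plus a
        # SelectionNotify event; here it is simply a return value.
        owner = self.owners.get(selection)
        if owner is None:
            return None
        return owner.on_selection_request(selection, target)

    def client_exited(self, client: "ToyClient") -> None:
        # A disconnected client owns nothing any more, so its data is gone.
        for sel in [s for s, c in self.owners.items() if c is client]:
            del self.owners[sel]


class ToyClient:
    def __init__(self, server: ToyXServer) -> None:
        self.server = server
        # selection -> target type -> converter that produces the bytes on demand
        self.held: Dict[str, Dict[str, Callable[[], bytes]]] = {}

    def copy(self, selection: str, converters: Dict[str, Callable[[], bytes]]) -> None:
        # The data stays in this client; only ownership is registered.
        self.held[selection] = converters
        self.server.set_selection_owner(selection, self)

    def on_selection_clear(self, selection: str) -> None:
        self.held.pop(selection, None)  # nobody can paste this any more

    def on_selection_request(self, selection: str, target: str) -> Optional[bytes]:
        converters = self.held.get(selection, {})
        if target == "TARGETS":
            # Real X11 returns a list of atoms; a space-separated string here.
            return " ".join(converters).encode()
        conv = converters.get(target)
        return conv() if conv is not None else None  # convert on demand
```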

In Wayland, I don't actually understand how it works; all I have is the protocol docs:

  • https://wayland.freedesktop.org/docs/html/ch04.html#sect-Protocol-data-sharing

  • https://wayland.freedesktop.org/docs/html/apa.html#protocol-spec-wl_data_device

  • See https://github.com/bugaevc/wl-clipboard for a command-line tool.
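From the wl_data_device spec linked above, the transfer is again deferred, but the bytes move through a file descriptor. The copying client creates a wl_data_source, lists its MIME types with wl_data_source.offer, and calls wl_data_device.set_selection. The pasting client calls wl_data_offer.receive with a MIME type and the write end of a pipe; the compositor passes this on as a wl_data_source.send event, and the source writes the data into that fd. The Python below is only a toy model of that flow using a real os.pipe(). ToyDataSource and ToyCompositor are hypothetical names, and no Wayland library is involved.

```python
import os
import threading
from typing import Dict, List, Optional

class ToyDataSource:
    """Hypothetical stand-in for a wl_data_source held by the copying client."""

    def __init__(self, payloads: Dict[str, bytes]) -> None:
        self.payloads = payloads  # the data stays in the source client
        self.cancelled = False

    def mime_types(self) -> List[str]:
        # Corresponds to the wl_data_source.offer requests.
        return list(self.payloads)

    def send(self, mime_type: str, fd: int) -> None:
        # Corresponds to the wl_data_source.send event: write, then close the fd.
        with os.fdopen(fd, "wb") as f:
            f.write(self.payloads[mime_type])


class ToyCompositor:
    """Hypothetical compositor that only tracks the current selection."""

    def __init__(self) -> None:
        self.selection: Optional[ToyDataSource] = None

    def set_selection(self, source: ToyDataSource) -> None:
        # Like wl_data_device.set_selection; the replaced source is cancelled.
        if self.selection is not None and self.selection is not source:
            self.selection.cancelled = True
        self.selection = source

    def receive(self, mime_type: str) -> bytes:
        # Like wl_data_offer.receive: the paster makes a pipe, hands over the
        # write end, and reads until EOF. The bytes flow source -> paster
        # directly; the compositor only passes the fd along.
        if self.selection is None or mime_type not in self.selection.mime_types():
            raise LookupError(mime_type)
        read_fd, write_fd = os.pipe()
        # Write in another thread so payloads larger than the pipe buffer
        # cannot deadlock the single-threaded reader.
        writer = threading.Thread(target=self.selection.send, args=(mime_type, write_fd))
        writer.start()
        with os.fdopen(read_fd, "rb") as f:
            data = f.read()
        writer.join()
        return data
```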

Traditional text editors (Vim, Emacs, nano) often have their own internal clipboards (e.g. Vim's registers or the Emacs kill ring). They might or might not also integrate with the X11 clipboard.

In macOS, something called a "pasteboard server" appears to be used (which I think means that programs communicate with it through Mach APIs). Other than that, it behaves like the Windows clipboard and stores the currently copied data itself.

  • https://developer.apple.com/documentation/appkit/nspasteboard

  • There is a sample application: https://developer.apple.com/library/archive/samplecode/ClipboardViewer/Introduction/Intro.html

I'm more intrigued by images. How can they be copied so easily?

Images are just chunks of binary data, like text or audio. The clipboard will usually hold an image in formats common to the OS, e.g. a device-independent bitmap (CF_DIB, essentially a BMP file without its file header) on Windows, or image/png on Linux.
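To show how little magic is involved, here is a minimal Python sketch that packs raw pixels into the CF_DIB layout (a 40-byte BITMAPINFOHEADER followed by the pixel rows) and turns it into a .bmp file by prepending the 14-byte BITMAPFILEHEADER. It assumes an uncompressed 24-bit image. The function names make_dib and dib_to_bmp_file are hypothetical, and nothing here touches the real clipboard.

```python
import struct
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0..255

def make_dib(pixels: List[List[Pixel]]) -> bytes:
    """Return BITMAPINFOHEADER + pixel array for a 24-bit image (CF_DIB layout)."""
    height = len(pixels)
    width = len(pixels[0])
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4 bytes
    header = struct.pack(
        "<IiiHHIIiiII",
        40,                  # biSize: size of this header
        width,               # biWidth
        height,              # biHeight: positive means rows are stored bottom-up
        1,                   # biPlanes
        24,                  # biBitCount
        0,                   # biCompression = BI_RGB (uncompressed)
        row_size * height,   # biSizeImage
        2835, 2835,          # pixels per metre, about 72 DPI
        0, 0,                # biClrUsed, biClrImportant: no colour table
    )
    body = bytearray()
    for row in reversed(pixels):             # bottom row first
        for r, g, b in row:
            body += bytes((b, g, r))         # stored as BGR
        body += b"\0" * (row_size - width * 3)
    return header + bytes(body)

def dib_to_bmp_file(dib: bytes) -> bytes:
    """Prepend the BITMAPFILEHEADER that a .bmp file has and CF_DIB lacks."""
    offset = 14 + 40  # file header + info header; 24-bit BI_RGB has no palette
    return struct.pack("<2sIHHI", b"BM", 14 + len(dib), 0, 0, offset) + dib
```

Writing the result of dib_to_bmp_file() to disk should give a file that ordinary image viewers open, which is roughly what "Save clipboard as BMP" tools do.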

And on Linux, the X11 clipboard protocol actually allows one X client to offer different "encodings" of the selection to other clients. The receiving X client chooses the encoding, so the sending X client gets to convert the selection into whatever format both clients understand. At least in theory; I'm not sure whether modern toolkit libraries like GTK or Qt offer much choice.
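The negotiation itself is simple. The receiver asks for the list of offered types (TARGETS in X11), picks the first one in its own preference order that the source also offers, and only that one gets converted. The sketch below shows this in Python under stated assumptions: negotiate() and paste() are hypothetical helpers, and the converters dict stands in for the source program's on-demand conversion code.

```python
from typing import Callable, Dict, List, Optional, Tuple

def negotiate(offered: List[str], accepted: List[str]) -> Optional[str]:
    """Pick the receiver's most preferred type that the source offers."""
    offered_set = set(offered)
    for target in accepted:  # the receiver's preference order wins
        if target in offered_set:
            return target
    return None  # no common format: the paste fails or falls back

def paste(converters: Dict[str, Callable[[], bytes]],
          accepted: List[str]) -> Optional[Tuple[str, bytes]]:
    # list(converters) plays the role of the TARGETS reply.
    target = negotiate(list(converters), accepted)
    if target is None:
        return None
    # Only the chosen encoding is ever produced by the source.
    return target, converters[target]()
```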