What are the detailed OSI model steps involved in connecting to a website?

What are the detailed steps involved in connecting to a website, such as superuser.com home page using a newly connected computer (http, and nothing cached)? What is actually happening in the background to build the final bit that is sent over ethernet?

I understand that, for example, a DNS query is made to resolve the FQDN to an IP address (Layer 7), or the three-way handshake is performed to establish a connection (Layer 4). But how does this happen while the bit is being built? Do the various layers hold the data that will be a part of the final bit, while they themselves send queries/data over ethernet to gather the relevant information, make connections, etc.? How exactly does this work?

When the OSI model, or TCP/IP model, is discussed, things are generally presented as data sequentially being built and flowing down the layers until it is sent out as a bit, but I haven't been able to find a more detailed explanation as to the details involved with each aspect, in a simple example as connecting to a website.


Apps these days use abstract high-level libraries to address the network, so a lot of it is done automatically by the OS. The lower in the OSI model you go, the more automatic it is, and the less coders care about it. Since your question is about data structures and layers, you are really more concerned about the upper layers anyway, as the lower ones are more electrical engineering, firmware, and drivers than anything else. At that level it's just bits or electrical signals.

The application layer does a lot more than is apparent in the OSI model, so the first thing you should understand is that the application layer drives everything. The actual job of creating data structures at layers 3 and 4 is handled by methods (programmed functions) that operate at those layers, but the application layer coordinates each operation, and passes into each method the parameters it needs, so no, the layers don't "hold" their data per se, and things aren't necessarily "passed down" to the subsequent layer (though in some cases, they are, quite literally). Instead think of it as a set of function calls which define a task, such that the output of one function is the input to another. The point is that the locus of control is always at the application layer.

So, as I said in my comment, most modern apps use a variant of the Berkeley Sockets API standard. This library contains methods that operate at OSI layers 7, 4, and 3, and hooks into the OS IP API.

  1. The application will call Sockets.Socket(type) to create a new socket, and is returned a descriptor (a handle) for it. The type selects the layer 4 protocol (a stream socket for TCP, a datagram socket for UDP); no port number has been assigned yet.

  2. The application can ask the OS what its IP address is, and then call Sockets.Bind(newSocket, localAddrAndPort, addrLen) to attach the new socket to a local IP address (layer 3) and port (layer 4). Clients usually skip this step, in which case the OS picks the local address and an ephemeral port during Connect().

  3. The application will call Sockets.Connect(newSocket, remoteAddrAndPort, addrLen) to initiate a connection via the TCP three-way handshake.

  4. After all this is complete, the application can use the Sockets.Send() and Sockets.Recv() functions to write to and read from the socket as though it were an I/O stream. Internally, Send()/Recv() hand the data to code in the sockets library and the OS network stack that encapsulates it at each layer, using the output of the previous structure as the input to the next lower one, until the local IP stack is told to send the packet. In most cases, applications neither know nor care a whit about anything below layer 3, and when they do care about layers 3 or 4, it's only to provide valid parameter values.
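The four steps above can be sketched with Python's socket module, which wraps the same Berkeley calls. This is a minimal sketch, not how a browser is actually written: the helper names (echo_once, client_round_trip, loopback_demo) are made up for illustration, and a throwaway echo server on the loopback address stands in for the remote web server so the example needs no network access.

```python
import socket
import threading


def echo_once(server_sock):
    """Hypothetical stand-in for a remote server: accept one client, echo one message."""
    conn, _peer = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def client_round_trip(remote_addr, payload):
    # Step 1: create a TCP (stream) socket. The result is a socket object
    # wrapping a descriptor, not a port number.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Step 2 (optional for clients): bind to a local address.
        # Port 0 asks the OS to choose a free ephemeral port.
        sock.bind(("127.0.0.1", 0))
        # Step 3: connect() makes the OS perform the TCP three-way handshake.
        sock.connect(remote_addr)
        # Step 4: write to and read from the socket like a byte stream.
        sock.sendall(payload)
        return sock.recv(1024)


def loopback_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
        worker.start()
        reply = client_round_trip(server.getsockname(), b"hello")
        worker.join(timeout=5)
        return reply
```

Note that nothing in the client code touches TCP headers, IP headers, or frames; the application only supplies addresses, ports, and payload bytes.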

The application is also responsible for protocol command sequences. For instance, to connect to superuser.com, the app must communicate in HTTP command sequences.

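To retrieve the default page here at superuser.com, the browser would construct a request such as the following (HTTP/1.1 requires the Host header, and a blank line ends the request):

GET / HTTP/1.1
Host: superuser.com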

The application can simply write that string to the socket, and it will be automatically encapsulated in a TCP segment, then an IP packet, then an Ethernet (or 802.11) frame, and finally converted into electrical (or radio) signals by the hardware.

The application can read from the network I/O stream to retrieve a response that starts with a status line and headers, followed by a blank line and then the markup, like

HTTP/1.1 200 OK
(headers omitted)

<!DOCTYPE html> <html itemscope itemtype="http://schema.org/QAPage"> <head> <title>networking - What are the detailed OSI model....

The browser then reads the status line (200 is a value indicating that the HTTP command worked, and markup follows), skips past the headers, and renders the page.
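As a rough illustration of this request/response exchange, here is a sketch of the two pieces of string handling involved. The function names are hypothetical, the Connection header is an extra I added so a server would close the connection after one response, and real browsers send many more headers and handle far more edge cases.

```python
def build_get_request(host, path="/"):
    """Build the bytes an application would write to the connected socket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",          # mandatory in HTTP/1.1
        "Connection: close",      # assumption: one request per connection
        "",                       # the empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body
```

For example, split_response(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE html>") yields the status code 200, a one-entry header dictionary, and the markup bytes.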

So, I am not entirely satisfied with this answer, because it's taken me years and years both in networking and coding to get a holistic mental picture of how all this works in reality (as opposed to the highly abstract OSI model), and I know that there are at least 3 distinct perspectives you can take when looking at network connectivity. In this case signal processing is right out, and it sounds like you are learning about the networking professional's perspective already, so hopefully this perspective will help round out your understanding of where theory meets reality.

Edit:

Oh, and since you mentioned DNS, most applications use the Sockets getaddrinfo/getnameinfo methods to perform a quick DNS query, taking an FQDN as input. These methods internally create a socket, bind(), connect(), encapsulate the query in a UDP datagram (note, DNS is usually performed over UDP, though most systems can be configured to use TCP instead) and send it, listen for a response, parse it into a structure, and return it to the application, all with one call. It's pretty neat. In fact, now that I think about it, it is the epitome of what encapsulation means.
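To show how little the application sees of that work, here is a small sketch using Python's wrapper around getaddrinfo. The resolve helper is a name I made up; the only real library call is socket.getaddrinfo.

```python
import socket


def resolve(host, port):
    """Return the sorted, de-duplicated address strings the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as a string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("superuser.com", 80) would trigger the whole cache, hosts file, and DNS sequence behind the scenes, while a literal address such as resolve("127.0.0.1", 80) is returned without any query being sent.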


I'm kind of ignoring your first paragraph, which was useful, because it sounds like you were trying to get more specific in the second paragraph. So that paragraph is what I answer in detail.

But how does this happen while the bit is being built?

You proposed your own answer, with the next question.

Do the various layers hold that data that will be a part of the final bit, while they themselves send queries/data over ethernet to gather the relevant information/make a connections, etc.?

Yes.

How does this exactly work?

The user told the web browser that information is desired from a website. As the user types this address into the address bar, no networking is involved yet; the OSI Model would consider this to be OSI Model Layer 7: Application Layer.

The web browser specified that an insecure communication is okay. (If security was required, HTTPS would have been used. However, HTTP will work to provide an insecure communication.) So HTTP is how communication should occur (Presentation Layer, Layer 6, still commonly handled by the application). HTTP does not use EBCDIC; communication will be using ASCII (another detail related to Presentation Layer, OSI Model Layer 6).

Reliable communication should occur. We'll use a session, so conversation will occur over an HTTP "connection" that may involve multiple packets. The idea of having that connection is Session Layer (OSI Model Layer 5)

Transport communications allow multiple conversations (such as multiple simultaneous data transfers) to occur on the same IP address. When there is incoming or outgoing data, these conversations are kept track of by using multiple "port" numbers. The web browser specifies that it wants to have a conversation with www.superuser.com on TCP port 80. Specifying the port number is getting into the realm of the Transport Layer (OSI Model Layer 4).

The application (the web browser) communicates with the "TCP/IP networking stack", which is typically built into the operating system (these days... in the days of Windows 3.1, you might need to install "Trumpet Winsock", a third party stack, or use Microsoft's stack that could be installed with MS Internet Explorer for Win 3.1).

The networking stack realizes that "www.superuser.com" is a network name. So the "resolver" code is used. This name is not in the "name resolution" ("resolver") cache, and trying to look it up in the "hosts file" doesn't reveal the name. So a DNS query will be sent.

Ah, yes, your question did reference "http" and "DNS", so this answer gets a bit more complicated by looking at both the DNS communication and the HTTP communication. We'll look at the DNS communication first because, well, that is what will happen before the OSI Model Layer 3 has anything to do with any HTTP traffic.

The resolver begins the process of making a DNS communication. The query is going to be sent as a UDP datagram to port 53 on the DNS server, and the computer is going to receive the response as a UDP datagram as well (Transport Layer, Layer 4).
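For a sense of what the resolver actually puts inside that UDP datagram, here is a sketch that builds a standard DNS query for an A record by hand. The function name and the fixed query ID are my own choices for illustration; a real resolver randomizes the ID and adds retry and timeout handling.

```python
import struct


def build_dns_query(name, query_id=0x1234):
    """Build the payload of a DNS query asking for the A (IPv4) record of name."""
    # Header: ID, flags (0x0100 = recursion desired), then four counters:
    # 1 question, 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte,
    # e.g. b"\x03www\x09superuser\x03com\x00".
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question
```

The resulting bytes are what would be handed to a UDP socket with the DNS server's address and port 53 as the destination; everything below that (IP packet, Ethernet frame) is added by the networking stack.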

The DNS server is on a computer. We'll pretend like it is on a remote computer. So this is going to involve communicating with a different computer's IP address. So an IP packet is going to be used (that is Network Layer, OSI Model Layer 3). Just for fun, let's say this is an IPv4 packet (no reason why not). (Actually, I started writing this as IPv6... I decided to revert to IPv4 for shorter sample addresses. But IPv6 could be done instead.)

Let's pretend like this computer is a router, with more than one network interface and more than one route to choose from. Based on the Layer 3 IP address, we don't want to take the route that will send the traffic upstairs to the teenager's bedroom. We want to take a route that will go to the Internet. This IPv4 packet could be sent over the wireless network or the wired network. We'll choose to use the IPv4 address that uses the wired network.

Since the DNS server is on a different subnet, we'll need to send the traffic to a gateway. Since I don't have a more specific route (e.g., to the teenager's bedroom), I'll use a "default gateway" that is used whenever there isn't a more specific option available. Knowing which way to send the traffic is "routing", the major feature of Layer 3.

Let's say that the wired network will be used for this communication. The IP packet needs to get to the DNS server (8.8.8.8, Layer 3), but the routing table indicates that such communications get routed through a gateway address at 198.51.100.1 (Layer 3). (By the way, 198.51.100.1 is not something you should be using on an actual network, but I am allowed to use it for this example, because I'm following RFC 5737 section 3.)
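The routing decision described here is a longest-prefix match against the routing table. The following is a sketch under assumed values: the table contents, the interface names eth0 and wlan0, and the 192.0.2.0/24 "bedroom" subnet are all invented for the example; only the default gateway 198.51.100.1 and the destination 8.8.8.8 come from the text.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop, interface).
# A next hop of None means the destination is directly reachable on that link.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "198.51.100.1", "eth0"),   # default gateway
    (ipaddress.ip_network("198.51.100.0/24"), None, "eth0"),       # wired LAN
    (ipaddress.ip_network("192.0.2.0/24"), None, "wlan0"),         # teenager's bedroom
]


def lookup_route(destination, routes=ROUTES):
    """Pick the matching route with the longest prefix (the most specific one)."""
    addr = ipaddress.ip_address(destination)
    matches = [route for route in routes if addr in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen)
```

With this table, lookup_route("8.8.8.8") falls through to the default route and returns the gateway 198.51.100.1 on eth0, while an address inside 192.0.2.0/24 matches the more specific wireless route.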

We can communicate with 198.51.100.1 by using an Ethernet frame. The ARP cache (ARP is IPv4's equivalent of IPv6's NDP) doesn't have details, so we'll need an ARP WHO-HAS frame (equivalent to IPv6's neighbor discovery) to figure out where the Ethernet frame must be sent. This neighbor discovery sends out an Ethernet broadcast to FF-FF-FF-FF-FF-FF (IPv6 could use multicast as part of NDP) to figure out which Ethernet address has that IP address. When a response is retrieved, the information goes into the cache (ARP cache... if we were using IPv6, it would be the NDP cache).
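For the curious, the ARP WHO-HAS frame is small enough to build by hand. This is a sketch only: the sender's MAC (02-00-00-00-00-01) and IP (198.51.100.20) are values I made up for the computer running the browser, and in practice the OS builds and sends this frame for you.

```python
import socket
import struct


def mac_to_bytes(mac):
    """Convert 'AA-BB-CC-DD-EE-FF' or 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build an Ethernet broadcast frame asking who has target_ip."""
    ethernet_header = (
        b"\xff" * 6                      # destination: FF-FF-FF-FF-FF-FF
        + mac_to_bytes(sender_mac)       # source MAC
        + struct.pack("!H", 0x0806)      # EtherType: ARP
    )
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                               # hardware type: Ethernet
        0x0800,                          # protocol type: IPv4
        6, 4,                            # MAC length, IPv4 address length
        1,                               # operation: request (who-has)
        mac_to_bytes(sender_mac),
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                     # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload
```

The reply comes back as a unicast frame carrying the gateway's MAC address, which is what gets stored in the ARP cache.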

Now we can send an Ethernet frame to the system that is at 198.51.100.1. So the "TCP/IP networking stack" places the UDP datagram into an IP packet that will go to the IP address of 8.8.8.8, and encapsulates that into an Ethernet frame that goes to the gateway's Ethernet address, say 01-23-45-67-89-AB. That Ethernet frame is sent out at Layer 2.
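The nesting described in this step (UDP datagram inside an IP packet inside an Ethernet frame) can be sketched as three small functions, each wrapping the output of the previous one. The source MAC, source IP, and source port used in the usage note are invented; padding and the frame check sequence are left out because the network hardware normally adds them.

```python
import socket
import struct


def internet_checksum(data):
    """Ones' complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_datagram(src_port, dst_port, payload):
    # A checksum of 0 means "not computed", which IPv4 permits for UDP.
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload


def ipv4_packet(src_ip, dst_ip, payload, ttl=64, protocol=17):
    def header(checksum):
        return struct.pack(
            "!BBHHHBBH4s4s",
            0x45, 0,                  # version 4, 20-byte header, no DSCP/ECN
            20 + len(payload),        # total length
            0, 0,                     # identification, flags/fragment offset
            ttl, protocol, checksum,  # protocol 17 = UDP
            socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
        )
    # Compute the checksum over the header with the checksum field zeroed.
    return header(internet_checksum(header(0))) + payload


def ethernet_frame(dst_mac, src_mac, payload, ethertype=0x0800):
    to_bytes = lambda mac: bytes.fromhex(mac.replace("-", ""))
    return to_bytes(dst_mac) + to_bytes(src_mac) + struct.pack("!H", ethertype) + payload
```

A call such as ethernet_frame("01-23-45-67-89-AB", "02-00-00-00-00-01", ipv4_packet("198.51.100.20", "8.8.8.8", udp_datagram(50000, 53, b"query"))) shows the key point: the IP destination is the DNS server, while the Ethernet destination is only the next hop.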

The TCP/IP networking stack sends out that Ethernet frame at layer 2, by communicating with the network card driver (which can communicate with Ethernet). However, the TCP/IP networking stack forgets about the bits in that UDP datagram. After all, UDP is unreliable. The TCP/IP network stack isn't done with the HTTP request, because the "resolver" is still waiting for a response based on the "source" network address of the outgoing UDP packet. But the TCP/IP networking stack doesn't keep a copy of the bits that were unreliably sent off in that UDP datagram. (If the UDP datagram gets lost, I believe the "resolver" will probably fail, and then the web browser might decide to retry. Anyway, the "retry" part is not handled by the unreliable portion of taking care of a UDP datagram.)

The Ethernet driver hangs onto the frame long enough to make sure that it doesn't get corrupted by any Ethernet collisions at OSI Model Layer 1. Once the Ethernet frame is transmitted without problem, the network driver forgets about it.

The default gateway receives the Ethernet frame. Since it is a router, it forwards traffic, which means it needs to look a bit at IP packets that are not addressed to itself. I consider this to be "promiscuous", although the Ethernet frame itself is addressed to the router. The router checks to see where the traffic should go, and follows a similar process to get the traffic to another router. The IP packet gets modified by having the TTL reduced by 1, and the router uses Layer 2 to get the traffic to the next router. That process repeats through as many routers as necessary, and that ought to work okay as long as the TTL doesn't get too low, in which case an ICMP "Time Exceeded" reply will come back. For simplicity, the rest of this example will pretend like that did not happen.
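The per-hop TTL handling can be modeled in a few lines. This is a toy model, not router code: the Packet class and forward function are hypothetical, addresses are plain strings, and a real router would also recompute the IP header checksum and pick a next hop from its own routing table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    ttl: int
    payload: bytes = b""


def forward(packet, router_ip):
    """Return ("forward", packet with TTL - 1) or ("icmp_time_exceeded", reply to sender)."""
    if packet.ttl <= 1:
        # The TTL would reach zero here, so the packet is dropped and the
        # original sender is told about it.
        reply = Packet(src=router_ip, dst=packet.src, ttl=64, payload=b"ICMP time exceeded")
        return "icmp_time_exceeded", reply
    return "forward", replace(packet, ttl=packet.ttl - 1)
```

This same mechanism is what tools like traceroute exploit: they send packets with deliberately small TTL values and watch which routers report back.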

Later on, perhaps after many milliseconds, which amount to millions of CPU cycles, the network driver (on the computer with the web browser) notices an Ethernet communication. That Ethernet frame has a destination MAC address (OSI Model Layer 2) that belongs to this computer with the web browser. The frame has an EtherType field that says it is an IP packet; specifically the term "IP packet" is from an old standard, and means an IPv4 packet (OSI Model Layer 3). Since the destination address matches this computer, the computer doesn't need to check if there is any software running in "promiscuous mode". So the network driver sends it to the TCP/IP networking stack. The IP packet ends up containing a UDP datagram (OSI Model Layer 4) from the DNS server. So the TCP/IP networking stack checks the list of open ports (which you can see by running "netstat -na" in either Unix or Microsoft Windows). The list is checked for a socket bound to the datagram's destination port (UDP sockets do not show a "LISTENING" state the way TCP sockets do), and it turns out that a response is being looked for by the resolver. So the TCP/IP networking stack sends this UDP datagram to the resolver.
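The receive path just described is the encapsulation run in reverse: peel off one header at a time and use a field in each to decide who gets the rest. Here is a sketch of that demultiplexing for a UDP datagram; the function name and the sockets_by_port dictionary (a stand-in for the stack's table of open ports) are hypothetical.

```python
import struct


def demultiplex(frame, local_mac, sockets_by_port):
    """Return (owner, payload) for a UDP datagram addressed to us, else None."""
    # Layer 2: is the frame for this machine, and does it carry IPv4?
    dst_mac = frame[:6]
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if dst_mac != local_mac or ethertype != 0x0800:
        return None
    # Layer 3: find the end of the IP header and check the protocol field.
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4
    if ip[9] != 17:                      # 17 = UDP
        return None
    # Layer 4: read the UDP header and look up the destination port.
    udp = ip[header_len:]
    _src_port, dst_port, length, _checksum = struct.unpack("!HHHH", udp[:8])
    owner = sockets_by_port.get(dst_port)
    if owner is None:
        return None
    return owner, udp[8:length]
```

If the resolver sent its query from port 50000, the table would contain an entry like {50000: "resolver"}, and the DNS reply's payload would be handed to it.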

Now that the resolver has figured out that www.superuser.com is 203.0.113.50 (as an example, permitted by RFC 5737 section 3), the TCP/IP networking stack can feel free to make an IP packet, containing a TCP segment, that goes to 203.0.113.50. The first IP packet of the conversation doesn't really contain any interesting payload, and is just part of the 3-way TCP handshake. After a reply, the TCP-handling portion of the TCP/IP networking stack will send a TCP segment inside an IP packet. The process is pretty much similar to the handling of the UDP datagram, except that when the TCP/IP networking stack takes IP packets containing TCP segments and sends those packets to the network driver (to handle the Ethernet frame), the TCP/IP networking stack will remember the entire contents of each TCP segment until that segment is acknowledged in a reply TCP segment. If the TCP segment gets lost in transit, eventually the remote end will complain or an expiration timer will complete, and the TCP/IP networking stack will send another TCP segment with a duplicate copy of the essential payload. This attempt to "retry" is why TCP is called "reliable".
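This "remember until acknowledged" behavior is the closest thing to the layers "holding" data that the question asks about. A deliberately simplified sketch follows; the SendBuffer class is hypothetical and ignores timers, window sizes, and selective acknowledgements that a real TCP implementation has.

```python
class SendBuffer:
    """Keeps a copy of every sent segment until the peer acknowledges it."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq
        self.unacked = {}  # sequence number -> payload bytes

    def send(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload      # keep a copy in case it is lost
        self.next_seq += len(payload)    # sequence numbers count bytes
        return seq, payload

    def acknowledge(self, ack_number):
        # TCP acknowledgements are cumulative: ack_number is the next byte
        # the peer expects, so everything before it has arrived.
        done = [s for s, p in self.unacked.items() if s + len(p) <= ack_number]
        for seq in done:
            del self.unacked[seq]

    def segments_to_retransmit(self):
        # What would be resent if the retransmission timer fired now.
        return sorted(self.unacked.items())
```

Contrast this with the UDP case earlier, where the stack keeps nothing after handing the datagram to the driver.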

This time, instead of waiting for a UDP datagram containing DNS traffic that gets sent to the resolver, the TCP/IP Networking Stack waits for a TCP reply. Some random port, e.g. port 12345, is used as the "source port" of the initial request.

The outgoing TCP segment contains the "GET" request that is part of the HTTP communication sent by the web browser.

Now, let's fast-forward through the handling of the IP packet (and the Ethernet frame).

After the request is received by the webserver, the webserver will send data to the web browser. That may happen as multiple TCP segments. The web server remembers the contents of every TCP segment it sends, until that TCP segment gets acknowledged by the computer that is running the web browser.

As the computer with the web browser gets information from the web server, it notices Ethernet frames (OSI Layer 2) that contain IP packets (OSI Layer 3) which contain TCP segments (OSI Layer 4) that come from TCP port 80 (on the web server) to the local TCP port that belongs to the connection (e.g., 12345, mentioned earlier). The TCP/IP networking stack will realize that this data should go to the web browser.

The web browser processes the information from the connection (Layer 5, session), realizes that the traffic is unencrypted (Layer 6, presentation), and does not make the address bar red (like it would if there was a problem with HTTPS security). Deciding on the color of the address bar is a "user interface" issue, which is considered to be part of Layer 7 of the 7-layer OSI Model.