How does the HTTP GET method work in relation to DNS protocol?

I am trying to understand application layer protocols in the TCP/IP stack. I know that both HTTP and DNS sit at the top layer (the application layer). So, when a browser wants to access a resource, it has to send a request to the HTTP server, for example:

    GET www.pippo.it/hello.htm HTTP/1.1

This request, made following the rules of the HTTP protocol, uses the page's URL, not its IP address.

I know that a DNS request is needed to convert the URL (more precisely, its hostname) into an IP address. So my question is: does HTTP invoke the DNS protocol? It seems impossible to me, since both are top-layer protocols (so DNS can't provide a service to HTTP). In the same way, TCP (which sits at a lower layer) can't request a service from a higher-layer protocol like DNS.

So when does the DNS request happen? And who performs such a request?


Solution 1:

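The HTTP request in question is actually not valid unless the browser is talking to an intermediary (proxy), and even then the request line would carry the full URL including the scheme (http://www.pippo.it/hello.htm).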

If the browser were talking to the web server directly, your example would look more like this:

    GET /hello.htm HTTP/1.1
    Host: www.pippo.it
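A minimal Python sketch of how a client could turn the URL into that request text, assuming a plain http:// URL and a direct (non-proxy) connection; the function name build_get_request is hypothetical. The hostname leaves the request line and moves into the Host header, and the header block is terminated by an empty line.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build an origin-form HTTP/1.1 GET request for a direct (non-proxy) connection."""
    parts = urlsplit(url)
    # The request line carries only the path (plus any query string), not the host.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The hostname moves into the mandatory Host header.
    # (IPv6 literals would need square brackets here; not handled in this sketch.)
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "", ""]
    # HTTP lines end in CRLF; the trailing empty line ends the header block.
    return "\r\n".join(lines).encode("ascii")
```

For example, build_get_request("http://www.pippo.it/hello.htm") returns b"GET /hello.htm HTTP/1.1\r\nHost: www.pippo.it\r\n\r\n".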

Now, to put this in perspective, consider the OSI model:

[Figure: The OSI model]

We have 3 systems in action:

  • A client running the browser
  • A web server serving the site
  • A DNS server knowing the IP address of the site

The protocols involved are, from bottom to top (the minimum set relevant to the OP's question):

  • IP
  • TCP, UDP
  • HTTP, DNS

The HTTP communication runs over TCP (which sits on top of IP), while the DNS communication, in this case, runs over UDP (which also sits on top of IP).
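To make the "DNS over UDP" half concrete, here is a rough Python sketch of the A-record query a stub resolver places in a single UDP datagram, following the RFC 1035 message layout. The names build_dns_query and send_dns_query are hypothetical, the DNS server address is something you would supply yourself (for example your router's), and the reply is returned raw, without parsing or checking that its ID matches.

```python
import random
import socket
import struct


def build_dns_query(hostname: str, query_id: int) -> bytes:
    """Encode a minimal DNS query (RFC 1035 layout) asking for the A record of hostname."""
    # Header: six 16-bit big-endian fields: ID, flags (0x0100 = standard query
    # with Recursion Desired), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each dot-separated label prefixed by its length, ended by a zero byte.
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN, Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


def send_dns_query(hostname: str, server: str, timeout: float = 2.0) -> bytes:
    """Send the query in one UDP datagram to port 53 and return the raw, unparsed reply."""
    query = build_dns_query(hostname, random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _addr = sock.recvfrom(512)  # classic DNS-over-UDP messages fit in 512 bytes
    return reply
```

No TCP connection is involved: the whole question fits in one datagram, which is why DNS lookups usually use UDP while HTTP needs TCP's reliable byte stream.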

Here's the communication sequence in short:

  1. The client, running the browser, asks the DNS server for an A record for www.pippo.it, using the UDP protocol.

    1.1. On the client, it's the operating system that does the resolving and hands the result back to the browser: the browser typically doesn't talk to the DNS server directly (some browsers do ship their own resolver), but goes through the OS by calling gethostbyname() or the newer getaddrinfo(). The order in which the OS tries its sources (hosts file, DNS, and so on) is configurable: on Windows it follows the system's own resolver configuration, while on Linux it is defined by the hosts: line of /etc/nsswitch.conf.

  2. The DNS server responds to the client over UDP with the A record (the IP address), if one exists.

  3. The client opens a TCP connection to port 80 of the web server, at the IP address obtained in step 2, and writes the following text:

HTTP request:

    GET /hello.htm HTTP/1.1
    Host: www.pippo.it
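Step 1.1 is what an ordinary program does in practice: it hands the name to the OS resolver, which consults the hosts file, DNS and so on according to its configuration. A minimal Python sketch (resolve is a hypothetical helper name) using socket.getaddrinfo, the standard-library wrapper around the C function of the same name:

```python
import socket


def resolve(hostname: str, port: int = 80) -> list[str]:
    """Ask the OS resolver for addresses usable for a TCP connection to hostname:port."""
    # getaddrinfo is the same OS facility a browser typically relies on;
    # the program never builds a DNS packet itself here.
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

resolve("www.pippo.it") would return whatever addresses the OS finds, which needs a working resolver; a numeric address such as "127.0.0.1" comes back unchanged without any DNS query.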

You can mimic the HTTP request by hand with something like this in your console or command prompt:

    > telnet www.pippo.it 80
    Trying 195.128.235.49...
    Connected to www.pippo.it.
    Escape character is '^]'.
    GET /hello.htm HTTP/1.1
    Host: www.pippo.it

followed by an empty line (press Enter twice after the Host line). If the requested content exists, the web server sends it back and telnet prints the response (status line, headers and body) on the screen. When a browser makes the request instead, it parses the response text and renders the tags, links, scripts and images into what we call a web page.
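The same session can be scripted with Python's socket module. This is only a sketch, not a real HTTP client: http_get and status_line are hypothetical helpers, the Connection: close header is an addition (not part of the telnet session above) so that reading until the server closes the socket ends the loop, and there is no handling of chunked encoding, redirects or TLS.

```python
import socket


def http_get(host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> bytes:
    """Do by hand what the telnet session does: open TCP, send the request, read the reply."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close, so reading until EOF terminates
        "\r\n"  # the empty line that ends the request headers
    ).encode("ascii")
    # create_connection resolves host through the OS (getaddrinfo), then connects over TCP.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first response line, e.g. b'HTTP/1.1 200 OK', into version, code, reason."""
    first = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = (first.split(" ", 2) + [""])[:3]
    return version, int(code), reason
```

Assuming the server is reachable, status_line(http_get("www.pippo.it", "/hello.htm"))[1] would give the numeric status code, e.g. 200 if the page exists.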

In reality there are more details. For example, browsers may cache the IP addresses of domains you already visited, so that DNS resolution becomes unnecessary. Also, modern browsers may resolve names before you actually need them (DNS prefetching) to speed up your browsing.
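Caching can be sketched as a small wrapper around any lookup function. This Python example assumes one fixed TTL for every entry (real DNS caches honour the TTL carried by each record), and CachingResolver and os_lookup are hypothetical names; the clock is injectable so the expiry logic can be exercised without waiting.

```python
import socket
import time
from typing import Callable


class CachingResolver:
    """Remember hostname -> addresses for a fixed time so repeat visits skip the lookup."""

    def __init__(self, lookup: Callable[[str], list[str]], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}  # name -> (expiry, addresses)

    def resolve(self, hostname: str) -> list[str]:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the lookup function is not called at all
        addresses = self._lookup(hostname)  # cache miss or expired entry
        self._cache[hostname] = (now + self._ttl, addresses)
        return addresses


def os_lookup(hostname: str) -> list[str]:
    """Default lookup through the OS resolver, de-duplicated and sorted."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})
```

Typical use would be resolver = CachingResolver(os_lookup, ttl_seconds=300), then calling resolver.resolve(...) before every connection; only the first call per name within the TTL reaches the OS.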

Additionally, your computer may have static records in a hosts file. If a record matches the requested name, the local static entry is used first and no DNS server is contacted at all. This precedence is configurable (on Linux, via /etc/nsswitch.conf), so it isn't guaranteed, but it's the default on the operating systems I'm familiar with.
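A rough Python sketch of the "hosts file first, DNS second" precedence, assuming the usual hosts-file format (an IP address followed by one or more names, with # starting a comment). parse_hosts and lookup are hypothetical helpers, and the DNS fallback is passed in as a function so the sketch does not depend on any particular resolver.

```python
from typing import Callable, Optional


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname or alias in hosts-file text to its IP (first match wins)."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def lookup(hostname: str, hosts: dict[str, str],
           dns_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Consult the static table first; fall back to DNS only when there is no entry."""
    static = hosts.get(hostname.lower())
    if static is not None:
        return static  # no DNS server is contacted
    return dns_lookup(hostname)
```

With a made-up entry such as "10.0.0.5 www.pippo.it", lookup("www.pippo.it", ...) returns 10.0.0.5 without ever calling the DNS fallback.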

Solution 2:

HTTP is transported over TCP, which in turn runs over IP. To make an HTTP request, the browser has to open a TCP connection, and to do that, it needs the destination IP address (i.e. the IP address of the server). To resolve the server's hostname, it therefore has to issue a DNS request. (Generally, the DNS request itself is sent by the operating system when a program calls its name resolution functions; however, nothing prevents a program from sending DNS requests to the DNS server by itself.)

Once the connection is established, the browser can send its HTTP request, which contains the path to the requested resource and a Host field with the hostname of the server (e.g., Host: www.pippo.it). The hostname does not go on the request line (which would actually be GET /hello.htm HTTP/1.1), except when the request is sent to an HTTP proxy; in that case the full URL is present, including the protocol part, e.g. GET http://www.pippo.it/hello.htm HTTP/1.1.
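The direct and the proxy cases also differ in which name the client has to resolve and connect to. A minimal Python sketch, assuming plain http:// URLs and a forward HTTP proxy; plan_request, RequestPlan and the proxy address are hypothetical:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class RequestPlan(NamedTuple):
    connect_host: str   # the name the client resolves and opens a TCP connection to
    connect_port: int
    request_line: str


def plan_request(url: str, proxy: Optional[tuple[str, int]] = None) -> RequestPlan:
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or 80  # assumes http://, whose default port is 80
    if proxy is not None:
        # Via a forward proxy: connect to the proxy and send the absolute URL
        # (scheme included); the proxy typically resolves the origin host itself.
        proxy_host, proxy_port = proxy
        return RequestPlan(proxy_host, proxy_port, f"GET {url} HTTP/1.1")
    # Direct: resolve and connect to the origin server; send only the path.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return RequestPlan(host, port, f"GET {target} HTTP/1.1")
```

plan_request("http://www.pippo.it/hello.htm") plans a connection to www.pippo.it:80 with GET /hello.htm HTTP/1.1, while passing proxy=("proxy.example", 3128) plans a connection to the proxy with GET http://www.pippo.it/hello.htm HTTP/1.1.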