How do web servers know whether you're using direct IP address access?
To answer your question of how it knows, it has to do with what your browser sends the server.
You're right that the system always resolves the name to an IP address, but the browser also sends the host name you typed, in the Host header of the HTTP request.
Here is a sample header that I found online, modified to look as though you used Firefox on Windows and typed apple.com into the address bar:
GET / HTTP/1.1
Host: apple.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Here's what the header would look like if you used its IP address:
GET / HTTP/1.1
Host: 17.142.160.59
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Both of these would be sent to the same IP address over a socket, but the browser tells the server which host it asked for.
Why? Because a web server with a single IP address may host multiple sites and serve different pages for each. It cannot tell which site a visitor wants from the IP address, because all the sites share it, but it can tell from the Host header.
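To make that concrete, here is a rough Python sketch of how a server-side script could tell the two sample requests apart. The function names `host_from_request` and `is_ip_literal` are made up for illustration, and real servers parse requests far more carefully than this.

```python
import ipaddress
from typing import Optional


def host_from_request(raw_request: bytes) -> Optional[str]:
    """Return the value of the Host header in a raw HTTP request, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    # Skip the request line ("GET / HTTP/1.1"), then scan the header lines.
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()
    return None


def is_ip_literal(host: str) -> bool:
    """True if the Host value is an IP address rather than a domain name."""
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8080".
        host = host[1:].partition("]")[0]
    elif host.count(":") == 1:
        # "name:port" or "1.2.3.4:port": drop the port.
        host = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


# Trimmed-down versions of the two sample requests.
by_name = b"GET / HTTP/1.1\r\nHost: apple.com\r\n\r\n"
by_ip = b"GET / HTTP/1.1\r\nHost: 17.142.160.59\r\n\r\n"

print(is_ip_literal(host_from_request(by_name)))  # False
print(is_ip_literal(host_from_request(by_ip)))    # True
```

The only difference between the two requests is the Host value, and that is all the check looks at.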
With the HTTP 1.1 protocol (the prior HTTP 1.0 version has been obsolete for quite some time, so is unlikely to be used by any recent version of a browser), the Host header was introduced. For HTTP 1.1 it is a required header line that a browser must send. The browser includes the domain name in that line, e.g. Host: example.com. So the web server knows from that line which website the browser wants to access. Since a web server may be supporting dozens of websites, it needs that line to determine which website the requested page resides on. Suppose the browser wants to access the home page for a site on example.com. It sends the following line to the server when it connects:
GET / HTTP/1.1
That line specifies that the browser wishes to get the root document, i.e., "/", for the website. If you wanted to access /somedir/testpage.html, the request line would start with GET /somedir/testpage.html instead. The request line is followed by the line below:
Host: example.com
So if the web server is supporting the websites example.com, someothersite.com, yetanothersite.org, etc., it knows that it should return the main page for example.com. If it doesn't get that line, or the Host line contains no domain name (for example, an IP address), it doesn't know which website's home page should be returned. So it may return an error message instead, or return the home page for a "default" site for the server.
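As a minimal sketch of that selection logic in Python: the `SITES` table, the `DEFAULT_SITE` choice and the `respond` function are all invented for illustration. Answering 400 for a missing Host line and falling back to a default site for an unrecognized one is just one possible policy, not how any particular server is configured.

```python
# Hypothetical table of sites hosted on one IP address.
SITES = {
    "example.com": "<h1>example.com home page</h1>",
    "someothersite.com": "<h1>someothersite.com home page</h1>",
    "yetanothersite.org": "<h1>yetanothersite.org home page</h1>",
}
DEFAULT_SITE = "example.com"  # assumption: this server has a default site


def respond(host_header):
    """Return (status, body) for a GET / request with the given Host value."""
    if host_header is None:
        # No Host line at all: this sketch answers with an error.
        return 400, "Bad Request"
    # Normalize case and drop an optional ":port" suffix.
    # (A bracketed IPv6 literal is mangled here, but it is not a site name
    # anyway, so it falls through to the default site below.)
    name = host_header.strip().lower().partition(":")[0]
    if name in SITES:
        return 200, SITES[name]
    # Unknown name or a bare IP address: serve the default site.
    return 200, SITES[DEFAULT_SITE]


print(respond("someothersite.com"))
print(respond("17.142.160.59"))
print(respond(None))
```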
You can issue the same commands a browser issues by hand with a telnet client, e.g., telnet example.com 80 from a Linux shell prompt or an Apple OS X Terminal window, to connect to the default HTTP port, port 80. See "Testing access to a website using PuTTY" for steps to do so with PuTTY on a Windows system.
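If you would rather script it than type into telnet, something like the following Python sketch should do roughly the same thing. `build_request` and `fetch_status_line` are names I made up. The `Connection: close` header is my addition so the server hangs up when it is done.

```python
import socket


def build_request(host_header, path="/"):
    """Build the same minimal request you would type into telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",  # ask the server to close the connection afterwards
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_status_line(address, host_header, port=80, timeout=10.0):
    """Connect to address:port, send the request, return the first response line."""
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header))
        response = b""
        # Read until the end of the status line (or until the server closes).
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("iso-8859-1")


# Show the bytes that would go over the wire; no network access needed for this.
print(build_request("example.com").decode("ascii"))
```

Calling `fetch_status_line("example.com", "example.com")` needs network access, so it is not run above. Passing an IP address as the first argument and a domain name as the second lets you choose the Host line independently of where you connect.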
This is due to the Host: HTTP header. This is quite useful for hosting multiple sites on the same IP address. For example, http://www.k7dxs.net/ and http://www.philipgrimes.com/ are both on the same IP address. However, because of the Host: header, they can show two different sites.
For HTTPS, as @Toothbrush pointed out, servers use TLS Server Name Indication (SNI): the Host header is part of the encrypted request, so without SNI the server wouldn't know which certificate to offer during the handshake.
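On the client side, this is roughly what it looks like with Python's standard ssl module. `open_tls` and `certificate_subject` are hypothetical helper names. The point is only that the site name is handed to the TLS layer separately from the address you connect to.

```python
import socket
import ssl


def open_tls(address, server_name, port=443, timeout=10.0):
    """Connect to address and name the wanted site via SNI during the handshake."""
    context = ssl.create_default_context()
    raw = socket.create_connection((address, port), timeout=timeout)
    try:
        # server_hostname is sent as the SNI value in the TLS handshake and is
        # also the name the returned certificate is checked against.
        return context.wrap_socket(raw, server_hostname=server_name)
    except Exception:
        raw.close()
        raise


def certificate_subject(address, server_name):
    """Return the subject of the certificate the server chose for server_name."""
    with open_tls(address, server_name) as tls:
        return tls.getpeercert().get("subject")


print("SNI supported by this Python build:", ssl.HAS_SNI)
```

Calling `certificate_subject` with the same address but different server names (network access required, so not run above) would be one way to see a shared server pick different certificates.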
Fun experiment: Get Tamper Data for Firefox (I haven't been able to find an equivalent for Chrome) and start tampering. Open http://slipstation.com/ and edit the Host: header in the request to be www.zombo.com (just the host name, not a full URL). You'll see a possibly familiar website where anything is possible.
The web server can be configured to only accept connections to a particular domain or subdomain. It could be hosting multiple domains.
What the web server does when a direct IP address is used is configurable. In the case of Apache, by default it serves the first name-based vhost among the enabled sites, which are loaded in alphanumeric order.
This is the most relevant part of the Apache documentation that I have found, after a quick search:
https://httpd.apache.org/docs/current/vhosts/name-based.html