Can the URL be read by third parties when browsing via HTTPS?

We all know that HTTPS encrypts the connection between the computer and the server so that it cannot be viewed by a third party. However, can the ISP or a third party see the exact link of the page the user accessed?

For example, I visit

https://www.website.com/data/abc.html

Will the ISP know that I accessed /data/abc.html, or only that I visited the IP address of www.website.com?

If they know, then why do Wikipedia and Google use HTTPS, when someone could just read the internet logs and find out exactly what content the user viewed?


From left to right:

The scheme https: is, obviously, interpreted by the browser.

The domain name www.website.com is resolved to an IP address using DNS. Your ISP will see the DNS request for this domain, and the response.

The path /data/abc.html is sent in the HTTP request. If you use HTTPS, it will be encrypted along with the rest of the HTTP request and response.

The query string ?this=that, if present in the URL, is sent in the HTTP request – together with the path. So it's also encrypted.

The fragment #there, if present, is not sent anywhere – it's interpreted by the browser (sometimes by JavaScript on the returned page).
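To make the breakdown above concrete, here is a minimal sketch using Python's standard urllib.parse module. The function name url_parts and its dictionary keys are made up for illustration; the comments only restate where each piece goes according to the explanation above.

```python
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Split a URL and label each piece by where it ends up."""
    parts = urlsplit(url)
    # Path and query string are sent together as the HTTP request target,
    # which HTTPS encrypts along with the rest of the request.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "scheme": parts.scheme,      # used by the browser to pick the protocol
        "host": parts.hostname,      # looked up via DNS (visible to the ISP)
        "request_target": target,    # inside the encrypted HTTP request
        "fragment": parts.fragment,  # never sent; handled by the browser
    }

for name, value in url_parts("https://www.website.com/data/abc.html?this=that#there").items():
    print(f"{name:15} {value}")
```

Only the "host" entry is exposed by the DNS lookup; "request_target" travels inside the encrypted request, and "fragment" never leaves the browser.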


The ISP will know you visited the IP address associated with www.website.com (and possibly the domain name, if you are using their DNS servers and they are specifically looking at that traffic; if your DNS queries don't go through them, they won't see it that way). They will not see the path /data/abc.html.
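For illustration, this is roughly what a classic DNS lookup looks like on the wire. The sketch below builds an unencrypted DNS query for an A record by hand, following the standard message layout; the function name dns_query and the transaction ID 0x1234 are arbitrary choices for the example. It assumes plain DNS over UDP port 53, not encrypted variants such as DNS over HTTPS or DNS over TLS. Note that the query carries only the hostname, never the path.

```python
import struct

def dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build the bytes of a plain (unencrypted) DNS query for an A record."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answer, 0 authority, 0 additional records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    qname += b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN, Internet).
    return header + qname + struct.pack(">HH", 1, 1)

packet = dns_query("www.website.com")
print(packet)  # "www", "website" and "com" are readable in the raw bytes
```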

(Bear with me a bit here – I do get to the answer.)

HTTP works by connecting to a port on the server (usually port 80); the web browser then tells the server which page it wants. A simple request for http://www.sitename.com/url/of/site.html would contain the following lines:

GET /url/of/site.html HTTP/1.1
host: www.sitename.com
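As a rough sketch of what those lines look like as actual bytes, the hypothetical helper raw_http_request below builds a minimal HTTP/1.1 GET request. Over plain HTTP these bytes are sent as-is, so anyone on the path can read the page address. It assumes a default port (it writes only the hostname in the Host header) and adds no other headers.

```python
from urllib.parse import urlsplit

def raw_http_request(url: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Each line ends in CRLF, and an empty line marks the end of the headers.
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.hostname}", "", ""]
    return "\r\n".join(lines).encode("ascii")

print(raw_http_request("http://www.sitename.com/url/of/site.html").decode())
```

With HTTPS, the same bytes are written into the encrypted connection, so an observer sees only ciphertext in their place.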

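HTTPS does exactly the same thing, except on port 443, and the whole HTTP exchange (everything in the request above, plus the response) is carried inside an encrypted TLS session (TLS is the successor to SSL). So the ISP cannot read any of that traffic. They can still see the IP addresses and ports involved, may be able to infer something from the size of the responses, and can usually learn the hostname www.sitename.com: from the DNS lookup that resolves it to an IP address in the first place, and from the Server Name Indication (SNI) field, which is sent unencrypted at the start of the TLS handshake.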
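One way to see what leaves your machine before encryption starts is to generate the first TLS handshake message (the ClientHello) entirely in memory with Python's standard ssl module, without any network connection. This is a sketch for inspection only; the function name client_hello is made up. It assumes a TLS client without Encrypted Client Hello (ECH), which Python's ssl module does not use. The hostname appears in cleartext, while the path does not, because the HTTP request is only sent after the handshake, inside the encrypted channel.

```python
import ssl

def client_hello(hostname: str) -> bytes:
    """Produce the TLS ClientHello a client would send to `hostname` (no network)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        # The first handshake step writes the ClientHello, then waits for the server.
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass
    # These bytes would go over the wire unencrypted.
    return outgoing.read()

hello = client_hello("www.website.com")
print(b"www.website.com" in hello)  # checks for the hostname (SNI extension)
print(b"/data/abc.html" in hello)   # checks for the path, which is not sent here
```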

Of course, if there are "web bugs" (tracking images or scripts) embedded in the page, these can give the site's "partners" hints about what you are viewing and who you are. Likewise, if your chain of trust is broken, an ISP can perform a man-in-the-middle attack. In theory, you can have private end-to-end encryption because your browser ships with a set of trusted CA certificates that it uses to verify the server's identity. If an ISP or government can either add a CA certificate or compromise a CA (and both have happened in the past), you lose your security. I believe the Great Firewall of China effectively performs man-in-the-middle attacks to read HTTPS data, but it's been a while since I was there.
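To illustrate what "chain of trust" means for a typical client, here is a small sketch using Python's standard ssl module (the function name describe_trust is made up). A default client context only accepts a server certificate that chains to one of its trusted CA certificates and matches the requested hostname. That is exactly why an extra or compromised CA is dangerous: any certificate it issues for any site will pass these checks.

```python
import ssl

def describe_trust() -> dict:
    """Summarize how a default Python TLS client decides whom to trust."""
    ctx = ssl.create_default_context()  # loads the system's default trusted CAs
    return {
        # The server must present a certificate chaining to a trusted CA...
        "requires_valid_chain": ctx.verify_mode == ssl.CERT_REQUIRED,
        # ...and that certificate must match the hostname being requested.
        "checks_hostname": ctx.check_hostname,
        # CA certificates loaded so far; CAs stored in a directory are loaded
        # lazily, so this number can undercount what is actually trusted.
        "trusted_ca_count": ctx.cert_store_stats()["x509_ca"],
    }

print(describe_trust())
```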

You can test this easily enough yourself with software that captures the traffic entering and leaving your computer; the free tool Wireshark will do this for you.