How much information can websites get about your browser/PC?

I am trying to determine if the information shown on www.whatsmyip.org is the absolute maximum amount of information that a webserver can obtain from a web visitor. Are there other sites that will be able to get more information from the user passively like this?

I'm not talking about port-sniffing or any kind of interaction from the user, just the information that a server can get from a 'dumb' visit.


This question was a Super User Question of the Week.


Solution 1:

There is more: the Electronic Frontier Foundation (EFF) brought out a tool called Panopticlick which shows mostly the same information but additionally scans your installed fonts.

Installed fonts are probably the most identifying piece of information once you have added even one or two beyond the defaults. Because so many fonts exist, two computers used by different people are unlikely to have exactly the same set installed.
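To illustrate why one extra font matters, here is a minimal Python sketch of how a site might fold whatever it observes into a single fingerprint. The attribute names, the example values and the `fingerprint` and `surprisal_bits` helpers are all made up for illustration; this is not how Panopticlick itself is implemented.

```python
import hashlib
import math


def fingerprint(attributes: dict) -> str:
    """Hash a browser's observable attributes into one short identifier."""
    # Sort the keys so the same attributes always produce the same digest.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(share_of_browsers: float) -> float:
    """Bits of identifying information in a value seen in this share of browsers."""
    return -math.log2(share_of_browsers)


# Two hypothetical visitors that differ only by one installed font.
base = {
    "user_agent": "ExampleBrowser/1.0",
    "timezone": "UTC+1",
    "fonts": ",".join(sorted(["Arial", "Courier New", "Times New Roman"])),
}
other = dict(
    base,
    fonts=",".join(sorted(["Arial", "Comic Sans MS", "Courier New", "Times New Roman"])),
)

print(fingerprint(base) != fingerprint(other))  # the extra font changes the fingerprint
print(surprisal_bits(1 / 1024))  # a value shared by 1 in 1024 browsers carries 10 bits
```

The rarer a combination of values is, the more bits it contributes, which is why a long and unusual font list is so revealing.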

Edit (from comments): A countermeasure is to disable JavaScript (for example, with an add-on such as NoScript) or to disable both the Java and Flash plugins in the browser, since at least one of them is needed to extract the font list.

Solution 2:

How do they get it?

Passively collected identifying information mostly comes from the headers of the communication packets.

When a browser requests a URL, the request passes through several layers of the OSI model and several network protocols. Protocols such as HTTP (application layer) and TCP/IP (transport and network layers) probably provide most of the information displayed on that website. This information is usually stored in packet and message headers, and was originally put there to help servers decide on the best representation of the content for your environment.

A user-friendly list of current HTTP headers is available from Wikipedia. A more technical reference is RFC 2616 itself; see section 14, Header Field Definitions. (RFC 2616 has since been superseded, most recently by RFC 9110.)
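As a rough illustration of what the server sees without any scripting, the Python sketch below parses a raw HTTP request and pulls out the headers. The request text, the header values and the `parse_request` helper are invented for the example; a real server or framework does this parsing for you.

```python
# A hypothetical request, as the browser would send it over the TCP connection.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: ExampleBrowser/1.0 (X11; Linux x86_64)\r\n"
    "Accept-Language: en-GB,en;q=0.8\r\n"
    "Referer: http://example.org/previous-page\r\n"
    "\r\n"
)


def parse_request(raw: str):
    """Split a raw HTTP request into method, path and a dict of headers."""
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, because values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


method, path, headers = parse_request(RAW_REQUEST)
print(headers["user-agent"])       # browser and OS, as claimed by the client
print(headers["accept-language"])  # preferred languages
print(headers["referer"])          # the page the visitor came from
```

The visitor's IP address is not in these headers at all; the server reads it from the TCP connection itself.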

How to protect your privacy?

Another very popular technique for tracking a user is a specific cookie; this is how ad providers know which ad to show you (which makes me very wary). See the answers to my question, How to remove tracking cookies. They actually cover many more possible defences against other tracking techniques as well.
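For anyone curious how a tracking cookie works on the server side, here is a minimal Python sketch using the standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime and the `identify_visitor` helper are assumptions made for the example, not any particular ad provider's scheme.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header: str):
    """Return (visitor_id, set_cookie_value); set_cookie_value is None for returning visitors."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse the Cookie header sent by the browser
    if COOKIE_NAME in jar:
        # Returning visitor: the browser handed the identifier back to us.
        return jar[COOKIE_NAME].value, None

    # First visit: mint a random identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = visitor_id
    reply[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # keep it for about a year
    reply[COOKIE_NAME]["path"] = "/"
    return visitor_id, reply[COOKIE_NAME].OutputString()


first_id, set_cookie = identify_visitor("")                    # no cookie yet
again_id, nothing = identify_visitor(f"{COOKIE_NAME}={first_id}")  # cookie sent back
print(first_id == again_id)
```

Deleting the cookie breaks the link between visits, which is why clearing or blocking cookies works as a defence against this particular technique.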

Perhaps a more secure way to stay anonymous online is to use a dedicated security project such as Tor.

Solution 3:

In terms of information you can obtain passively without using Java or Flash, that list is pretty exhaustive.

You could perhaps do things like estimate PC performance using a JavaScript benchmark, but you're really pushing it at that point.

Solution 4:

That page doesn’t really show much if you simply deny the browser prompts to run plugins, allow location detection, etc.

The hostname, IP address, etc. can be easily hidden via a proxy, and browser/OS information can easily be spoofed via extensions and such.
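To show how little it takes to spoof the browser/OS string, here is a small Python sketch using the standard `urllib.request` module. The URL and the User-Agent value are placeholders, and the `build_request` helper is made up for the example; browser extensions do the equivalent by rewriting the header before the request is sent.

```python
from urllib.request import Request


def build_request(url: str, user_agent: str) -> Request:
    """Prepare a request that claims to come from an arbitrary browser."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://example.com/", "ExampleBrowser/1.0 (Hypothetical OS)")

# urllib stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```

Nothing is sent over the network here; the point is only that the header is whatever the client chooses to put in it.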

In the end, unless you install and allow third-party plugins, websites cannot gather much information because browsers are specifically designed to limit how much access they have to a system. The most common tool that sites use to collect data is cookies, but there are limits to how much they can report as well.

The only real way for a site to get unfettered access to your system is to try to exploit a vulnerability in the browser or one of its plugins, but you can mitigate even that by installing as few plugins as possible and keeping them and the browser updated.

Solution 5:

There is something extra that the previous answers don't list:

A website can track which other websites you have visited (since the last time you erased your browsing history).

How is it done?

Your browser colors links differently depending on whether you have visited them before. A website can build a long list of links to well-known sites it wants to check, and display that list where the user cannot see it (hidden behind an image, in a 1-pixel font, in the same color as the background, etc.). A script then inspects how the browser has styled each link and can tell which ones were visited.