What is Sec-WebSocket-Key for?

In section 1.3 "Opening Handshake" of draft-ietf-hybi-thewebsocketprotocol-17, it describes Sec-WebSocket-Key as follows:

To prove that the handshake was received, the server has to take two pieces of information and combine them to form a response. The first piece of information comes from the |Sec-WebSocket-Key| header field in the client handshake:

Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

For this header field, the server has to take the value (as present in the header field, e.g. the base64-encoded [RFC4648] version minus any leading and trailing whitespace), and concatenate this with the Globally Unique Identifier (GUID, [RFC4122]) "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" in string form, which is unlikely to be used by network endpoints that do not understand the WebSocket protocol. A SHA-1 hash (160 bits), base64-encoded (see Section 4 of [RFC4648]), of this concatenation is then returned in the server's handshake [FIPS.180-2.2002].
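For concreteness, the computation described in that passage can be sketched in Python roughly as follows. The function name `compute_accept` is made up for illustration; only the GUID and the sample key come from the quoted text.

```python
import base64
import hashlib

# Fixed GUID from the quoted handshake description.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the value a server would send back in Sec-WebSocket-Accept.

    compute_accept is a hypothetical helper name, not part of any library.
    """
    # Use the header value as received, minus leading/trailing whitespace.
    key = sec_websocket_key.strip()
    # SHA-1 over the key concatenated with the GUID (160-bit digest).
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    # The digest is base64-encoded before being placed in the response.
    return base64.b64encode(digest).decode("ascii")


# Sample key from the quoted passage.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that nothing in this calculation is secret: anyone who knows the GUID can produce the same value.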

Here's the thing I can't understand: why not simply return code 101? If the proper use of Sec-WebSocket-Key is for security, or to prove they can handle websocket requests, then any server could return the expected key if they wanted to, and pretend they are a WebSocket server.


Solution 1:

According to RFC 6455, the WebSocket standard:

first part:

.. the server has to prove to the client that it received the
client's WebSocket handshake, so that the server doesn't accept
connections that are not WebSocket connections.  This prevents an
attacker from tricking a WebSocket server by sending it carefully
crafted packets using XMLHttpRequest [XMLHttpRequest] or a form
submission.

...
For this header field, the server has to take the value (as present
in the header field, e.g., the base64-encoded [RFC4648] version minus
any leading and trailing whitespace) and concatenate this with the
Globally Unique Identifier (GUID, [RFC4122]) "258EAFA5-E914-47DA-
95CA-C5AB0DC85B11" in string form, which is unlikely to be used by
network endpoints that do not understand the WebSocket Protocol.

second part:

The |Sec-WebSocket-Key| header field is used in the WebSocket opening
handshake.  It is sent from the client to the server to provide part
of the information used by the server to prove that it received a
valid WebSocket opening handshake.  This helps ensure that the server
does not accept connections from non-WebSocket clients (e.g., HTTP
clients) that are being abused to send data to unsuspecting WebSocket
servers.

So, because the value of the GUID is fixed by the standard, it is unlikely (possible, but with very small probability) that a server unaware of WebSockets would use it. It does not provide any security (secure WebSockets, wss://, do that); it just ensures that the server understands the WebSocket protocol.

Indeed, as you mentioned, if you are aware of WebSockets (which is exactly what is being checked), you could pretend to be a WebSocket server by sending the correct response. But if you then do not behave correctly (e.g. do not form frames correctly), that will be treated as a protocol violation. You can of course write an incorrect WebSocket server, but there is not much use in one.

Another purpose is to prevent clients from accidentally requesting a WebSocket upgrade when they do not expect one (say, by adding the corresponding headers manually and then expecting something else). Browsers prohibit setting Sec-WebSocket-Key and the other related headers through the setRequestHeader method.

Solution 2:

Mostly for cache busting.

Imagine a transparent reverse-proxy server watching HTTP traffic go by. If it doesn't understand WS, it could mistakenly cache a WS handshake and reply with a useless 101 to the next client.

Using a high-entropy key and requiring a basic challenge-response rather specific to WS ensures the server actually understood this was a WS handshake and in turn tells the client that the server will indeed be listening on the port. A caching reverse-proxy would never implement that hashing logic "by mistake".
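A rough Python sketch of the client side shows why a replayed 101 does not pass. The helper names (`new_key`, `expected_accept`, `response_is_valid`) are invented for this example, and the "cache" is just a variable holding the accept value from an earlier handshake.

```python
import base64
import hashlib
import os

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key() -> str:
    # A fresh random 16-byte nonce, base64-encoded, for each handshake.
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    # Same calculation the server performs: SHA-1 of key + GUID, base64-encoded.
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def response_is_valid(key: str, status: int, accept_header: str) -> bool:
    # The client accepts only a 101 whose accept value matches its own key.
    return status == 101 and accept_header == expected_accept(key)


# First client: a genuine handshake, whose response a naive proxy might cache.
first_key = new_key()
cached_accept = expected_accept(first_key)

# Second client: sends a different key but receives the cached response.
second_key = new_key()

print(response_is_valid(first_key, 101, cached_accept))
print(response_is_valid(second_key, 101, cached_accept))
```

The first check succeeds and the second fails unless the two random keys happen to collide, so the second client can tell that the 101 it received was not computed for its own request.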

Solution 3:

I'm inclined to agree.

Nothing of importance would change if the client ignored the value of the Sec-WebSocket-Accept header.

Why? Because the server is not proving anything by doing this calculation (other than that it has the code to do the calculation). Just about the only thing it rules out is a server that simply replies with a canned response.

The exchange of headers (e.g. with fixed 'key' and 'accept' values) is already sufficient to rule out any accidental connection with something that is not at least trying to be a WebSocket server; and if it's trying, the requirement that it do this calculation is hardly an impediment to its succeeding.

The RFC claims:

".. the server has to prove to the client that it received the client's WebSocket handshake, so that the server doesn't accept connections that are not WebSocket connections."

and:

"This helps ensure that the server does not accept connections from non-WebSocket clients .."

Neither of these claims makes any sense. The server is never the one rejecting the connection, because it is the one computing the hash, not the one checking it.

This sort of exchange would make some sense if the magic GUID were not fixed, but were instead a shared secret between client and server. In that case the exchange would allow the server to prove to the client that it had the shared secret without revealing it.
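To make that concrete, here is a purely hypothetical sketch of such a variant in Python. It is not part of the WebSocket protocol, and it uses HMAC-SHA1 keyed with the secret instead of plain concatenation; that is my substitution, since HMAC is the conventional keyed-hash construction. The names `SHARED_SECRET`, `server_proof` and `client_verifies` are made up.

```python
import base64
import hashlib
import hmac
import os

# Hypothetical secret known only to this client and this server.
SHARED_SECRET = b"example-secret-known-to-both-sides"


def server_proof(client_key: str, secret: bytes) -> str:
    # The server answers the client's random challenge with a keyed hash.
    mac = hmac.new(secret, client_key.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(mac).decode("ascii")


def client_verifies(client_key: str, proof: str, secret: bytes) -> bool:
    # The client recomputes the proof and compares in constant time.
    return hmac.compare_digest(proof, server_proof(client_key, secret))


# The client's challenge: a fresh random key, as in the real handshake.
challenge = base64.b64encode(os.urandom(16)).decode("ascii")

genuine = server_proof(challenge, SHARED_SECRET)
impostor = server_proof(challenge, b"a-guess-at-the-secret")

print(client_verifies(challenge, genuine, SHARED_SECRET))
print(client_verifies(challenge, impostor, SHARED_SECRET))
```

Here a server that does not hold the secret cannot produce a proof the client accepts, which is exactly what the fixed, public GUID cannot achieve.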