Detect end of HTTP request body

I'm playing around with writing my own HTTP client and server and want to have the client include an optional body in the request. On the server side I want to read the entire body before sending the HTTP response. My question is: on the server, how do I know that I've read the entire body?

Even though in this case I control both the client and server, I'm looking for a "standard" approach. However, since Content-Length is optional, I want a method that doesn't require it. If the client closes the connection, it is easy to read all available data. However, the client needs to keep the connection open to wait for a response, so this method doesn't work.

All that I can think I'm left with is having knowledge of the format of the body and detecting a terminator (e.g. </HTML>). Ideally I'd rather not require that knowledge.

Is there an approach I'm overlooking?


Solution 1:

Assuming you want your client to work with other servers, and server to work with other clients, your server can't expect to be treated nicely.

There are two ways to tell when the body has ended. Neither of them requires knowledge of the body's content type as you suggest (e.g., don't bother looking for </html> -- that goes far outside the HTTP protocol).

  1. If the client sends a message with Transfer-Encoding: Chunked, you will need to parse the somewhat complicated chunked transfer encoding syntax. You don't really have much choice in the matter -- if the client is sending in this format, you have to receive it. When the client is using this approach, you can detect the end of the body by a chunk with a length of 0.
  2. If the client instead sends a Content-Length, you must use that.
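The two cases above can be sketched in Python. This is only an illustration, not a complete parser: it assumes the request headers have already been parsed into a dict with lower-cased names, and that the connection is wrapped in a buffered binary stream (for example via socket.makefile("rb")). The function names read_chunked and read_request_body are made up for this example, and chunk extensions and trailer fields are skipped rather than interpreted.

```python
import io


def read_chunked(stream):
    """Decode a chunked body; the zero-length chunk marks the end."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("connection closed before the final chunk")
        # Chunk extensions (anything after ';') are ignored in this sketch.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer fields up to the blank line that ends the message.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


def read_request_body(headers, stream):
    """Return the body bytes, or None if the headers give no length."""
    # Assumption: header names are lower-cased by the caller.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return read_chunked(stream)
    if "content-length" in headers:
        length = int(headers["content-length"])
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("connection closed before the full body arrived")
        return body
    return None  # neither header present: the caller decides what to do


# Usage with an in-memory stream standing in for the socket:
demo = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_request_body({"transfer-encoding": "chunked"}, demo))
```

Checking Transfer-Encoding before Content-Length is deliberate: when both are present, the chunked framing is the one to trust.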

As you suggest, the third method for detecting the end -- when the connection closes -- only works for the response, not the request (as then there is no way to send a response).

Solution 2:

If a request contains a message-body and a Content-Length is not given, 
the server SHOULD respond with 400 (bad request) if it cannot determine
the length of the message, or with 411 (length required) if it wishes 
to insist on receiving a valid Content-Length.

In other words, you are entitled to insist on either Transfer-Encoding: chunked or Content-Length, so you don't have to worry about determining the length in any other situation.
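As a rough sketch of that rule in Python: the helper below picks the status code to send back when the length can't be determined. The name length_error_status and the insist_on_content_length flag are invented for this example, and it assumes the caller has already decided that the request carries a body (e.g. a POST) and has lower-cased the header names.

```python
def length_error_status(headers, insist_on_content_length=False):
    """Return None if the body length is usable, else 400 or 411."""
    value = headers.get("content-length", "").strip()
    has_length = value.isascii() and value.isdigit()
    is_chunked = "chunked" in headers.get("transfer-encoding", "").lower()

    if has_length:
        return None  # a valid Content-Length was given
    if insist_on_content_length:
        return 411   # Length Required: server insists on Content-Length
    if is_chunked:
        return None  # length can be determined from the chunked framing
    return 400       # Bad Request: no way to determine the length


print(length_error_status({"content-length": "12"}))
print(length_error_status({}))
print(length_error_status({"transfer-encoding": "chunked"}, True))
```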

Solution 3:

I add another answer mainly because I don't have enough rep to comment on mgiuca's. I know the question is kind of old but no definite answer has been made.

As it was mentioned, the main thing to take into account is that your server interacts with uncontrollable others, meaning you cannot know what they will send at all, and must be prepared to manage whatever comes through that gate. Taking this into account, sticking to standards and common practices is likely the best choice.

If the client sends the "Transfer-Encoding: chunked" header, the server must be able to parse a chunked request (link from mgiuca's answer); this takes precedence over any "Content-Length" header sent alongside it. Otherwise, if a "Content-Length" header is present, the server must parse it and use it to determine the end of the request. Finally, if neither is present, "the end of the connection" signals the end of the request.

What I think you overlooked is the fact that the client can end the connection and still get a response from the server. I mean, what does "to end the connection" mean? Remember that HTTP is an Application Layer Protocol that travels (usually) over TCP. Exploring TCP's functionality (particularly its connection termination protocol) reveals some interesting information:

  • To actively end a connection, the client sends a packet with the FIN flag, part of a four-way handshake. The connection is still considered open because the terminating protocol hasn't finished yet.
  • The server receives this packet and acknowledges it (ACK packet). The server now knows that the client will transmit no more data.
  • The client goes to the FIN_WAIT_2 state, waiting for a packet with the FIN flag from the server to properly close the connection.

But there it is! The client has announced that he wants to end the connection, and the server knows so, but the connection is still open on the client's side (he hasn't closed it yet because he hasn't received the server's FIN packet). The server now answers the request and then closes the connection properly. It's important to note that the client will still ACK every server packet while in this half-closed state; he does not send RST, which would abort the connection instead of waiting for the server's FIN.

When the server is done (in our little example, after sending the HTTP response) he closes the connection on his side, sending the FIN packet. The client closes his side when he receives it, and notifies the server with an ACK.

On an additional note, I don't know the context you are programming in, but most of the time you will end up calling shutdown() on a socket. POSIX's shutdown (and Windows' at least) takes the direction of the connection you want to close as a function argument. These specs make clear that you can close just the sending side (which is exactly what the client will be doing), disabling data sending whilst allowing further data to be received by the client.
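To make the half-close concrete, here is a small Python sketch. It is only a demonstration under some assumptions: it uses socket.socketpair() so that it runs in a single process (on most platforms that is a local socket pair rather than a real TCP connection, but shutdown() is called the same way), and the function name half_close_demo and the messages are made up.

```python
import socket


def half_close_demo():
    """Client sends a body with no length, half-closes, then reads a reply."""
    client, server = socket.socketpair()
    with client, server:
        # Client: send the body, then close only the sending direction.
        client.sendall(b"request body without a length")
        client.shutdown(socket.SHUT_WR)

        # Server: read until recv() returns b"", i.e. the client sent its FIN.
        received = bytearray()
        while True:
            data = server.recv(4096)
            if not data:
                break
            received += data

        # Server: the other direction is still open, so a reply can be sent.
        server.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
        server.shutdown(socket.SHUT_WR)

        # Client: read the response until the server closes its side.
        response = bytearray()
        while True:
            data = client.recv(4096)
            if not data:
                break
            response += data
    return bytes(received), bytes(response)


body, reply = half_close_demo()
print(body)
print(reply)
```

The key call is shutdown(socket.SHUT_WR): unlike close(), it leaves the socket readable, which is what lets the client wait for the response.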

Further details on TCP Connections go beyond the range of this question, but I'd recommend reading about it to gain a better understanding of protocols of higher layers that use it.

Solution 4:

rfc

The easy way: Use HTTP 1.0 and require content length

For compatibility with HTTP/1.0 applications, HTTP/1.1 requests containing a message-body MUST include a valid Content-Length header field unless the server is known to be HTTP/1.1 compliant. If a request contains a message-body and a Content-Length is not given, the server SHOULD respond with 400 (bad request) if it cannot determine the length of the message, or with 411 (length required) if it wishes to insist on receiving a valid Content-Length.