Can you explain the HttpURLConnection connection process?

I am using HTTPURLConnection to connect to a web service. I know how to use HTTPURLConnection but I want to understand how it works. Basically, I want to know the following:

  • On which point does HTTPURLConnection try to establish a connection to the given URL?
  • On which point can I know that I was able to successfully establish a connection?
  • Are establishing a connection and sending the actual request done in one step/method call? What method is it?
  • Can you explain the function of getOutputStream and getInputStream in layman's term? I notice that when the server I'm trying to connect to is down, I get an Exception at getOutputStream. Does it mean that HTTPURLConnection will only start to establish a connection when I invoke getOutputStream? How about the getInputStream? Since I'm only able to get the response at getInputStream, then does it mean that I didn't send any request at getOutputStream yet but simply establishes a connection? Do HttpURLConnection go back to the server to request for response when I invoke getInputStream?
  • Am I correct to say that openConnection simply creates a new connection object but does not establish any connection yet?
  • How can I measure the read overhead and connect overhead?

Solution 1:

String message = URLEncoder.encode("my message", "UTF-8");

try {
    // instantiate the URL object with the target URL of the resource to
    // request
    URL url = new URL("http://www.example.com/comment");

    // instantiate the HttpURLConnection with the URL object - A new
    // connection is opened every time by calling the openConnection
    // method of the protocol handler for this URL.
    // 1. This is the point where the connection is opened.
    HttpURLConnection connection = (HttpURLConnection) url
            .openConnection();
    // set connection output to true
    connection.setDoOutput(true);
    // instead of a GET, we're going to send using method="POST"
    connection.setRequestMethod("POST");

    // instantiate OutputStreamWriter using the output stream, returned
    // from getOutputStream, that writes to this connection.
    // 2. This is the point where you'll know if the connection was
    // successfully established. If an I/O error occurs while creating
    // the output stream, you'll see an IOException.
    OutputStreamWriter writer = new OutputStreamWriter(
            connection.getOutputStream());

    // write data to the connection. This is data that you are sending
    // to the server
    // 3. No. Sending the data is conducted here. We established the
    // connection with getOutputStream
    writer.write("message=" + message);

    // Closes this output stream and releases any system resources
    // associated with this stream. At this point, we've sent all the
    // data. Only the outputStream is closed at this point, not the
    // actual connection
    writer.close();
    // if there is a response code AND that response code is 200 OK, do
    // stuff in the first if block
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // OK

        // otherwise, if any other status code is returned, or no status
        // code is returned, do stuff in the else block
    } else {
        // Server returned HTTP error code.
    }
} catch (MalformedURLException e) {
    // ...
} catch (IOException e) {
    // ...
}

The first 3 answers to your questions are listed as inline comments, beside each method, in the example HTTP POST above.

From getOutputStream:

Returns an output stream that writes to this connection.

Basically, I think you have a good understanding of how this works, so let me just reiterate in layman's terms. getOutputStream basically opens a connection stream, with the intention of writing data to the server. In the above code example "message" could be a comment that we're sending to the server that represents a comment left on a post. When you see getOutputStream, you're opening the connection stream for writing, but you don't actually write any data until you call writer.write("message=" + message);.

From getInputStream():

Returns an input stream that reads from this open connection. A SocketTimeoutException can be thrown when reading from the returned input stream if the read timeout expires before data is available for read.

getInputStream does the opposite. Like getOutputStream, it also opens a connection stream, but the intent is to read data from the server, not write to it. If the connection or stream-opening fails, you'll see a SocketTimeoutException.

How about the getInputStream? Since I'm only able to get the response at getInputStream, then does it mean that I didn't send any request at getOutputStream yet but simply establishes a connection?

Keep in mind that sending a request and sending data are two different operations. When you invoke getOutputStream or getInputStream url.openConnection(), you send a request to the server to establish a connection. There is a handshake that occurs where the server sends back an acknowledgement to you that the connection is established. It is then at that point in time that you're prepared to send or receive data. Thus, you do not need to call getOutputStream to establish a connection open a stream, unless your purpose for making the request is to send data.

In layman's terms, making a getInputStream request is the equivalent of making a phone call to your friend's house to say "Hey, is it okay if I come over and borrow that pair of vice grips?" and your friend establishes the handshake by saying, "Sure! Come and get it". Then, at that point, the connection is made, you walk to your friend's house, knock on the door, request the vice grips, and walk back to your house.

Using a similar example for getOutputStream would involve calling your friend and saying "Hey, I have that money I owe you, can I send it to you"? Your friend, needing money and sick inside that you kept it for so long, says "Sure, come on over you cheap bastard". So you walk to your friend's house and "POST" the money to him. He then kicks you out and you walk back to your house.

Now, continuing with the layman's example, let's look at some Exceptions. If you called your friend and he wasn't home, that could be a 500 error. If you called and got a disconnected number message because your friend is tired of you borrowing money all the time, that's a 404 page not found. If your phone is dead because you didn't pay the bill, that could be an IOException. (NOTE: This section may not be 100% correct. It's intended to give you a general idea of what's happening in layman's terms.)

Question #5:

Yes, you are correct that openConnection simply creates a new connection object but does not establish it. The connection is established when you call either getInputStream or getOutputStream.

openConnection creates a new connection object. From the URL.openConnection javadocs:

A new connection is opened every time by calling the openConnection method of the protocol handler for this URL.

The connection is established when you call openConnection, and the InputStream, OutputStream, or both, are called when you instantiate them.

Question #6:

To measure the overhead, I generally wrap some very simple timing code around the entire connection block, like so:

long start = System.currentTimeMillis();
log.info("Time so far = " + new Long(System.currentTimeMillis() - start) );

// run the above example code here
log.info("Total time to send/receive data = " + new Long(System.currentTimeMillis() - start) );

I'm sure there are more advanced methods for measuring the request time and overhead, but this generally is sufficient for my needs.

For information on closing connections, which you didn't ask about, see In Java when does a URL connection close?.

Solution 2:

Tim Bray presented a concise step-by-step, stating that openConnection() does not establish an actual connection. Rather, an actual HTTP connection is not established until you call methods such as getInputStream() or getOutputStream().

http://www.tbray.org/ongoing/When/201x/2012/01/17/HttpURLConnection

Solution 3:

On which point does HTTPURLConnection try to establish a connection to the given URL?

On the port named in the URL if any, otherwise 80 for HTTP and 443 for HTTPS. I believe this is documented.

On which point can I know that I was able to successfully establish a connection?

When you call getInputStream() or getOutputStream() or getResponseCode() without getting an exception.

Are establishing a connection and sending the actual request done in one step/method call? What method is it?

No and none.

Can you explain the function of getOutputStream() and getInputStream() in layman's term?

Either of them first connects if necessary, then returns the required stream.

I notice that when the server I'm trying to connect to is down, I get an Exception at getOutputStream(). Does it mean that HTTPURLConnection will only start to establish a connection when I invoke getOutputStream()? How about the getInputStream()? Since I'm only able to get the response at getInputStream(), then does it mean that I didn't send any request at getOutputStream() yet but simply establishes a connection? Do HttpURLConnection go back to the server to request for response when I invoke getInputStream()?

See above.

Am I correct to say that openConnection() simply creates a new connection object but does not establish any connection yet?

Yes.

How can I measure the read overhead and connect overhead?

Connect: take the time getInputStream() or getOutputStream() takes to return, whichever you call first. Read: time from starting first read to getting the EOS.

Solution 4:

On which point does HTTPURLConnection try to establish a connection to the given URL?

It's worth clarifying, there's the 'UrlConnection' instance and then there's the underlying Tcp/Ip/SSL socket connection, 2 different concepts. The 'UrlConnection' or 'HttpUrlConnection' instance is synonymous with a single HTTP page request, and is created when you call url.openConnection(). But if you do multiple url.openConnection()'s from the one 'url' instance then if you're lucky, they'll reuse the same Tcp/Ip socket and SSL handshaking stuff...which is good if you're doing lots of page requests to the same server, especially good if you're using SSL where the overhead of establishing the socket is very high.

See: HttpURLConnection implementation