HttpClient 4 - how to capture last redirect URL

I have fairly simple HttpClient 4 code that uses HttpGet to fetch HTML. The returned HTML references scripts and images with relative paths (e.g. <img src="/images/foo.jpg"/>), so I need the calling URL to turn them into absolute ones (<img src="http://foo.com/images/foo.jpg"/>). The problem is that the call may go through one or two 302 redirects, so the original URL no longer reflects the location of the HTML.

How do I get the final URL of the returned content, given the redirects I may (or may not) have gone through?

I looked at HttpGet#getAllHeaders() and HttpResponse#getAllHeaders() - couldn't find anything.

Edited: HttpGet#getURI() returns the original calling address, not the final one.


Solution 1:

That would be the current URL, which you can get by calling

  HttpGet#getURI();

EDIT: You didn't mention how you are handling redirects. The call above works for us because we handle the 302 ourselves.

It sounds like you are using DefaultRedirectHandler. We used to do that. Getting the current URL is a bit tricky in that case: you need to pass your own context to execute() and read the last request and target host back from it. Here are the relevant code snippets:

        HttpGet httpget = new HttpGet(url);
        HttpContext context = new BasicHttpContext(); 
        HttpResponse response = httpClient.execute(httpget, context); 
        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK)
            throw new IOException(response.getStatusLine().toString());
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute( 
                ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute( 
                ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());

The default redirect handling didn't work for us, so we changed it, but I forget what the problem was.

Solution 2:

In HttpClient 4, if you are using LaxRedirectStrategy or any subclass of DefaultRedirectStrategy, this is the recommended way (see the source code of DefaultRedirectStrategy):

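HttpContext context = new BasicHttpContext();
// HttpResult<T> is not an HttpClient type; execute() returns whatever the ResponseHandler produces
HttpResult<T> result = client.execute(request, handler, context);
// Fall back to the request URI when no redirect happened
URI finalUrl = request.getURI();
RedirectLocations locations = (RedirectLocations) context.getAttribute(DefaultRedirectStrategy.REDIRECT_LOCATIONS);
if (locations != null && !locations.getAll().isEmpty()) {
    finalUrl = locations.getAll().get(locations.getAll().size() - 1);
}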

Since HttpClient 4.3.x, the above code can be simplified as:

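HttpClientContext context = HttpClientContext.create();
// HttpResult<T> is not an HttpClient type; execute() returns whatever the ResponseHandler produces
HttpResult<T> result = client.execute(request, handler, context);
// Fall back to the request URI when no redirect happened
URI finalUrl = request.getURI();
List<URI> locations = context.getRedirectLocations();
if (locations != null && !locations.isEmpty()) {
    finalUrl = locations.get(locations.size() - 1);
}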
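
The fallback used in the snippets above (take the last redirect location if there is one, otherwise keep the request URI) can be pulled out into a small helper. This is only a sketch using JDK types: the class and method names are made up for illustration, and the list stands in for whatever the context returned as redirect locations.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FinalLocation {

    /**
     * Returns the last redirect location, or the original request URI
     * when the list is null or empty (no redirect was followed).
     */
    static URI lastLocation(URI requestUri, List<URI> redirectLocations) {
        if (redirectLocations == null || redirectLocations.isEmpty()) {
            return requestUri;
        }
        return redirectLocations.get(redirectLocations.size() - 1);
    }

    public static void main(String[] args) {
        URI original = URI.create("http://foo.com/start");

        // Two hops: the second entry is where the content came from
        List<URI> hops = Arrays.asList(
                URI.create("http://foo.com/a"),
                URI.create("http://foo.com/b"));
        System.out.println(lastLocation(original, hops));

        // No redirects recorded: the original URI is returned unchanged
        System.out.println(lastLocation(original, null));
        System.out.println(lastLocation(original, Collections.<URI>emptyList()));
    }
}
```

With HttpClient 4.3+, the second argument would be the result of context.getRedirectLocations().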

Solution 3:

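    // A HEAD request follows the redirects without downloading the body;
    // use HttpGet instead if you also need the content.
    HttpHead httpHead = new HttpHead("<put your URL here>");
    HttpClient httpClient = HttpClients.createDefault();
    HttpClientContext context = HttpClientContext.create();
    httpClient.execute(httpHead, context);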
    List<URI> redirectURIs = context.getRedirectLocations();
    if (redirectURIs != null && !redirectURIs.isEmpty()) {
        for (URI redirectURI : redirectURIs) {
            System.out.println("Redirect URI: " + redirectURI);
        }
        URI finalURI = redirectURIs.get(redirectURIs.size() - 1);
    }

Solution 4:

I found this in the HttpComponents Client documentation:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext context = HttpClientContext.create();
HttpGet httpget = new HttpGet("http://localhost:8080/");
CloseableHttpResponse response = httpclient.execute(httpget, context);
try {
    HttpHost target = context.getTargetHost();
    List<URI> redirectLocations = context.getRedirectLocations();
    URI location = URIUtils.resolve(httpget.getURI(), target, redirectLocations);
    System.out.println("Final HTTP location: " + location.toASCIIString());
    // Expected to be an absolute URI
} finally {
    response.close();
}
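
Once the final location is known, the relative links from the question can be resolved against it. Here is a minimal sketch using java.net.URI from the JDK; the class and method names are hypothetical, and it assumes the link values are already valid URI strings (URI.resolve throws IllegalArgumentException on things like unescaped spaces).

```java
import java.net.URI;

final class RelativeLinks {

    /** Resolves a link taken from the HTML against the URL the page was finally served from. */
    static URI toAbsolute(URI finalPageUrl, String link) {
        return finalPageUrl.resolve(link);
    }

    public static void main(String[] args) {
        // Stand-in for the location obtained after following redirects
        URI page = URI.create("http://foo.com/app/index.html");

        // Root-relative path: replaces the whole path of the page URL
        System.out.println(toAbsolute(page, "/images/foo.jpg"));   // http://foo.com/images/foo.jpg

        // Path-relative link: resolved against the page's directory
        System.out.println(toAbsolute(page, "css/site.css"));      // http://foo.com/app/css/site.css

        // Already absolute: returned as-is
        System.out.println(toAbsolute(page, "http://cdn.foo.com/x.js"));
    }
}
```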

Solution 5:

In my opinion, an improvement on ZZ Coder's solution is to use a response interceptor that simply tracks the last redirect location. That way you don't lose information such as the fragment after a hash (#) in the redirect target, which is dropped without the interceptor. Example: http://j.mp/OxbI23

private static HttpClient createHttpClient() throws NoSuchAlgorithmException, KeyManagementException {
    SSLContext sslContext = SSLContext.getInstance("SSL");
    TrustManager[] trustAllCerts = new TrustManager[] { new TrustAllTrustManager() };
    sslContext.init(null, trustAllCerts, new java.security.SecureRandom());

    SSLSocketFactory sslSocketFactory = new SSLSocketFactory(sslContext);
    SchemeRegistry schemeRegistry = new SchemeRegistry();
    schemeRegistry.register(new Scheme("https", 443, sslSocketFactory));
    schemeRegistry.register(new Scheme("http", 80, new PlainSocketFactory()));

    HttpParams params = new BasicHttpParams();
    ClientConnectionManager cm = new org.apache.http.impl.conn.SingleClientConnManager(schemeRegistry);

    // some pages require a user agent
    AbstractHttpClient httpClient = new DefaultHttpClient(cm, params);
    HttpProtocolParams.setUserAgent(httpClient.getParams(), "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1");

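    // RedirectStrategy is an interface and cannot be instantiated directly;
    // DefaultRedirectStrategy is assumed here (substitute your own implementation if you have one)
    httpClient.setRedirectStrategy(new DefaultRedirectStrategy());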

    httpClient.addResponseInterceptor(new HttpResponseInterceptor() {
        @Override
        public void process(HttpResponse response, HttpContext context)
                throws HttpException, IOException {
            if (response.containsHeader("Location")) {
                Header[] locations = response.getHeaders("Location");
                if (locations.length > 0)
                    context.setAttribute(LAST_REDIRECT_URL, locations[0].getValue());
            }
        }
    });

    return httpClient;
}

private String getUrlAfterRedirects(HttpContext context) {
    String lastRedirectUrl = (String) context.getAttribute(LAST_REDIRECT_URL);
    if (lastRedirectUrl != null)
        return lastRedirectUrl;
    else {
        HttpUriRequest currentReq = (HttpUriRequest) context.getAttribute(ExecutionContext.HTTP_REQUEST);
        HttpHost currentHost = (HttpHost)  context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
        String currentUrl = (currentReq.getURI().isAbsolute()) ? currentReq.getURI().toString() : (currentHost.toURI() + currentReq.getURI());
        return currentUrl;
    }
}

public static final String LAST_REDIRECT_URL = "last_redirect_url";

Use it just like ZZ Coder's solution:

HttpResponse response = httpClient.execute(httpGet, context);
String url = getUrlAfterRedirects(context);
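
One caveat with storing the raw Location header: servers sometimes send a relative value there. A hedged sketch of how such a value could be resolved against the URL of the request that produced it, using only java.net.URI; the class and method names are invented for illustration and are not part of HttpClient.

```java
import java.net.URI;

final class LocationHeader {

    /**
     * Turns a Location header value into an absolute URI.
     * Absolute values are returned unchanged; relative ones are resolved
     * against the URL of the request that was redirected.
     * The fragment of the header value, if any, is kept.
     */
    static URI resolveLocation(URI currentRequestUrl, String locationHeaderValue) {
        return currentRequestUrl.resolve(locationHeaderValue.trim());
    }

    public static void main(String[] args) {
        URI current = URI.create("http://foo.com/a/b");

        // Relative Location with a fragment
        System.out.println(resolveLocation(current, "/login#top"));            // http://foo.com/login#top

        // Absolute Location with a fragment
        URI target = resolveLocation(current, "http://bar.com/page#section");
        System.out.println(target + " fragment=" + target.getFragment());
    }
}
```

In the interceptor above, the first argument would have to come from the current request and target host in the context, the same way getUrlAfterRedirects builds currentUrl.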