Java - How to find the redirected url of a url?
I am accessing web pages from Java as follows:
URLConnection con = url.openConnection();
In some cases, though, a URL redirects to another URL, and I want to know the URL it was redirected to.
Below are the header fields that I got as a response:
null-->[HTTP/1.1 200 OK]
Cache-control-->[public,max-age=3600]
last-modified-->[Sat, 17 Apr 2010 13:45:35 GMT]
Transfer-Encoding-->[chunked]
Date-->[Sat, 17 Apr 2010 13:45:35 GMT]
Vary-->[Accept-Encoding]
Expires-->[Sat, 17 Apr 2010 14:45:35 GMT]
Set-Cookie-->[cl_def_hp=copenhagen; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT, cl_def_lang=en; domain=.craigslist.org; path=/; expires=Sun, 17 Apr 2011 13:45:35 GMT]
Connection-->[close]
Content-Type-->[text/html; charset=iso-8859-1;]
Server-->[Apache]
So at present, I am constructing the redirected URL from the value of the Set-Cookie header field. In the above case, the redirected URL is copenhagen.craigslist.org.
Is there a standard way to determine which URL a given URL redirects to?
I know that when a URL redirects to another URL, the server sends an intermediate response containing a Location header field that gives the redirect target, but I am not receiving that intermediate response through the url.openConnection() method.
Solution 1:
Simply call getURL() on the URLConnection instance after calling getInputStream():
URLConnection con = new URL( url ).openConnection();
System.out.println( "original url: " + con.getURL() );
con.connect();
System.out.println( "connected url: " + con.getURL() );
InputStream is = con.getInputStream();
System.out.println( "redirected url: " + con.getURL() );
is.close();
If you need to know whether a redirect happened before actually fetching the contents, here is some sample code:
HttpURLConnection con = (HttpURLConnection)(new URL( url ).openConnection());
con.setInstanceFollowRedirects( false );
con.connect();
int responseCode = con.getResponseCode();
System.out.println( responseCode );
String location = con.getHeaderField( "Location" );
System.out.println( location );
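To try both snippets without depending on an external site, here is a minimal sketch that starts a throwaway local server with the JDK's built-in com.sun.net.httpserver.HttpServer. The class name RedirectDemo, the /old and /new paths, and the helper method names are made up for illustration and are not part of the original answer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class RedirectDemo {

    /** Starts a local test server where /old answers 302 with Location: /new. */
    public static HttpServer startServer() throws IOException {
        // Port 0 lets the OS pick any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", exchange -> {
            exchange.getResponseHeaders().add("Location", "/new");
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
            exchange.close();
        });
        server.createContext("/new", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Lets the connection follow redirects, then reports where it ended up. */
    public static URL followed(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try (InputStream in = con.getInputStream()) {
            return con.getURL();
        }
    }

    /** Disables redirect following and returns the raw Location header, or null. */
    public static String locationOf(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setInstanceFollowRedirects(false);
        con.getResponseCode(); // forces the request to be sent
        String location = con.getHeaderField("Location");
        con.disconnect();
        return location;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URL old = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/old");
            System.out.println("followed to: " + followed(old));
            System.out.println("Location:    " + locationOf(old));
        } finally {
            server.stop(0);
        }
    }
}
```

Note that the second helper returns the header exactly as the server sent it, so the value may be a relative path rather than a full URL.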
Solution 2:
You need to cast the URLConnection to HttpURLConnection and instruct it not to follow redirects by setting HttpURLConnection#setInstanceFollowRedirects() to false. You can also set it globally with HttpURLConnection#setFollowRedirects().
You then need to handle redirects yourself. Check the response code with HttpURLConnection#getResponseCode(), grab the Location header with URLConnection#getHeaderField(), and then fire a new HTTP request on it.
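The steps above can be sketched as a loop. This is only an illustration under some assumptions: the class name ManualRedirects, the helper isRedirect, and the hop limit of 10 are hypothetical choices, and treating 307 and 308 as redirects goes slightly beyond what the answer mentions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ManualRedirects {

    // Hypothetical cap on hops so that a redirect loop cannot run forever.
    private static final int MAX_HOPS = 10;

    /** Follows redirects by hand and returns the last URL reached. */
    public static URL resolve(URL start) throws IOException {
        URL current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection con = (HttpURLConnection) current.openConnection();
            con.setInstanceFollowRedirects(false);
            try {
                int code = con.getResponseCode();
                String location = con.getHeaderField("Location");
                if (!isRedirect(code) || location == null) {
                    return current; // not a redirect, or no target to follow
                }
                // Location may be relative, so resolve it against the current URL.
                current = new URL(current, location);
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    /** Returns true for the status codes that carry a redirect target. */
    public static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM  // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP  // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER   // 303
            || code == 307
            || code == 308;
    }
}
```

Calling ManualRedirects.resolve(new URL(...)) returns the first URL that does not answer with a redirect, or throws once the hop limit is exceeded.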
Solution 3:
// Requires java.net.HttpURLConnection and java.net.URL.
public static URL getFinalURL(URL url) {
    try {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setInstanceFollowRedirects(false);
        // Send browser-like request headers.
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36");
        con.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
        con.addRequestProperty("Referer", "https://www.google.com/");
        con.connect();
        int resCode = con.getResponseCode();
        if (resCode == HttpURLConnection.HTTP_SEE_OTHER
                || resCode == HttpURLConnection.HTTP_MOVED_PERM
                || resCode == HttpURLConnection.HTTP_MOVED_TEMP) {
            String location = con.getHeaderField("Location");
            if (location == null) {
                return url; // redirect status without a target: nothing to follow
            }
            // Resolve relative Location values against the current URL (keeps scheme, host and port).
            // Note: there is no hop limit, so a redirect loop recurses until the stack overflows.
            return getFinalURL(new URL(url, location));
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
    return url;
}
To find the "User-Agent" and "Referer" values yourself, open the developer tools of one of your installed browsers (e.g. press F12 in Google Chrome), go to the 'Network' tab, and click on one of the requests. Its details will appear; select the 'Headers' sub-tab.
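A Location header can hold an absolute URL or a relative reference, and a relative one does not always start with a slash. As a rough sketch of how such values resolve against the request URL, here is a small helper built on java.net.URI#resolve. The class name LocationResolver and the example.com URLs are hypothetical.

```java
import java.net.URI;

public class LocationResolver {

    /** Resolves a Location header value against the URL that was requested. */
    public static String resolve(String requestUrl, String location) {
        // URI.create throws IllegalArgumentException if either string is not a valid URI.
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com:8080/a/b";
        System.out.println(resolve(base, "/c"));                      // http://example.com:8080/c
        System.out.println(resolve(base, "c"));                       // http://example.com:8080/a/c
        System.out.println(resolve(base, "https://other.example/x")); // https://other.example/x
    }
}
```

Resolving this way keeps the port of the original request, which is lost when the URL is rebuilt from only the protocol and host.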
Solution 4:
Have a look at the HttpURLConnection class API documentation, especially setInstanceFollowRedirects().