Getting the Redirected URL from the Original URL
This function will return the final destination of a link — even if there are multiple redirects. It doesn't account for JavaScript-based redirects or META redirects. Notice that the previous solution didn't deal with Absolute & Relative URLs, since the LOCATION header could return something like "/newhome" you need to combine with the URL that served that response to identify the full URL destination.
public static string GetFinalRedirect(string url)
{
if(string.IsNullOrWhiteSpace(url))
return url;
int maxRedirCount = 8; // prevent infinite loops
string newUrl = url;
do
{
HttpWebRequest req = null;
HttpWebResponse resp = null;
try
{
req = (HttpWebRequest) HttpWebRequest.Create(url);
req.Method = "HEAD";
req.AllowAutoRedirect = false;
resp = (HttpWebResponse)req.GetResponse();
switch (resp.StatusCode)
{
case HttpStatusCode.OK:
return newUrl;
case HttpStatusCode.Redirect:
case HttpStatusCode.MovedPermanently:
case HttpStatusCode.RedirectKeepVerb:
case HttpStatusCode.RedirectMethod:
newUrl = resp.Headers["Location"];
if (newUrl == null)
return url;
if (newUrl.IndexOf("://", System.StringComparison.Ordinal) == -1)
{
// Doesn't have a URL Schema, meaning it's a relative or absolute URL
Uri u = new Uri(new Uri(url), newUrl);
newUrl = u.ToString();
}
break;
default:
return newUrl;
}
url = newUrl;
}
catch (WebException)
{
// Return the last known good URL
return newUrl;
}
catch (Exception ex)
{
return null;
}
finally
{
if (resp != null)
resp.Close();
}
} while (maxRedirCount-- > 0);
return newUrl;
}
The URL you mentioned uses a JavaScript redirect, which will only redirect a browser. So there's no easy way to detect the redirect.
For proper (HTTP Status Code and Location:) redirects, you might want to remove
req.AllowAutoRedirect = false;
and get the final URL using
myResp.ResponseUri
as there can be more than one redirect.
UPDATE: More clarification regarding redirects:
There's more than one way to redirect a browser to another URL.
The first way is to use a 3xx HTTP status code, and the Location: header. This is the way the gods intended HTTP redirects to work, and is also known as "the one true way." This method will work on all browsers and crawlers.
And then there are the devil's ways. These include meta refresh, the Refresh: header, and JavaScript. Although these methods work in most browsers, they are definitely not guaranteed to work, and occasionally result in strange behavior (aka. breaking the back button).
Most web crawlers, including the Googlebot, ignore these redirection methods, and so should you. If you absolutely have to detect all redirects, then you would have to parse the HTML for META tags, look for Refresh: headers in the response, and evaluate Javascript. Good luck with the last one.
Use this code to get redirecting URL
public void GrtUrl(string url)
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.AllowAutoRedirect = false; // IMPORTANT
webRequest.Timeout = 10000; // timeout 10s
webRequest.Method = "HEAD";
// Get the response ...
HttpWebResponse webResponse;
using (webResponse = (HttpWebResponse)webRequest.GetResponse())
{
// Now look to see if it's a redirect
if ((int)webResponse.StatusCode >= 300 &&
(int)webResponse.StatusCode <= 399)
{
string uriString = webResponse.Headers["Location"];
Console.WriteLine("Redirect to " + uriString ?? "NULL");
webResponse.Close(); // don't forget to close it - or bad things happen
}
}
}