How can I use HTML Agility Pack to retrieve all the images from a website?
Solution 1:
You can do this using LINQ, like this:
var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
                   .Select(e => e.GetAttributeValue("src", null))
                   .Where(s => !String.IsNullOrEmpty(s));
EDIT: This code now actually works; I had forgotten to write document.DocumentNode.
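Note that the `src` values you get back are often relative (e.g. `/images/logo.png`). If you need absolute URLs, you can resolve them against the page address. A minimal sketch, assuming `url` is the same string passed to `HtmlWeb.Load` above:

```csharp
// Resolve each (possibly relative) src against the page's base URI.
var baseUri = new Uri(url);
var absoluteUrls = urls.Select(s => new Uri(baseUri, s).AbsoluteUri);
```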
Solution 2:
Based on one of their examples, but with a modified XPath:
HtmlDocument doc = new HtmlDocument();
List<string> image_links = new List<string>();
doc.Load("file.htm");

// HtmlDocument exposes DocumentNode, not DocumentElement.
// SelectNodes returns null when nothing matches, so guard against that.
var nodes = doc.DocumentNode.SelectNodes("//img");
if (nodes != null)
{
    foreach (HtmlNode link in nodes)
    {
        image_links.Add(link.GetAttributeValue("src", ""));
    }
}
I don't know this library well, so I'm not sure of the best way to write the list out somewhere else, but this will at least get you your data.
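If all you want is to dump the collected links somewhere, one simple option (an assumption on my part, not from the original answer) is to write one link per line to a text file:

```csharp
// Writes each collected src value on its own line.
System.IO.File.WriteAllLines("image_links.txt", image_links);
```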
Edit
Using your example:
public List<string> GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    List<string> image_links = new List<string>();

    // LoadHtml parses an HTML string; Load expects a file path or stream.
    document.LoadHtml(source);

    // DocumentNode, not DocumentElement; SelectNodes returns null on no matches.
    var nodes = document.DocumentNode.SelectNodes("//img");
    if (nodes != null)
    {
        foreach (HtmlNode link in nodes)
        {
            image_links.Add(link.GetAttributeValue("src", ""));
        }
    }
    return image_links;
}