HtmlAgilityPack and selecting Nodes and Subnodes
Hope somebody can help me.
Let's say I have an HTML document that contains multiple divs like this example:
<div class="search_hit">
<span prop="name">Richard Winchester</span>
<span prop="company">Kodak</span>
<span prop="street">Arlington Road 1</span>
</div>
<div class="search_hit">
<span prop="name">Ted Mosby</span>
<span prop="company">HP</span>
<span prop="street">Arlington Road 2</span>
</div>
I'm using HtmlAgilityPack to load the HTML document. What I need to know is: how can I get the spans for each search_hit div?
My first thought was something like this:
foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
    foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("//span[@prop]"))
    {
    }
}
Each div should become an object with the contained spans as properties:
public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}
And this list should then be filled:
public List<Record> Results = new List<Record>();
But the XPath I'm using does not search within the sub node as it should. It seems that it searches the whole document again and again.
I already got it working to the point where I get all the spans of the whole page, but then I lose the relation between the spans and the divs. That is, I no longer know which span belongs to which div.
Does somebody know a solution? I have played around so much that I'm totally confused now. :)
Any help is appreciated!
If you use //, the search starts from the document root, no matter which node you call SelectNodes on.
Use .// to search all descendants of the current node:
foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
Or drop the prefix entirely to search just for direct children:
foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("span[@prop]"))
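To make the difference between the three forms concrete, here is a minimal sketch that runs each XPath against the first search_hit div and counts the matches. The class name XPathScopeDemo, the helper CountFromFirstDiv and the trimmed markup are my own assumptions for illustration; only HtmlDocument, LoadHtml, SelectSingleNode and SelectNodes come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Trimmed copy of the markup from the question (two hits, two spans each).
    public const string Html =
        "<div class=\"search_hit\">" +
        "<span prop=\"name\">Richard Winchester</span>" +
        "<span prop=\"company\">Kodak</span>" +
        "</div>" +
        "<div class=\"search_hit\">" +
        "<span prop=\"name\">Ted Mosby</span>" +
        "<span prop=\"company\">HP</span>" +
        "</div>";

    // Hypothetical helper: evaluates 'xpath' with the FIRST search_hit div
    // as the context node and returns how many nodes matched.
    public static int CountFromFirstDiv(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        HtmlNode firstDiv = doc.DocumentNode.SelectSingleNode("//div[@class='search_hit']");
        HtmlNodeCollection matches = firstDiv.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        return matches == null ? 0 : matches.Count;
    }

    public static void Main()
    {
        // Absolute path: every span in the document, even though we start at firstDiv.
        Console.WriteLine(CountFromFirstDiv("//span[@prop]"));
        // Relative path: spans anywhere below firstDiv.
        Console.WriteLine(CountFromFirstDiv(".//span[@prop]"));
        // No prefix: only spans that are direct children of firstDiv.
        Console.WriteLine(CountFromFirstDiv("span[@prop]"));
    }
}
```

With this markup the first call should see all four spans, while the other two should see only the two spans inside the first div.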
The following works for me. The important bit, as BeniBela noted, is to add a dot in the second call to SelectNodes.
List<Record> lstRecords = new List<Record>();
// One Record per search_hit div.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
    Record record = new Record();
    // The leading dot keeps the search inside the current div.
    foreach (HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
    {
        // Map each span to a property based on its prop attribute.
        string attributeValue = node2.GetAttributeValue("prop", "");
        if (attributeValue == "name")
        {
            record.Name = node2.InnerText;
        }
        else if (attributeValue == "company")
        {
            record.company = node2.InnerText;
        }
        else if (attributeValue == "street")
        {
            record.street = node2.InnerText;
        }
    }
    lstRecords.Add(record);
}
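One thing to watch out for: SelectNodes returns null rather than an empty collection when nothing matches, so the foreach loops above throw if the page has no hits. As a rough sketch of a variant that guards against that, you could look up each span directly with a relative SelectSingleNode call. The names RecordParser, Parse and SpanText are made up for this example, and the Trim call is my addition; the Record class is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

public static class RecordParser
{
    // Hypothetical helper: text of the span with the given prop value
    // inside 'div', or an empty string if there is no such span.
    private static string SpanText(HtmlNode div, string prop)
    {
        // The leading dot keeps the lookup inside this div.
        HtmlNode span = div.SelectSingleNode(".//span[@prop='" + prop + "']");
        return span == null ? "" : span.InnerText.Trim();
    }

    public static List<Record> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var records = new List<Record>();
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@class='search_hit']");
        if (divs == null)
        {
            return records; // no hits on the page
        }

        foreach (HtmlNode div in divs)
        {
            records.Add(new Record
            {
                Name = SpanText(div, "name"),
                company = SpanText(div, "company"),
                street = SpanText(div, "street")
            });
        }
        return records;
    }
}
```

Calling RecordParser.Parse with the HTML from the question should give two records, one per div. Whether this reads better than the if/else chain is a matter of taste; it does one XPath query per field instead of one per div.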