Get text between two HTML tags in C#

Solution 1:

You could be incredibly specific about it:

var regex = new Regex(@"<span id=""point_total"" class=""tooltip"" oldtitle="".*?"" aria-describedby=""ui-tooltip-0"">(.*?)</span>");

var match = regex.Match(@"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>");

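// Group 1 is the text between the tags ("31" here). If nothing matched, match.Success is false and the value is an empty string.
var result = match.Groups[1].Value;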
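
A pattern that spells out every attribute stops matching as soon as one of them changes. The following is a minimal sketch of a looser variant, assuming the span can be identified by its id alone and that no attribute value contains a '>' character. The names SpanRegexSketch and TryExtractById are hypothetical, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanRegexSketch
{
    // Hypothetical helper: finds the first <span> whose id attribute equals `id`
    // and returns the text between its opening and closing tags.
    public static bool TryExtractById(string html, string id, out string text)
    {
        // <span, any attributes, id="...", any attributes, >, then a lazy capture
        // up to the first </span>. Assumes double-quoted attribute values.
        string pattern = @"<span\b[^>]*\bid=""" + Regex.Escape(id) + @"""[^>]*>(?<text>.*?)</span>";

        // Singleline lets '.' match newlines in case the content spans several lines.
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        text = match.Success ? match.Groups["text"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string html = @"<span id=""point_total"" class=""tooltip"" oldtitle=""Note"">31</span>";

        string text;
        if (TryExtractById(html, "point_total", out text))
        {
            Console.WriteLine(text); // prints 31
        }
    }
}
```

This still breaks on nested spans, because the lazy capture stops at the first closing tag.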

Solution 2:

You'll want to use HtmlAgilityPack for this; it's pretty simple:

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("filepath");

// "//span" selects the first span in the document. To select a specific span, filter on
// its attributes instead, e.g. "//span[@id='point_total' and @class='tooltip']".
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span");

// InnerText is the text between the tags, i.e. <span>***value***</span>.
// SelectSingleNode returns null when nothing matches, so guard against that.
string value = node != null ? node.InnerText : null;

Regex, while a viable option, is something you would generally want to avoid for parsing HTML if at all possible (see Here).

In terms of sustainability, make sure you understand the page source: refresh it a few times and check that your target span is nested within the same parents after every refresh and that the page keeps the same general format. Then navigate to the span using the approach above.

Solution 3:

There are multiple possibilities.

  1. Regex
  2. Parse the HTML as XML and get the value via XPath (this only works if the markup is well-formed XML)
  3. Iterate through the characters of the markup. When you reach a span tag, skip ahead to its closing '>'. The value you need is then everything up to the next opening '<'
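
Options 2 and 3 from the list above could look roughly like this. It is a sketch under a few assumptions: for option 2 the markup is well-formed XML with no default namespace (XDocument.Parse throws an XmlException otherwise) and the id contains no single quote; for option 3 the span has no nested tags and no '>' inside an attribute value. SpanTextSketch, ViaXPath and ViaScan are hypothetical names.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SpanTextSketch
{
    // Option 2: parse the markup as XML and select the span with XPath.
    // Returns an empty string when no span with that id exists.
    public static string ViaXPath(string xhtml, string id)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var span = doc.XPathSelectElement("//span[@id='" + id + "']");
        return span == null ? string.Empty : span.Value;
    }

    // Option 3: scan the string by hand. Find the first span tag, skip to the
    // '>' that closes it, then take everything up to the next '<'.
    public static string ViaScan(string html)
    {
        int tagStart = html.IndexOf("<span", StringComparison.OrdinalIgnoreCase);
        if (tagStart < 0) return string.Empty;

        int tagEnd = html.IndexOf('>', tagStart);
        if (tagEnd < 0) return string.Empty;

        int nextTag = html.IndexOf('<', tagEnd + 1);
        if (nextTag < 0) return string.Empty;

        return html.Substring(tagEnd + 1, nextTag - tagEnd - 1);
    }

    public static void Main()
    {
        string markup = @"<div><span id=""point_total"" class=""tooltip"">31</span></div>";

        Console.WriteLine(ViaXPath(markup, "point_total")); // prints 31
        Console.WriteLine(ViaScan(markup));                 // prints 31
    }
}
```

The hand-written scan is the most fragile of the three, since it knows nothing about nesting, comments or quoting; it is only reasonable when the input format is fixed and under your control.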

Also take a look at System.Windows.Forms.HtmlDocument.