HtmlAgilityPack Drops Option End Tags

The exact same error is reported in the discussion forum on the HAP home page, but it looks like no meaningful fixes have been made to the project in a few years. Not encouraging.

A quick browse of the source suggests the error might be fixable by commenting out line 92 of HtmlNode.cs:

// they sometimes contain, and sometimes they don 't...
ElementsFlags.Add("option", HtmlElementFlag.Empty);

(Actually no, they always contain label text, although a blank string would also be valid text. A careless author might omit the end-tag, but then that's true of any element.)

ADD

An equivalent solution is to call HtmlNode.ElementsFlags.Remove("option"); before any use of the library. This avoids having to modify the library source code.
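For illustration, here is a minimal sketch of that workaround. The class name OptionLabelReader, the method ReadOptionLabels, and the sample markup are made up for this example. It assumes the HtmlAgilityPack assembly is referenced and that the flag is removed before the first document is parsed.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class OptionLabelReader
{
    // Hypothetical helper: returns the label text of every <option> in the markup.
    public static List<string> ReadOptionLabels(string html)
    {
        // Undo the library's special-casing of <option>. ElementsFlags is static,
        // so this affects every document parsed afterwards in the process.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var labels = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var options = doc.DocumentNode.SelectNodes("//option");
        if (options == null)
        {
            return labels;
        }

        foreach (HtmlNode option in options)
        {
            labels.Add(option.InnerText);
        }
        return labels;
    }
}
```

Calling ReadOptionLabels on something like `<select><option value="1">One</option></select>` should then give you the label text as the InnerText of each option node, instead of an empty node followed by a stray text node.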


There seems to be some reason, related to XHTML compliance, not to parse the option tag as a "generic" tag. However, this can be a real pain in the neck.

My suggestion is to do a whole-string replace and change all "option" tags to "my_option" tags. That way you:

  1. Don't have to modify the source of the library (and can upgrade it later).
  2. Can parse as you usually would.
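A rough sketch of the rename step, using only System.Text.RegularExpressions. The class name OptionTagRenamer and the regular expression are my own choices, not anything from the library. The pattern is deliberately naive: it would also rewrite the text "<option" if it appeared inside a script, comment, or attribute value.

```csharp
using System.Text.RegularExpressions;

public static class OptionTagRenamer
{
    // Matches "<option" or "</option" case-insensitively. The trailing \b keeps
    // longer names such as <optgroup> or <options> from being touched.
    private static readonly Regex OptionTag =
        new Regex(@"(</?)option\b", RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites every option start and end tag to my_option.
    public static string Rename(string html)
    {
        // ${1} re-inserts the captured "<" or "</" in front of the new name.
        return OptionTag.Replace(html, "${1}my_option");
    }
}
```

You would pass the result of Rename to LoadHtml and then query for my_option instead of option, for example with SelectNodes("//my_option").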

The original post on HtmlAgilityPack forum can be found at: http://htmlagilitypack.codeplex.com/Thread/View.aspx?ThreadId=14982