JavaScript regex match fails on actual page, but regex tests work just fine

I have a very specific problem concerning regular expression matching in JavaScript. I'm trying to match a piece of source code, more specifically this portion:

<TD WIDTH=100% ALIGN=right><a href="http://forum.tibia.com/forum/?action=main&amp;sectionid=2">World Boards</a> | <a href="http://forum.tibia.com/forum/?action=board&amp;boardid=106121">Olympa - Trade</a> | <b>Bump when Yasir...</b></TD>

The part I'm trying to match is boardid=106121">Olympa - Trade</a>; the part I actually need is "Olympa". So I use the following line of JS code to get a match and have "Olympa" returned:

var world = document.documentElement.innerHTML.match('/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i')[1];

The ( - Trade) part is optional in my problem, hence the {0,1} in the regex.

There's also no easier way to narrow down the code by e.g. getElementsByTagName, so searching the complete source code is my only option.

Now here's the funny thing. I have used two online regex matchers (of which one was for JS-regex specifically) to test my regex against the complete source code. Both times, it had a match and returned "Olympa" exactly as it should have. However, when I have Chrome include the script on the actual page, it gives the following error:

Error in event handler for 'undefined': Cannot read property '1' of null TypeError: Cannot read property '1' of null

Obviously, the first part of my line returns "null" because it does not find a match, and taking [1] of "null" doesn't work.

I figured I might not be doing the match on the source code, but when I let the script output document.documentElement.innerHTML to the console, it outputs the complete source code.

I see no reason why this regex fails, so I must be overlooking something very silly. Does anyone else see the problem?

All help appreciated, Kenneth


Solution 1:

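You're putting your regular expression inside a string. When match is given a string, it builds the regex with new RegExp(string), so the slashes and the trailing i are treated as literal characters to match rather than as delimiters and a flag. Use a regex literal instead: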

var world = document.documentElement.innerHTML.match(/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i)[1];
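To see the difference side by side, here is a minimal sketch. It assumes a shortened copy of the markup from the question stored in a variable called html (a stand-in for document.documentElement.innerHTML, not something from the original page script):

```javascript
// Shortened copy of the markup from the question; stands in for
// document.documentElement.innerHTML.
var html = '<a href="http://forum.tibia.com/forum/?action=board&amp;boardid=106121">Olympa - Trade</a>';

// String argument: match() builds new RegExp(string), so the leading "/"
// and the trailing "/i" are literal characters that must appear in the text.
var fromString = html.match('/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i');

// Regex literal: the slashes are delimiters and "i" is the ignore-case flag.
var fromLiteral = html.match(/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i);

console.log(fromString);      // null
console.log(fromLiteral[1]);  // "Olympa"
```

The online testers most likely took the pattern without the quotes, which would explain why they matched and the page script did not.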

Another thing: it appears you have a document object, in which case all this HTML is already parsed for you, and you can take advantage of that instead of reinventing a fragile wheel.

var element = document.querySelector('a[href*="boardid="]');
var world = element.textContent;
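Note that textContent for the link in the question would be "Olympa - Trade", so the suffix still has to be removed. A rough sketch of one way to do that, assuming the first link whose href contains boardid= is the one you want; worldFromLinkText is a hypothetical helper name, not an existing API:

```javascript
// Hypothetical helper (not from the original answer): strips an optional
// " - Trade" suffix from the link text.
function worldFromLinkText(text) {
  return text.trim().replace(/ - Trade$/, '');
}

var world = null;
// Only touch the DOM when one exists, so the helper can also be tried
// outside a browser.
if (typeof document !== 'undefined') {
  var element = document.querySelector('a[href*="boardid="]');
  // querySelector returns null when no link matches.
  world = element ? worldFromLinkText(element.textContent) : null;
}
```

Checking element for null before reading textContent avoids the same kind of "Cannot read property of null" error that the original line ran into.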

(This assumes that you don't need to support IE8 or earlier. If you do, there is still a better way, though.)

(P.S. ? is shorthand for {0,1}.)
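If you do stay with the regex, the pattern could be written with ? and wrapped so that a missing match returns null instead of throwing. This is only a sketch; findWorld is a hypothetical name, and the non-capturing group (?: ... ) is my own substitution for the original capturing group:

```javascript
// Hypothetical helper: returns the world name, or null when nothing matches.
function findWorld(html) {
  // (?: - Trade)? is equivalent to ( - Trade){0,1}, minus the unused capture.
  var m = html.match(/boardid=[0-9]+">([A-Z][a-z]+)(?: - Trade)?<\/a>/i);
  return m ? m[1] : null;
}

console.log(findWorld('boardid=106121">Olympa - Trade</a>'));  // "Olympa"
console.log(findWorld('<p>no board link here</p>'));           // null
```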