find words in html page with javascript
how can i search an html page for a word fast? and how can i get the html tag that the word is in? (so i can work with the entire tag)
To find the element that word exists in, you'd have to traverse the entire tree looking in just the text nodes, applying the same test as above. Once you find the word in a text node, return the parent of that node.
var word = "foo",
queue = [document.body],
curr
;
while (curr = queue.pop()) {
if (!curr.textContent.match(word)) continue;
for (var i = 0; i < curr.childNodes.length; ++i) {
switch (curr.childNodes[i].nodeType) {
case Node.TEXT_NODE : // 3
if (curr.childNodes[i].textContent.match(word)) {
console.log("Found!");
console.log(curr);
// you might want to end your search here.
}
break;
case Node.ELEMENT_NODE : // 1
queue.push(curr.childNodes[i]);
break;
}
}
}
this works in Firefox, no promises for IE.
What it does is start with the body element and check to see if the word exists inside that element. If it doesn't, then that's it, and the search stops there. If it is in the body element, then it loops through all the immediate children of the body. If it finds a text node, then see if the word is in that text node. If it finds an element, then push that into the queue. Keep on going until you've either found the word or there's no more elements to search.
You can iterate through DOM elements, looking for a substring within them. Neither fast nor elegant, but for small HTML might work well enough.
I'd try something recursive, like: (code not tested)
findText(node, text) {
if(node.childNodes.length==0) {//leaf node
if(node.textContent.indexOf(text)== -1) return [];
return [node];
}
var matchingNodes = new Array();
for(child in node.childNodes) {
matchingNodes.concat(findText(child, text));
}
return matchingNodes;
}