Highlight word in HTML text (but not markup)

I'm trying to highlight all matching words inside the body, but not words inside any HTML tag. For example, the keyword given is 'para'. Here's the paragraph:

<p class="para"> Example of paragraph. Lorem ipsum dolor sit amet. </p>

resulting in:

<p class="para">
Example of <strong>para</strong>graph. Lorem ipsum dolor sit amet.
</p>

I know that this is possible with JavaScript's replace() but I just don't know much about regex.


Solution 1:

Demo: http://jsfiddle.net/crgTU/7/

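highlightWord(document.body,'para');

function highlightWord(root,word){
  if (!word) return; // an empty word would match at index 0 forever
  textNodesUnder(root).forEach(highlightWords);

  // Collect the text nodes up front so the spans inserted later are not visited again.
  function textNodesUnder(root){
    var n,a=[],w=document.createTreeWalker(root,NodeFilter.SHOW_TEXT,null,false);
    while(n=w.nextNode()) a.push(n);
    return a;
  }

  function highlightWords(n){
    // After each match n becomes the text that follows it, so every pass
    // searches from the start of n rather than from the previous offset.
    for (var i; (i=n.nodeValue.indexOf(word)) > -1; n=after){
      var after = n.splitText(i+word.length); // text following the match
      var highlighted = n.splitText(i);       // the match itself
      var span = document.createElement('span');
      span.className = 'highlighted';
      span.appendChild(highlighted);
      after.parentNode.insertBefore(span,after);
    }
  }
}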

You might also consider calling something like…

function removeHighlights(root){
  [].forEach.call(root.querySelectorAll('span.highlighted'),function(el){
    el.parentNode.replaceChild(el.firstChild,el);
  });
  root.normalize(); // merge the split text nodes so later searches see whole words
}

…before you go finding the new highlights (to remove old highlights from the DOM).
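
The matching step in highlightWords can also be pulled out into a pure function, which makes it testable without a DOM. This is only a sketch: findMatches and its caseInsensitive parameter are hypothetical names, not part of the code above. It returns the start offset of every occurrence of the word in a string, which is the information the splitText calls need.

```javascript
// Hypothetical helper (not part of the original answer): returns the start
// offset of every non-overlapping occurrence of `word` in `text`.
function findMatches(text, word, caseInsensitive) {
  var offsets = [];
  if (!word) return offsets; // an empty word would never advance
  // Assumption: lower-casing does not change the string length, which holds
  // for ASCII text but not for every Unicode character.
  var haystack = caseInsensitive ? text.toLowerCase() : text;
  var needle = caseInsensitive ? word.toLowerCase() : word;
  var i = haystack.indexOf(needle);
  while (i > -1) {
    offsets.push(i);
    i = haystack.indexOf(needle, i + needle.length);
  }
  return offsets;
}

console.log(findMatches('Example of paragraph. Another para.', 'para')); // [ 11, 30 ]
```

Each offset refers to the original string, so when applying them with splitText it is easiest to work from the last offset to the first, or to re-search the remaining text after every split as the loop above does.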

Solution 2:

Why using a self-made highlighting function is a bad idea

Building your own highlighting function from scratch is probably a bad idea, because you will certainly run into issues that others have already solved. Challenges:

  • You would need to replace parts of text nodes with HTML elements to highlight your matches, without destroying DOM events and without triggering DOM regeneration over and over again (which is what happens with e.g. innerHTML)
  • If you want to remove highlighted elements, you have to replace the HTML elements with their content and also merge the split text nodes again for further searches. This is necessary because every highlighter plugin searches inside text nodes for matches, and if a keyword is split across several text nodes it will not be found.
  • You would also need to build tests to make sure your plugin works in situations you have not thought about. And I'm talking about cross-browser tests!
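
The second point can be illustrated without a DOM by modelling sibling text nodes as an array of strings. This is a simplified sketch under that assumption, and countPerNode and countAfterMerge are hypothetical names used only for illustration.

```javascript
// Sibling text nodes modelled as plain strings (an assumption for illustration;
// in a real page these would be Text nodes left behind by an earlier highlight).
var splitNodes = ['Example of ', 'para', 'graph.'];

// Count occurrences of `word`, searching each "node" separately, as a highlighter does.
function countPerNode(nodes, word) {
  return nodes.reduce(function (total, text) {
    return total + (text.split(word).length - 1);
  }, 0);
}

// Count after merging adjacent nodes, which is what Node.normalize() does in the DOM.
function countAfterMerge(nodes, word) {
  return countPerNode([nodes.join('')], word);
}

console.log(countPerNode(splitNodes, 'paragraph'));    // 0
console.log(countAfterMerge(splitNodes, 'paragraph')); // 1
```

The word is present in the document, but a per-node search only finds it once the fragments have been merged.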

Sounds complicated? If you want features like excluding some elements from highlighting, diacritics mapping, synonym mapping, searching inside iframes, separate word search, etc., this becomes more and more complicated.

Use an existing plugin

When using an existing, well-implemented plugin, you don't have to worry about the things named above. The article 10 jQuery text highlighter plugins on Sitepoint compares popular highlighter plugins.

Have a look at mark.js

mark.js is such a plugin. It is written in pure JavaScript, but is also available as a jQuery plugin. It was developed to offer more possibilities than the other plugins, with options to:

  • search for keywords separately instead of the complete term
  • map diacritics (for example, if "justo" should also match "justò")
  • ignore matches inside custom elements
  • use a custom highlighting element
  • use a custom highlighting class
  • map custom synonyms
  • also search inside iframes
  • receive the terms that were not found

DEMO

Alternatively you can see this fiddle.

Usage example:

// Highlight "keyword" in the specified context
$(".context").mark("keyword");

// Highlight the custom regular expression in the specified context
$(".context").markRegExp(/Lorem/gmi);
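
The options listed above are passed as a second argument. The following sketch uses the plain JavaScript API and the option names as I recall them from the mark.js v8 documentation, so please verify them against the version you use. The selectors, the synonym pair, and the highlight function name are made up for illustration.

```javascript
// Option names as documented for mark.js v8 (an assumption; check your version).
var markOptions = {
  separateWordSearch: true,     // search for each word of the term separately
  diacritics: true,             // "justo" also matches "justò"
  exclude: ['.ignore', 'pre'],  // hypothetical selectors whose content is skipped
  element: 'strong',            // custom highlighting element
  className: 'highlighted',     // custom highlighting class
  synonyms: { one: '1' },       // example synonym mapping
  iframes: false,               // set to true to also search inside iframes
  noMatch: function (term) {    // called with a term that was not found
    console.log('not found: ' + term);
  }
};

// Assumes the plain JavaScript build of mark.js is loaded and exposes `Mark`.
function highlight(context, keyword) {
  var instance = new Mark(context);
  // Remove old highlights first, then mark again once that has finished.
  instance.unmark({
    done: function () {
      instance.mark(keyword, markOptions);
    }
  });
}
```

It could then be called as highlight(document.querySelector(".context"), "para").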

It's free and developed as open source on GitHub (project reference).