RegEx for matching/replacing JavaScript comments (both multiline and inline)

I need to remove all JavaScript comments from a JavaScript source using the JavaScript RegExp object.

What I need is the pattern for the RegExp.

So far, I've found this:

compressed = compressed.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, '');

This pattern works OK for:

/* I'm a comment */

or for:

/*
 * I'm a comment aswell
*/

But doesn't seem to work for the inline:

// I'm an inline comment

I'm not much of an expert in RegEx and its patterns, so I need help.

Also, I would like to have a RegEx pattern that removes all those HTML-like comments:

<!-- HTML Comment //--> or <!-- HTML Comment -->

And also those conditional HTML comments, which can be found in various JavaScript sources.

Thanks.


Solution 1:

NOTE: Regex is not a lexer or a parser. If you have some weird edge case where you need some oddly nested comments parsed out of a string, use a parser. For the other 98% of the time this regex should work.

I had pretty complex block comments going on with nested asterisks, slashes, etc. The regular expression at the following site worked like a charm:

http://upshots.org/javascript/javascript-regexp-to-remove-comments
(see below for original)

Some modifications have been made, but the integrity of the original regex has been preserved. In order to allow certain double-slash (//) sequences (such as URLs), you must use back reference $1 in your replacement value instead of an empty string. Here it is:

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm

// JavaScript: 
// source_string.replace(/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm, '$1');

// PHP:
// preg_replace("/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/m", "$1", $source_string);

DEMO: https://regex101.com/r/B8WkuX/1
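
To see the $1 back reference at work, here is a minimal sketch. The function name removeComments and the sample input are made up for illustration; the pattern is the one quoted above.

```javascript
// Hypothetical helper wrapping the pattern from this answer.
function removeComments(source) {
  // Group 1 holds the character before "//" (or nothing at line start),
  // so '$1' puts that character back instead of swallowing it.
  return source.replace(/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm, '$1');
}

var sample = [
  'var url = "http://example.com"; // trailing comment',
  '/* block',
  '   comment */',
  'var x = 1;'
].join('\n');

var cleaned = removeComments(sample);
console.log(cleaned);
```

The URL survives because its // follows a colon. The line that held the block comment is left behind as an empty line.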

FAILING USE CASES: There are a few edge cases where this regex fails. An ongoing list of those cases is documented in this public gist. Please update the gist if you can find other cases.
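
One failing case that is easy to reproduce (an illustrative example, not taken from the gist): a // inside a string literal, when it is not preceded by a colon or a backslash, is treated as the start of a comment.

```javascript
var pattern = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$/gm;

// The "//" here is part of a string, but the regex cannot know that.
var broken = 'var s = "a // b";'.replace(pattern, '$1');
console.log(broken); // the string literal is cut off after "a "
```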

...and if you also want to remove <!-- html comments --> use this:

/\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*|<!--[\s\S]*?-->$/
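
A sketch of how this might be used in practice. Note the assumptions: the gm flags are added and the $ is moved onto the // alternative, mirroring the first pattern, so this is a variation rather than the expression exactly as written above. The name stripAllComments is invented for the example.

```javascript
// Variation of the pattern above (flags added, "$" moved; see the note).
var WITH_HTML = /\/\*[\s\S]*?\*\/|([^\\:]|^)\/\/.*$|<!--[\s\S]*?-->/gm;

// Hypothetical helper name.
function stripAllComments(source) {
  return source.replace(WITH_HTML, '$1');
}

var htmlSample = [
  'a = 1; <!-- HTML Comment //-->',
  'b = 2; // gone',
  '<!-- multi',
  'line -->c = 3;'
].join('\n');

var htmlCleaned = stripAllComments(htmlSample);
console.log(htmlCleaned);
```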

(original - for historical reference only)

// DO NOT USE THIS - SEE ABOVE
/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm

Solution 2:

Try this:

(\/\*[\w\'\s\r\n\*]*\*\/)|(\/\/[\w\s\']*)|(\<![\-\-\s\w\>\/]*\>)

Should work :)
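
A quick way to check what this pattern covers is to run it over a few inputs. The sketch below wraps it in a regex literal with the g flag (the flag and the variable names are assumptions, the answer gives only the bare pattern). Because the character classes list only word characters, whitespace, and a few symbols, a block comment containing other punctuation is left alone, and because \s includes line breaks, the // branch can run on into the next line.

```javascript
var SIMPLE = /(\/\*[\w\'\s\r\n\*]*\*\/)|(\/\/[\w\s\']*)|(\<![\-\-\s\w\>\/]*\>)/g;

// Plain block comment: removed.
var r1 = "x = 1; /* I'm a comment */".replace(SIMPLE, '');

// Block comment containing a comma: not matched, left as-is.
var r2 = '/* a, b */'.replace(SIMPLE, '');

// Inline comment: the match continues across the newline until "=".
var r3 = '// note\nvar x = 1;'.replace(SIMPLE, '');

console.log([r1, r2, r3]);
```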

Solution 3:

I have been putting together an expression that needs to do something similar.
The finished product is:

/(?:((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)|(\/\*(?:(?!\*\/).|[\n\r])*\*\/)|(\/\/[^\n\r]*(?:[\n\r]+|$))|((?:=|:)\s*(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()|(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|(<!--(?:(?!-->).)*-->))/g

Scary, right?

To break it down, the first part matches anything within single or double quotation marks.
This is necessary so that comment-like sequences inside quoted strings are not treated as comments.

((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)

The second part matches multiline comments delimited by /* */.

(\/\*(?:(?!\*\/).|[\n\r])*\*\/)

The third part matches single-line comments starting anywhere in the line.

(\/\/[^\n\r]*(?:[\n\r]+|$))

The fourth through sixth parts match anything within a regex literal.
This relies on a preceding equals sign or colon, or on the literal appearing directly before or after a regex method call.

((?:=|:)\s*(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))
((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()
(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))

And the seventh, which I originally forgot, removes the HTML comments:

(<!--(?:(?!-->).)*-->)

My dev environment reported errors for a regex literal that broke across lines, so I built the expression from parts instead:

var ADW_GLOBALS = {
  quotations : /((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)/,
  multiline_comment : /(\/\*(?:(?!\*\/).|[\n\r])*\*\/)/,
  single_line_comment : /(\/\/[^\n\r]*[\n\r]+)/,
  regex_literal : /(?:\/(?:(?:(?!\\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)/,
  html_comments : /(<!--(?:(?!-->).)*-->)/,
  regex_of_doom : ''
}
ADW_GLOBALS.regex_of_doom = new RegExp(
  '(?:' + ADW_GLOBALS.quotations.source + '|' + 
  ADW_GLOBALS.multiline_comment.source + '|' + 
  ADW_GLOBALS.single_line_comment.source + '|' + 
  '((?:=|:)\\s*' + ADW_GLOBALS.regex_literal.source + ')|(' + 
  ADW_GLOBALS.regex_literal.source + '[gimy]?\\.(?:exec|test|match|search|replace|split)\\(' + ')|(' + 
  '\\.(?:exec|test|match|search|replace|split)\\(' + ADW_GLOBALS.regex_literal.source + ')|' +
  ADW_GLOBALS.html_comments.source + ')' , 'g'
);

changed_text = code_to_test.replace(ADW_GLOBALS.regex_of_doom, function(match, $1, $2, $3, $4, $5, $6, $7, $8, offset, original){
  if (typeof $1 != 'undefined') return $1;
  if (typeof $5 != 'undefined') return $5;
  if (typeof $6 != 'undefined') return $6;
  if (typeof $7 != 'undefined') return $7;
  return '';
});

The callback returns quoted strings and regex literals intact, and returns an empty string for every comment capture.
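
The idea behind the callback can be shown on a much smaller pattern. The sketch below is a simplified illustration, not the expression above: it only knows about quoted strings and the two comment styles (no regex literals, no HTML comments), and the names STRING_OR_COMMENT and stripCommentsKeepStrings are invented for this example.

```javascript
// Group 1 = a quoted string (kept). Any other match is a comment (dropped).
var STRING_OR_COMMENT =
  /("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')|\/\*[\s\S]*?\*\/|\/\/[^\n\r]*/g;

function stripCommentsKeepStrings(code) {
  return code.replace(STRING_OR_COMMENT, function (match, quoted) {
    // A string was matched first, so whatever is inside it is not a comment.
    return typeof quoted != 'undefined' ? quoted : '';
  });
}

var out1 = stripCommentsKeepStrings('var s = "a // b"; // real comment');
var out2 = stripCommentsKeepStrings("var t = 'x /* y */'; /* gone */ var u = 2;");
console.log(out1);
console.log(out2);
```

Because the string alternative comes first and the scan runs left to right, a comment marker inside quotes is consumed as part of the string before the comment alternatives can see it.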

I know this is excessive and rather difficult to maintain, but it does appear to work for me so far.

Solution 4:

This works for almost all cases:

var RE_BLOCKS = new RegExp([
  /\/(\*)[^*]*\*+(?:[^*\/][^*]*\*+)*\//.source,           // $1: multi-line comment
  /\/(\/)[^\n]*$/.source,                                 // $2: single-line comment
  /"(?:[^"\\]*|\\[\S\s])*"|'(?:[^'\\]*|\\[\S\s])*'/.source, // - string, don't care about embedded eols
  /(?:[$\w\)\]]|\+\+|--)\s*\/(?![*\/])/.source,           // - division operator
  /\/(?=[^*\/])[^[/\\]*(?:(?:\[(?:\\.|[^\]\\]*)*\]|\\.)[^[/\\]*)*?\/[gim]*/.source
  ].join('|'),                                            // - regex
  'gm'  // note: global+multiline with replace() needs testing
);

// remove comments, keep other blocks
function stripComments(str) {
  return str.replace(RE_BLOCKS, function (match, mlc, slc) {
    return mlc ? ' ' :         // multiline comment (replace with space)
           slc ? '' :          // single-line comment (remove)
           match;              // division, regex, or string: return as-is
  });
}
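
The division alternative appears to be there so that a / used as an operator is consumed before the regex-literal alternative gets a chance to treat it as an opening delimiter. A minimal sketch of the problem, using a deliberately naive pattern (NAIVE_REGEX_LITERAL is a made-up name, not part of jspreproc):

```javascript
// Deliberately naive: any "/.../" run on one line counts as a regex literal.
var NAIVE_REGEX_LITERAL = /\/(?![*\/])(?:\\.|[^\/\\\n])+\/[gim]*/;

var code = 'x = a / b / c;';
var hit = NAIVE_REGEX_LITERAL.exec(code);
console.log(hit[0]); // "/ b /", which is really two divisions

// Same shape as the division alternative in RE_BLOCKS: an operand followed by "/".
var DIVISION = /(?:[$\w\)\]]|\+\+|--)\s*\/(?![*\/])/g;
var divisions = code.match(DIVISION);
console.log(divisions); // [ 'a /', 'b /' ]
```

With the division alternative listed ahead of the regex alternative, each "operand /" pair is matched and returned as-is, so the slashes never reach the regex-literal branch.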

The code is based on regexes from jspreproc, a tool I wrote for the riot compiler.

See http://github.com/aMarCruz/jspreproc