Regular Expression to Match All Comments in a T-SQL Script

I need a Regular Expression to capture ALL comments in a block of T-SQL. The Expression will need to work with the .Net Regex Class.

Let's say I have the following T-SQL:

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo == 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

I need to capture all of the comments, including the multi-line ones, so that I can strip them out.

EDIT: It would serve the same purpose to have an expression that takes everything BUT the comments.


Solution 1:

This should work:

(--.*)|(((/\*)+?[\w\W]+?(\*/)+))

Solution 2:

In PHP, i'm using this code to uncomment SQL (this is the commented version -> x modifier) :

trim( preg_replace( '@
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|(                   # $3 : Match comments
    (?:\#|--).*?$    # - Single line comment
    |                # - Multi line (nested) comments
     /\*             #   . comment open marker
        (?: [^/*]    #   . non comment-marker characters
            |/(?!\*) #   . not a comment open
            |\*(?!/) #   . not a comment close
            |(?R)    #   . recursive case
        )*           #   . repeat eventually
    \*\/             #   . comment close marker
)\s*                 # Trim after comments
|(?<=;)\s+           # Trim after semi-colon
@msx', '$1', $sql ) );

Short version:

trim( preg_replace( '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms', '$1', $sql ) );

Solution 3:

Using this code :

StringCollection resultList = new StringCollection(); 
try {
Regex regexObj = new Regex(@"/\*(?>(?:(?!\*/|/\*).)*)(?>(?:/\*(?>(?:(?!\*/|/\*).)*)\*/(?>(?:(?!\*/|/\*).)*))*).*?\*/|--.*?\r?[\n]", RegexOptions.Singleline);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Value);
    matchResult = matchResult.NextMatch();
} 
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

With the following input :

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo == 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

/* comment /* nesting */ of /* two */ levels supported */
foo...

Produces these matches :

-- This is Comment 1
-- This is
-- Comment 2
/* This is Comment 3 */
/* This is a
multi-line comment */
/* comment /* nesting */ of /* two */ levels supported */

Not that this will only match 2 levels of nested comments, although in my life I have never seen more than one level being used. Ever.