How can I match a quote-delimited string with a regex?

You should use number one ("[^"]*"), because number two (".*?") is bad practice. Consider that the developer who comes after you wants to match strings that are followed by an exclamation point. Should they use:

"[^"]*"!

or:

".*?"!

The difference appears when you have the subject:

"one" "two"!

The first regex matches:

"two"!

while the second regex matches:

"one" "two"!

Always be as specific as you can. Use the negated character class when you can.
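To see the difference concretely, here is a minimal sketch in Python using the standard re module. The choice of Python and the variable names are assumptions for illustration, not part of the question:

```python
import re

subject = '"one" "two"!'

# Negated character class: it cannot step over a quote, so it only finds
# the string that is directly followed by the exclamation point.
negated = re.search(r'"[^"]*"!', subject).group()

# Lazy dot: it is allowed to expand past the inner quotes until "! matches.
lazy = re.search(r'".*?"!', subject).group()

print(negated)  # "two"!
print(lazy)     # "one" "two"!
```

The lazy version is not wrong as a regex. It simply answers a different question than the one the next developer meant to ask.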

Another difference is that [^"]* can span across lines, while .* doesn't unless you use single-line (dot-matches-all) mode. To exclude line breaks as well, use [^"\n]*.
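A short sketch of the line-break behavior, again assuming Python's re module, where single-line mode is spelled re.DOTALL. The subject string is made up for illustration:

```python
import re

# A quoted string that contains a line break.
subject = 'x = "first line\nsecond line" + y'

spans_lines = re.search(r'"[^"]*"', subject)          # matches across the newline
dot_default = re.search(r'".*?"', subject)            # None: . stops at \n
dot_dotall = re.search(r'".*?"', subject, re.DOTALL)  # matches across the newline
no_newlines = re.search(r'"[^"\n]*"', subject)        # None: \n is excluded

print(spans_lines is not None, dot_default, dot_dotall is not None, no_newlines)
```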

As for backtracking, the second regex backtracks for each and every character in every string that it matches. If the closing quote is missing, both regexes will backtrack through the entire file. Only the order in which they backtrack is different. Thus, in theory, the first regex is faster. In practice, you won't notice the difference.


More complicated, but it handles escaped quotes and also escaped backslashes (an escaped backslash followed by a quote is not a problem):

/(["'])((\\{2})*|(.*?[^\\](\\{2})*))\1/

Examples:
  "hello\"world" matches "hello\"world"
  "hello\\"world" matches "hello\\"
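The two examples can be checked with a small sketch. It assumes Python's re module and drops the surrounding slashes, which are delimiter syntax from other languages. The variable names are illustrative only:

```python
import re

# The pattern from above, written as a Python raw string.
pattern = re.compile(r'''(["'])((\\{2})*|(.*?[^\\](\\{2})*))\1''')

# Raw strings keep the backslashes exactly as typed.
escaped_quote = r'"hello\"world"'        # one backslash escapes the quote
escaped_backslash = r'"hello\\"world"'   # two backslashes: the quote is real

first = pattern.search(escaped_quote).group()
second = pattern.search(escaped_backslash).group()

print(first)   # "hello\"world"
print(second)  # "hello\\"
```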


I would suggest:

([\"'])(?:\\\1|.)*?\1

But only because it handles escaped quote chars and allows both the ' and " to be the quote char. I would also suggest looking at this article that goes into this problem in depth:

http://blog.stevenlevithan.com/archives/match-quoted-string
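As a rough illustration of what the suggested pattern accepts, here is a sketch assuming Python's re module. The sample subjects are invented for the example:

```python
import re

# The suggested pattern as a Python raw string. \1 refers back to whichever
# quote character opened the string.
pattern = re.compile(r'''([\"'])(?:\\\1|.)*?\1''')

single = pattern.search(r"say 'it\'s' now").group()  # escaped ' inside '...'
double = pattern.search(r'say "a\"b" now').group()   # escaped " inside "..."
mixed = pattern.search('say "it\'s" now').group()    # bare ' inside "..."

print(single)  # 'it\'s'
print(double)  # "a\"b"
print(mixed)   # "it's"
```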

However, unless you have a serious performance issue or cannot be sure of embedded quotes, go with the simpler and more readable:

/".*?"/

I must admit that non-greedy patterns are not part of the basic Unix-style 'ed' regular expressions, but they are getting pretty common. I am still not used to group operators like (?:stuff).


I'd say the second one is better, because it fails faster when the terminating " is missing. The first one will backtrack over the string, a potentially expensive operation. An alternative regexp if you are using Perl 5.10 or later would be /"[^"]++"/, which uses a possessive quantifier. It conveys the same meaning as version one does, but is as fast as version two. Note that ++ requires at least one character between the quotes; use *+ if empty strings should match as well.
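For readers not on Perl, a hedged sketch of the same idea in Python. Possessive quantifiers were added to Python's re module in 3.11; for older versions the sketch falls back to a lookahead plus backreference, a common way to emulate possessive matching that is not part of the answer above:

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Possessive quantifier, same syntax as the Perl 5.10 pattern.
    pattern = re.compile(r'"[^"]++"')
else:
    # Older Pythons reject "++". A lookahead never gives characters back,
    # so capturing inside it and replaying \1 prevents backtracking too.
    pattern = re.compile(r'"(?=([^"]+))\1"')

closed = pattern.search('say "hello" now')
unclosed = pattern.search('say "hello now')  # closing quote is missing

print(closed.group())  # "hello"
print(unclosed)        # None
```

In the unclosed case the engine consumes everything after the quote once, fails to find the closing quote, and gives up at that position instead of retrying shorter runs.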