Using static Regex.IsMatch vs. creating an instance of Regex
In C#, should you have code like this:
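using System.Text.RegularExpressions;

public class Program
{
    // The pattern is stored only as a string.
    public static string importantRegex = "magic!";

    public static void F1(string input)
    {
        //code
        // Static IsMatch takes the input first, then the pattern.
        if (Regex.IsMatch(input, importantRegex))
        {
            //codez in here.
        }
        //more code
    }

    public static void Main()
    {
        F1("first input");
        /*
        some stuff happens......
        */
        F1("second input");
    }
}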
or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine there is an NFA created in each Regex instance. From what I understand, this NFA creation is nontrivial.
Solution 1:
In a rare departure from my typical egotism, I'm kind of reversing myself on this answer.
My original answer, preserved below, was based on an examination of version 1.1 of the .NET Framework. This is pretty shameful, since .NET 2.0 had been out for over three years at the time of my answer, and it contained changes to the Regex class that significantly affect the difference between the static and instance methods.
In .NET 2.0 (and 4.0), the static IsMatch function is defined as follows:
public static bool IsMatch(string input, string pattern){
    return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}
The significant difference here is that little true as the third argument. It corresponds to a parameter named "useCache". When it is true, the parsed tree is retrieved from the cache on the second and subsequent uses.
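The cache behind useCache is bounded: since .NET 2.0 its capacity is exposed through the static Regex.CacheSize property (15 entries by default), and only the static methods consult it. A minimal sketch, assuming an app that cycles through more distinct patterns than the default capacity; the class RegexCacheConfig, its methods, and the value 50 are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: raises the capacity of the cache used by the static
// Regex methods so that an app using many distinct patterns keeps them parsed.
public static class RegexCacheConfig
{
    // Hypothetical value; choose it by profiling your own pattern count.
    public const int DesiredCacheSize = 50;

    public static void Configure()
    {
        // Regex.CacheSize is a static property. Only the static methods
        // (Regex.IsMatch, Regex.Match, Regex.Replace, ...) use this cache;
        // instances created with "new Regex(...)" are not cached.
        if (Regex.CacheSize < DesiredCacheSize)
        {
            Regex.CacheSize = DesiredCacheSize;
        }
    }

    public static bool IsMagic(string input)
    {
        // After the first call, the parsed form of "magic!" can be served
        // from the cache until it is evicted by newer patterns.
        return Regex.IsMatch(input, "magic!");
    }
}
```

Call RegexCacheConfig.Configure() once at startup. If the cache is smaller than the number of patterns you actually use, the static methods will re-parse patterns as they fall out of it.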
This caching eats up most, but not all, of the performance difference between the static and instance methods. In my tests, the static IsMatch method was still about 20% slower than the instance method, but that amounted to only about a half-second increase when run 100 times over a set of 10,000 input strings (a total of 1 million operations).
This 20% slowdown can still be significant in some scenarios. If you find yourself regexing hundreds of millions of strings, you'll probably want to take every step you can to make it more efficient. But I'd bet that 99% of the time, you're using a particular Regex no more than a handful of times, and the extra millisecond you lose to the static method won't come anywhere close to being noticeable.
Props to devgeezer, who pointed this out almost a year ago, although no one seemed to notice.
My old answer follows:
In .NET 1.1, the static IsMatch function is defined as follows:
public static bool IsMatch(string input, string pattern){
    return new Regex(pattern).IsMatch(input);
}
And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) only as a quick shortcut for patterns that you will use once. If you will reuse the pattern, it's worth reusing a Regex object, too.
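Applied to the question's code, reusing a Regex object means parsing the pattern once and keeping the instance around. A minimal sketch, where the class Importance, the field ImportantRegex, and F1 returning bool are hypothetical choices for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical rewrite of the question's F1: the pattern lives in one
// reused Regex instance instead of a pattern string.
public class Importance
{
    // Constructed (and parsed) once, when the class is first used.
    // The Regex class is immutable and documented as thread-safe,
    // so a single shared instance is fine.
    private static readonly Regex ImportantRegex = new Regex("magic!");

    public bool F1(string input)
    {
        // Every call reuses the already-parsed pattern.
        return ImportantRegex.IsMatch(input);
    }
}
```

The Match objects a Regex returns are not thread-safe, so share the Regex itself, not its results.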
As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns, or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The up-front cost of compiling a regular expression object is quite large indeed, and may not be worth it.
Take, as an example, the following:
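using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        const int count = 10000;
        string pattern = "^[a-z]+[0-9]+$";
        string input = "abc123";

        // 1. Static method: the pattern is looked up (or parsed) on every call.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            Regex.IsMatch(input, pattern);
        Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

        // 2. Reused interpreted instance; construction is included in the timing.
        sw.Reset();
        sw.Start();
        Regex rx = new Regex(pattern);
        for (int i = 0; i < count; i++)
            rx.IsMatch(input);
        Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

        // 3. Reused compiled instance; the up-front compilation cost is included.
        sw.Reset();
        sw.Start();
        rx = new Regex(pattern, RegexOptions.Compiled);
        for (int i = 0; i < count; i++)
            rx.IsMatch(input);
        Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);
    }
}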
At count = 10000, as listed, the second output (the reused, non-compiled instance) is the fastest. Increase count to 100000, and the compiled version wins.
Solution 2:
If you're going to reuse the regular expression multiple times, I'd create it with RegexOptions.Compiled and cache it. There's no point in making the framework parse the regex pattern every time you want it.
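One way to "cache it" when the patterns are not known until run time is to keep compiled instances in a dictionary keyed by pattern. A minimal sketch, assuming .NET 4.0 or later (for ConcurrentDictionary) and a small, fixed set of patterns, since nothing is ever evicted; CompiledRegexCache and its methods are hypothetical names:

```csharp
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical helper: builds each pattern once with RegexOptions.Compiled
// and returns the same instance on later calls, so the compilation cost is
// paid once per pattern. Entries are never removed.
public static class CompiledRegexCache
{
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    public static Regex Get(string pattern)
    {
        // Under contention GetOrAdd may run the factory more than once,
        // but only one instance is stored and returned for each key.
        return Cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));
    }

    public static bool IsMatch(string input, string pattern)
    {
        return Get(pattern).IsMatch(input);
    }
}
```

For a single pattern known at compile time, a static readonly Regex field constructed with RegexOptions.Compiled is simpler and does the same job.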
Solution 3:
This answer is no longer correct for the versions of .NET that I have on my machine: 4.0.30319 and 2.0.50727 both define IsMatch as follows:
public static bool IsMatch(string input, string pattern)
{
    return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}
The true value is for a constructor parameter called "useCache". All of the Regex constructors ultimately chain through this one; the static methods call it directly, passing in true.
You can read more in the BCL blog post about optimizing Regex performance, which highlights the static methods' use of the cache and also cites performance measurements. That series of blog posts on optimizing Regex performance is a great place to start.