Remove text in-between delimiters in a string (using a regex?)
A simple regex would be:
string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "(\\[.*?\\])|(\".*?\")|('.*?')|(\\(.*?\\))";
string output = Regex.Replace(input, regex, "");
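Whether the quantifier is greedy (.*) or lazy (.*?) matters as soon as the input contains more than one pair of the same delimiter. A small sketch of the difference, using a made-up input string and square brackets only (the class and method names are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Greedy: .* runs from the first '[' to the LAST ']' in the string.
    public static string RemoveGreedy(string s)
    {
        return Regex.Replace(s, @"\[.*\]", "");
    }

    // Lazy: .*? stops at the first ']' after each '['.
    public static string RemoveLazy(string s)
    {
        return Regex.Replace(s, @"\[.*?\]", "");
    }

    static void Main()
    {
        string input = "Give [Me] Purple [And More] Elephants";
        Console.WriteLine(RemoveGreedy(input)); // "Give  Elephants" ("Purple" is lost)
        Console.WriteLine(RemoveLazy(input));   // "Give  Purple  Elephants"
    }
}
```

Neither form handles nested delimiters such as "[a [b] c]"; the lazy form only guarantees that each match ends at the nearest closing delimiter.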
If you want to build up the regex yourself for a custom set of delimiters, you just need to build up the parts:
('.*') // example of the single quote check
Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once you have built your regex string, just run it once. The key is to get the regex into a single check, because performing many regex matches on one item and then iterating through a lot of items will probably cause a significant decrease in performance.
In my first example, the built-up regex would take the place of the regex line:
string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string sOutput = Regex.Replace(input, regex, "");
I am sure someone will post a cool LINQ expression to build the regex from an array of delimiter objects to match, or something similar.
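One possible sketch of that idea, assuming the delimiters arrive as (open, close) string pairs. DelimiterStripper, BuildPattern and Strip are hypothetical names, not part of any library; Regex.Escape, string.Join and Select are the standard .NET calls:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DelimiterStripper
{
    // Hypothetical helper: builds one alternation equivalent to
    // \[.*?\]|\(.*?\) from the given pairs. Regex.Escape makes each
    // delimiter literal, so '.', '^' or '\' are safe to pass in.
    public static string BuildPattern(params (string Open, string Close)[] pairs)
    {
        return string.Join("|", pairs.Select(p =>
            Regex.Escape(p.Open) + ".*?" + Regex.Escape(p.Close)));
    }

    // Runs the combined pattern once over the input.
    public static string Strip(string input, params (string Open, string Close)[] pairs)
    {
        return Regex.Replace(input, BuildPattern(pairs), "");
    }

    static void Main()
    {
        string input = "Give [Me Some] Purple (And More) Elephants";
        Console.WriteLine(Strip(input, ("[", "]"), ("(", ")")));
        // prints: Give  Purple  Elephants (the spaces around each removed span remain)
    }
}
```

If the same delimiter set is applied to many strings, the pattern could be built once and kept in a Regex instance instead of being rebuilt on every call.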
A simple way would be to do this:
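string RemoveBetween(string s, char begin, char end)
{
    // Regex.Escape makes the delimiters literal. Simply prefixing a backslash
    // would turn letters such as 'b' or 'd' into regex escapes (\b, \d).
    Regex regex = new Regex(Regex.Escape(begin.ToString()) + ".*?" + Regex.Escape(end.ToString()));
    return regex.Replace(s, string.Empty);
}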
string s = "Give [Me Some] Purple (And More) \\Elephants/ and .hats^";
s = RemoveBetween(s, '(', ')');
s = RemoveBetween(s, '[', ']');
s = RemoveBetween(s, '\\', '/');
s = RemoveBetween(s, '.', '^');
Changing the return statement to the following will collapse the runs of spaces left behind by the removals:
return new Regex(" +").Replace(regex.Replace(s, string.Empty), " ");
The final result for this would be:
"Give Purple and "
Disclaimer: a single regex would probably be faster than this.
I have to add the old adage, "You have a problem and you want to use regular expressions. Now you have two problems."
I've come up with a quick regex that will hopefully point you in the direction you are looking for:
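.*(\(|\[|\"|').*(\]|\)|\"|').*
The parentheses, brackets, and double quotes are escaped, while the single quote can be left alone.
To put the above expression into English: allow any number of characters before and any number after, and match the text between an opening delimiter and a closing delimiter. Note that the dot has to stand outside a character class to mean "any character" ([.] matches only a literal period), and that the pattern does not check that the closing delimiter is the partner of the opening one, so "(text]" matches as well.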
The open delimiter phrase is (\(|\[|\"|')
This has a matching closing phrase. To make this a bit more extensible in the future, you could remove the actual delimiters and contain them in a config file, database or wherever you may choose.