Regular Expression to split on spaces unless in quotes
I would like to use the .NET Regex.Split method to split this input string into an array. It must split on whitespace unless the whitespace is enclosed in quotes.
Input: Here is "my string" it has "six matches"
Expected output:
- Here
- is
- my string
- it
- has
- six matches
What pattern do I need? Also do I need to specify any RegexOptions?
No options are required. Rather than splitting on the whitespace, match the tokens themselves with Regex.Matches.
Regex:
\w+|"[\w\s]*"
C#:
Regex regex = new Regex(@"\w+|""[\w\s]*""");
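As a minimal sketch of how that pattern might be applied, the snippet below collects every match into an array. The class name QuoteAwareSplitter and the method name Tokenize are hypothetical, chosen only for illustration. Note that with this pattern the quoted tokens keep their surrounding quote characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteAwareSplitter
{
    // Hypothetical helper: returns every match of the pattern shown above.
    public static string[] Tokenize(string input)
    {
        var regex = new Regex(@"\w+|""[\w\s]*""");
        return regex.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = @"Here is ""my string"" it has ""six matches""";
        // Prints six lines; the two quoted tokens still include their quotes.
        foreach (var token in Tokenize(input))
        {
            Console.WriteLine(token);
        }
    }
}
```

Because \w only covers letters, digits, and underscores, words containing punctuation (for example an apostrophe) would be broken into several tokens by this pattern.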
Or if you need to exclude " characters:
Regex
    .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
Lieven's solution gets most of the way there, and, as he states in his comments, it's just a matter of changing the ending to Bartek's solution. The end result is the following working regex:
(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"
Input: Here is "my string" it has "six matches"
Output:
- Here
- is
- "my string"
- it
- has
- "six matches"
Unfortunately, that output includes the quotes. If you instead use the following:
(("((?<token>.*?)(?<!\\)")|(?<token>[\w]+))(\s)*)
And explicitly capture the "token" matches as follows:
// Requires: using System.Diagnostics; using System.Linq; using System.Text.RegularExpressions;
RegexOptions options = RegexOptions.None;
Regex regex = new Regex( @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", options );
string input = @" Here is ""my string"" it has "" six matches"" ";
// Keep only the matches where the named group "token" actually captured something.
var result = (from Match m in regex.Matches( input )
              where m.Groups[ "token" ].Success
              select m.Groups[ "token" ].Value).ToList();
for ( int i = 0; i < result.Count; i++ )
{
    Debug.WriteLine( string.Format( "Token[{0}]: '{1}'", i, result[ i ] ) );
}
Debug output:
Token[0]: 'Here'
Token[1]: 'is'
Token[2]: 'my string'
Token[3]: 'it'
Token[4]: 'has'
Token[5]: ' six matches'
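The (?<!\\) lookbehind in that pattern is what stops a backslash-escaped quote from ending a quoted token. The sketch below illustrates this with an input that was not part of the original example. The class name TokenCapture and the method name Tokens are hypothetical, and the backslashes are left in the captured token rather than unescaped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenCapture
{
    // Same pattern as above; both alternatives capture into the group "token".
    private static readonly Regex TokenRegex = new Regex(
        @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)");

    // Hypothetical helper: returns the "token" capture of every match.
    public static string[] Tokens(string input)
    {
        return TokenRegex.Matches(input)
                         .Cast<Match>()
                         .Where(m => m.Groups["token"].Success)
                         .Select(m => m.Groups["token"].Value)
                         .ToArray();
    }

    public static void Main()
    {
        // The text being tokenized is: say "a \"quoted\" word" end
        string input = @"say ""a \""quoted\"" word"" end";
        foreach (var token in Tokens(input))
        {
            Console.WriteLine("'" + token + "'");
        }
    }
}
```

The lookbehind only checks the single character before the quote, so a quoted string that ends in an escaped backslash (\\ followed by the closing quote) would not be recognized as closed at that point.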