Best way to specify whitespace in a String.Split operation
Solution 1:
If you just call:
string[] ssize = myStr.Split(null); //Or myStr.Split()
or:
string[] ssize = myStr.Split(new char[0]);
then white-space is assumed to be the splitting character. From the string.Split(char[]) method's documentation page:
If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.
Always, always, always read the documentation!
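To make the documented behavior concrete, here is a minimal sketch. The class name WhitespaceSplitDemo, the helper SplitOnWhitespace, and the sample string are made up for illustration; only string.Split() itself comes from the answer above.

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // No separator is supplied, so every character for which
    // char.IsWhiteSpace returns true acts as a delimiter.
    public static string[] SplitOnWhitespace(string input)
    {
        return input.Split();
    }

    public static void Main()
    {
        // Hypothetical sample: a space, a tab, a newline and a no-break
        // space (U+00A0) separate the five words.
        string sample = "alpha beta\tgamma\ndelta\u00A0epsilon";
        foreach (string token in SplitOnWhitespace(sample))
        {
            Console.WriteLine(token);
        }
    }
}
```

Each of the five words should come out as its own token, since all four separators count as white-space under Char.IsWhiteSpace.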
Solution 2:
Yes, there is a need for one more answer here!
All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:
string myStrA = "The quick brown fox jumps over the lazy dog";              // single spaces
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";      // doubled spaces
string myStrC = "The quick brown fox     jumps over the lazy dog";          // a run of spaces mid-sentence
string myStrD = "   The quick brown fox jumps over the lazy dog";           // leading whitespace
String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:
myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)
As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries.
Of course, if you don't like using options, just use the regex alternative :-)
Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)
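For anyone who wants to check that the two approaches agree, here is a rough sketch. TokenizeDemo, WithOption, WithRegex, and the input strings are my own illustrative names and data, not part of the answer; the Split and Regex.Split calls are the ones shown above, with ToArray() added so both helpers return string[].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizeDemo
{
    // Option-based approach: white-space delimiters, empty entries dropped.
    public static string[] WithOption(string input)
    {
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex-based approach: split on runs of white-space, then filter out
    // the empty strings caused by leading or trailing white-space.
    public static string[] WithRegex(string input)
    {
        return Regex.Split(input, @"\s+").Where(s => s != string.Empty).ToArray();
    }

    public static void Main()
    {
        // Hypothetical inputs that differ only in their white-space.
        string[] inputs =
        {
            "The quick brown fox",
            "The  quick  brown  fox",
            "The quick\t\tbrown fox",
            "   The quick brown fox  ",
        };

        foreach (string input in inputs)
        {
            // The plain count varies with the input; the other two do not.
            Console.WriteLine(
                "plain: {0}, option: {1}, regex: {2}",
                input.Split().Length,
                WithOption(input).Length,
                WithRegex(input).Length);
        }
    }
}
```

With these inputs, the option and regex columns should both report 4 on every line, while the plain Split() count changes from line to line.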
Solution 3:
According to the documentation:
If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.
So just call myStr.Split();
There's no need to pass in anything because separator is a params array.
Solution 4:
Why don't you use:
string[] ssizes = myStr.Split(' ', '\t');
Solution 5:
Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.
From the documentation:
Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
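A small sketch of what that means in practice. EmptyEntriesDemo and CountEmpty are hypothetical names used only for this illustration; the behavior shown is that of a plain string.Split() call with no options.

```csharp
using System;

public static class EmptyEntriesDemo
{
    // Counts the empty strings that a plain white-space split leaves behind.
    public static int CountEmpty(string input)
    {
        int count = 0;
        foreach (string part in input.Split())
        {
            if (part.Length == 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountEmpty("a b"));    // 0: single delimiter between tokens
        Console.WriteLine(CountEmpty("a  b"));   // 1: two adjacent delimiters
        Console.WriteLine(CountEmpty(" a b "));  // 2: delimiter at the start and at the end
    }
}
```

If those empty entries are unwanted, passing StringSplitOptions.RemoveEmptyEntries is the usual way to drop them.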