Best way to specify whitespace in a String.Split operation

Solution 1:

If you just call:

string[] ssize = myStr.Split(null); //Or myStr.Split()

or:

string[] ssize = myStr.Split(new char[0]);

then white-space is assumed to be the splitting character. From the string.Split(char[]) method's documentation page.

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

Always, always, always read the documentation!

Solution 2:

Yes, There is need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

String.Split vs Regex.Split

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)

Solution 3:

According to the documentation :

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(); There's no need to pass in anything because separator is a params array.

Solution 4:

Why dont you use?:

string[] ssizes = myStr.Split(' ', '\t');

Solution 5:

Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated with multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.