Best way to split string into lines
How do you split a multi-line string into lines?
I know this way:
var result = input.Split("\n\r".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
but it looks a bit ugly and loses empty lines. Is there a better solution?
If it looks ugly, just remove the unnecessary ToCharArray call.
If you want to split by either \n or \r, you've got two options:
- Use an array literal, but this will give you empty lines for Windows-style line endings (\r\n):
  var result = text.Split(new [] { '\r', '\n' });
- Use a regular expression, as indicated by Bart:
  var result = Regex.Split(text, "\r\n|\r|\n");
If you want to preserve empty lines, why do you explicitly tell C# to throw them away (via the StringSplitOptions parameter)? Use StringSplitOptions.None instead.
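To make the difference between the two options concrete, here is a small sketch. The class and method names (SplitDemo, ByChars, ByRegex) are made up for illustration. With a Windows-style line ending, the char-array version yields an extra empty entry between \r and \n, while the regex version treats \r\n as one break.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, only for illustration.
public static class SplitDemo
{
    // Splits on every '\r' and every '\n' separately,
    // so "\r\n" produces an empty entry between the two characters.
    public static string[] ByChars(string text) =>
        text.Split(new[] { '\r', '\n' });

    // The alternation tries "\r\n" first, so it counts as a single line break.
    public static string[] ByRegex(string text) =>
        Regex.Split(text, "\r\n|\r|\n");
}
```

For the input "a\r\nb", ByChars returns three entries ("a", "", "b") and ByRegex returns two ("a", "b").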
using (StringReader sr = new StringReader(text)) {
    string line;
    while ((line = sr.ReadLine()) != null) {
        // do something
    }
}
Update: See here for an alternative/async solution.
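If you'd rather consume the lines with foreach or LINQ, the reader loop above can be wrapped in an iterator. This is only a sketch; LineReader and ReadLines are names I picked for illustration. Note that StringReader.ReadLine accepts \r, \n and \r\n as terminators, and that a trailing line break at the very end of the text does not produce a final empty line.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, only for illustration.
public static class LineReader
{
    // Lazily yields one line at a time; empty lines inside the text are preserved.
    public static IEnumerable<string> ReadLines(string text)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the text is reached.
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Usage: foreach (var line in LineReader.ReadLines(text)) { ... }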
This works great and is faster than Regex:
input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
It is important to have "\r\n" first in the array so that it's taken as one line break. The above gives the same results as either of these Regex solutions:
Regex.Split(input, "\r\n|\r|\n")
Regex.Split(input, "\r?\n|\r")
Except that Regex turns out to be about 8 times slower in my test:
Action<Action> measure = (Action func) => {
    var start = DateTime.Now;
    for (int i = 0; i < 100000; i++) {
        func();
    }
    var duration = DateTime.Now - start;
    Console.WriteLine(duration);
};
var input = "";
for (int i = 0; i < 100; i++)
{
    input += "1 \r2\r\n3\n4\n\r5 \r\n\r\n 6\r7\r 8\r\n";
}
measure(() =>
    input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
);
measure(() =>
    Regex.Split(input, "\r\n|\r|\n")
);
measure(() =>
    Regex.Split(input, "\r?\n|\r")
);
Output:
00:00:03.8527616
00:00:31.8017726
00:00:32.5557128
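If you want to repeat the measurement yourself, System.Diagnostics.Stopwatch is generally a better fit for timing than subtracting DateTime.Now values. The following is a sketch of the same measuring loop, not the code that produced the numbers above; the Bench class, the Measure method and the warm-up call are my own additions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class, only for illustration.
public static class Bench
{
    // Runs func once as a warm-up, then times `iterations` further calls.
    public static TimeSpan Measure(Action func, int iterations = 100000)
    {
        func(); // warm-up call (JIT compilation etc.), not timed

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Usage: Console.WriteLine(Bench.Measure(() => input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)));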
And here's the extension method:
using System;
using System.Collections.Generic;

public static class StringExtensionMethods
{
    // Splits on "\r\n", "\r" or "\n"; "\r\n" comes first so it counts as one line break.
    public static IEnumerable<string> GetLines(this string str, bool removeEmptyLines = false)
    {
        return str.Split(new[] { "\r\n", "\r", "\n" },
            removeEmptyLines ? StringSplitOptions.RemoveEmptyEntries : StringSplitOptions.None);
    }
}
Usage:
input.GetLines() // keeps empty lines
input.GetLines(true) // removes empty lines
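On the point about listing "\r\n" first: when several separators match at the same position, String.Split uses the one that appears earliest in the array. A small sketch to check this (SeparatorOrderDemo and its method names are made up for illustration):

```csharp
using System;

// Hypothetical helper class, only for illustration.
public static class SeparatorOrderDemo
{
    // "\r\n" is listed first, so a Windows line ending is one separator.
    public static string[] CrLfFirst(string s) =>
        s.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

    // "\r" is listed first, so "\r\n" is split twice and yields an empty entry.
    public static string[] CrLfLast(string s) =>
        s.Split(new[] { "\r", "\n", "\r\n" }, StringSplitOptions.None);
}
```

For the input "a\r\nb", CrLfFirst returns two entries and CrLfLast returns three, the middle one empty.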
You could use Regex.Split:
string[] tokens = Regex.Split(input, @"\r?\n|\r");
Edit: added |\r to account for (older) Mac line terminators.