Reading a file line by line in C#
You can write a LINQ-based line reader pretty easily using an iterator block:
    static IEnumerable<SomeType> ReadFrom(string file) {
        string line;
        using(var reader = File.OpenText(file)) {
            while((line = reader.ReadLine()) != null) {
                // ParseLine is a placeholder: parse the line however your format requires
                SomeType newRecord = ParseLine(line);
                yield return newRecord;
            }
        }
    }
or to make Jon happy:
    static IEnumerable<string> ReadFrom(string file) {
        string line;
        using(var reader = File.OpenText(file)) {
            while((line = reader.ReadLine()) != null) {
                yield return line;
            }
        }
    }
    ...
    var typedSequence = from line in ReadFrom(path)
                        let record = ParseLine(line)
                        where record.Active // for example
                        select record.Key;
Then you have ReadFrom(...) as a lazily evaluated sequence without buffering, perfect for Where etc.
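To see the laziness in action, here is a small sketch (not part of the original answer). It adds a trace line to ReadFrom and writes a scratch file via Path.GetTempFileName(), both of which are my additions for the demo. With Take(2), only the first two lines should ever be read, and the reader is disposed as soon as the loop ends.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class LazyDemo
{
    static IEnumerable<string> ReadFrom(string file)
    {
        string line;
        using (var reader = File.OpenText(file))
        {
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine("read: " + line); // trace each physical read
                yield return line;
            }
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName(); // scratch file, demo only
        File.WriteAllLines(path, new[] { "a", "b", "c", "d" });

        // Building the query reads nothing; the file is opened on the first MoveNext.
        var firstTwo = ReadFrom(path).Take(2);

        foreach (var line in firstTwo)
        {
            Console.WriteLine("got: " + line);
        }

        // The enumerator was disposed when the loop ended, which closed the reader.
        File.Delete(path);
    }
}
```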
Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory; if you need grouping and aggregation, "PushLINQ" has some fancy code to allow you to perform aggregations on the data but discard it (no buffering). Jon's explanation is here.
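If you don't want to pull in PushLINQ, a hand-rolled aggregation gives a rough idea of the "aggregate but discard" approach. To be clear, this is not PushLINQ's API, just a plain loop; the "group by first character" rule and the StringReader standing in for a file are made up for the example. Only one counter per key is kept, never the lines themselves.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class StreamingCounts
{
    static void Main()
    {
        // One counter per key; each line is discarded after it is counted.
        var counts = new Dictionary<char, int>();

        // StringReader stands in for a real file here (assumption for the demo).
        using (TextReader reader = new StringReader("apple\navocado\nbanana\n"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // skip blank lines

                char key = line[0];             // hypothetical grouping key
                int current;
                counts.TryGetValue(key, out current); // current stays 0 if the key is new
                counts[key] = current + 1;
            }
        }

        foreach (var pair in counts)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```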
It's simpler to read a line and check whether or not it's null than to check for EndOfStream all the time.
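As a small illustration of the null-check pattern: it works against any TextReader, whereas EndOfStream is only defined on StreamReader. The sketch below uses a StringReader as a stand-in for a file, which is my choice for the demo.

```csharp
using System;
using System.IO;

class ReadLoopDemo
{
    static void Main()
    {
        // TextReader has no EndOfStream property, so the null check is the
        // pattern that works for every kind of reader.
        using (TextReader reader = new StringReader("first\nsecond\n"))
        {
            string line;
            while ((line = reader.ReadLine()) != null) // null signals end of input
            {
                Console.WriteLine(line);
            }
        }
    }
}
```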
However, I also have a LineReader class in MiscUtil which makes all of this a lot simpler - basically it exposes a file (or a Func<TextReader>) as an IEnumerable<string> which lets you do LINQ stuff over it. So you can do things like:
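    // GetFiles takes the directory first, then the search pattern;
    // "." (the current directory) is assumed here
    var query = from file in Directory.GetFiles(".", "*.log")
                from line in new LineReader(file)
                where line.Length > 0
                select new AddOn(line); // or whatever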
The heart of LineReader is this implementation of IEnumerable<string>.GetEnumerator:
    public IEnumerator<string> GetEnumerator()
    {
        using (TextReader reader = dataSource())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
Almost all the rest of the source is just giving flexible ways of setting up dataSource (which is a Func<TextReader>).
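For anyone who doesn't want to take the MiscUtil dependency, a stripped-down version of the idea might look like the following. This is a sketch, not MiscUtil's actual source: the class name SimpleLineReader and its two constructors are my own, and only the GetEnumerator body follows the snippet above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical minimal stand-in for LineReader.
sealed class SimpleLineReader : IEnumerable<string>
{
    private readonly Func<TextReader> dataSource;

    // Convenience overload: open the named file each time enumeration starts.
    public SimpleLineReader(string path) : this(() => File.OpenText(path)) { }

    public SimpleLineReader(Func<TextReader> dataSource)
    {
        if (dataSource == null) throw new ArgumentNullException("dataSource");
        this.dataSource = dataSource;
    }

    public IEnumerator<string> GetEnumerator()
    {
        // A fresh reader is created per enumeration and disposed when it ends.
        using (TextReader reader = dataSource())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

class Program
{
    static void Main()
    {
        // Any TextReader factory works; a StringReader keeps the demo file-free.
        var lines = new SimpleLineReader(() => new StringReader("one\ntwo"));
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the delegate is invoked inside GetEnumerator, the same instance can be enumerated more than once, with each pass getting its own reader.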
Since .NET 4.0, the File.ReadLines() method is available.

    int count = File.ReadLines(filepath).Count(line => line.StartsWith(">"));
NOTE: You need to watch out for the IEnumerable<T> solution, as it will result in the file being open for the duration of processing.
For example, with Marc Gravell's response:
    foreach(var record in ReadFrom("myfile.csv")) {
        DoLongProcessOn(record);
    }
the file will remain open for the whole of the processing.
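If that matters in your situation, one possible workaround is to drain the sequence into a list before doing the slow work. This trades memory for a shorter open window, so it only makes sense when the file fits comfortably in memory. In the sketch below the scratch file and the body of DoLongProcessOn are made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class CloseBeforeProcessing
{
    // Hypothetical stand-in for the slow per-record work.
    static void DoLongProcessOn(string record)
    {
        Console.WriteLine("processing " + record);
    }

    static void Main()
    {
        string path = Path.GetTempFileName(); // scratch file, demo only
        File.WriteAllLines(path, new[] { "r1", "r2", "r3" });

        // ToList() enumerates everything up front, so the underlying reader
        // is disposed before any slow processing starts.
        List<string> records = File.ReadLines(path).ToList();

        // The file is no longer held open at this point, so it can be removed.
        File.Delete(path);

        foreach (var record in records)
        {
            DoLongProcessOn(record);
        }
    }
}
```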