How to read a large (1 GB) txt file in .NET?

I have a 1 GB text file which I need to read line by line. What is the best and fastest way to do this?

private void ReadTxtFile()
{            
    string filePath = string.Empty;
    filePath = openFileDialog1.FileName;
    if (string.IsNullOrEmpty(filePath))
    {
        using (StreamReader sr = new StreamReader(filePath))
        {
            String line;
            while ((line = sr.ReadLine()) != null)
            {
                FormatData(line);                        
            }
        }
    }
}

In FormatData() I check the starting word of line which must be matched with a word and based on that increment an integer variable.

void FormatData(string line)
{
    if (line.StartWith(word))
    {
        globalIntVariable++;
    }
}

If you are using .NET 4.0, try MemoryMappedFile which is a designed class for this scenario.

You can use StreamReader.ReadLine otherwise.


Using StreamReader is probably the way to since you don't want the whole file in memory at once. MemoryMappedFile is more for random access than sequential reading (it's ten times as fast for sequential reading and memory mapping is ten times as fast for random access).

You might also try creating your streamreader from a filestream with FileOptions set to SequentialScan (see FileOptions Enumeration), but I doubt it will make much of a difference.

There are however ways to make your example more effective, since you do your formatting in the same loop as reading. You're wasting clockcycles, so if you want even more performance, it would be better with a multithreaded asynchronous solution where one thread reads data and another formats it as it becomes available. Checkout BlockingColletion that might fit your needs:

Blocking Collection and the Producer-Consumer Problem

If you want the fastest possible performance, in my experience the only way is to read in as large a chunk of binary data sequentially and deserialize it into text in parallel, but the code starts to get complicated at that point.


You can use LINQ:

int result = File.ReadLines(filePath).Count(line => line.StartsWith(word));

File.ReadLines returns an IEnumerable<String> that lazily reads each line from the file without loading the whole file into memory.

Enumerable.Count counts the lines that start with the word.

If you are calling this from an UI thread, use a BackgroundWorker.