What's the preferred pattern for reading lines from a file in C++?

I've seen at least two ways of reading lines from a file in C++ tutorials:

std::ifstream fs("myfile.txt");
if (fs.is_open()) {
  while (fs.good()) {
    std::string line;
    std::getline(fs, line);
    // ...

and:

std::ifstream fs("myfile.txt");
std::string line;
while (std::getline(fs, line)) {
  // ...

Of course, I can add a few checks to make sure that the file exists and is opened. Other than the exception handling, is there a reason to prefer the more-verbose first pattern? What's your standard practice?


Solution 1:

while (std::getline(fs, line))
{}

This is not only correct but also preferable, because it is idiomatic.

I assume that in the first case you're not checking fs after std::getline() with if(!fs) break; or something equivalent. If you don't, then the first case is completely wrong. And if you do, the second one is still preferable, as it's more concise and its logic is clearer.

The function good() should be used after you have made an attempt to read from the stream; it's used to check whether the attempt was successful. In your first case, you don't do so. After std::getline(), you assume that the read was successful, without even checking what fs.good() returns. Also, you seem to assume that if fs.good() returns true, std::getline will successfully read a line from the stream. You're going in exactly the opposite direction: the fact is that if std::getline successfully reads a line from the stream, then the stream tests true afterwards (failbit and badbit are not set, and fs.good() is also true unless the read ran into the end of the file).

The documentation at cplusplus says about good() that,

The function returns true if none of the stream's error flags (eofbit, failbit and badbit) are set.

That is, when you attempt to read data from an input stream and the attempt fails, only then is a failure flag set, and good() returns false as an indication of the failure.
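To make the difference concrete, here is a minimal sketch that counts how many times the loop body runs under each pattern. It uses std::istringstream as a stand-in for the file so that it is self-contained, and the helper names count_with_good and count_with_getline are made up for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: the while (in.good()) pattern from the question.
std::size_t count_with_good(std::istream& in) {
    std::size_t n = 0;
    while (in.good()) {
        std::string line;
        std::getline(in, line);  // result is never checked
        ++n;                     // body runs even when the read failed
    }
    return n;
}

// Hypothetical helper: the idiomatic pattern, where the read is the condition.
std::size_t count_with_getline(std::istream& in) {
    std::size_t n = 0;
    for (std::string line; std::getline(in, line); ) {
        ++n;                     // body runs only after a successful read
    }
    return n;
}
```

For the input "a\nb\n", the first helper runs its body three times (the third time on a failed read, with an empty line), while the second runs it exactly twice.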

If you want to limit the scope of the line variable to the loop only, then you can write a for loop as:

for(std::string line; std::getline(fs, line); )
{
   //use 'line'
}

Note: this solution came to my mind after reading @john's solution, but I think it's better than his version.
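Combined with the open check mentioned in the question, a minimal sketch might look like the following. The function name read_lines and the choice to report problems by throwing std::runtime_error are assumptions made for illustration, not something taken from the question.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of the file at `path` into a vector.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream fs(path);
    if (!fs) {
        // Assumption: a file that cannot be opened is reported by throwing.
        throw std::runtime_error("cannot open " + path);
    }

    std::vector<std::string> lines;
    for (std::string line; std::getline(fs, line); ) {
        lines.push_back(line);
    }

    // The loop ends at end of file or on an I/O error; badbit marks the latter.
    if (fs.bad()) {
        throw std::runtime_error("read error on " + path);
    }
    return lines;
}
```

The check after the loop is optional, but it is the one place where you can tell a normal end of file apart from a read error.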


Read a detailed explanation of why the second one is preferable and idiomatic here:

  • Linux | Segmentation Fault in C++ - Due to the function ifstream

Or read this nicely written blog by @Jerry Coffin:

  • Reading files

Solution 2:

Think of this as an extended comment to Nawaz' already excellent answer.

Regarding your first option,

while (fs.good()) {
  std::string line;
  std::getline(fs, line);
  ...

This has multiple problems. Problem number one is that the while condition is in the wrong place and is superfluous. It's in the wrong place because fs.good() indicates whether or not the most recent action performed on the file was OK. A while condition should be with respect to the upcoming actions, not the previous ones. There is no way to know whether the upcoming action on the file will be OK. What upcoming action? fs.good() does not read your code to see what that upcoming action is.

Problem number two is that you are ignoring the return status from std::getline(). That's OK if you immediately check the status with fs.good(). So, fixing this up a bit,

while (true) {
  std::string line;
  if (std::getline(fs, line)) {
    ...
  }
  else {
     break;
  }
}

Alternatively, you can do if (! std::getline(fs, line)) { break; } but now you have a break in the middle of the loop. Yech. It is much, much better to make the exit conditions a part of the loop statement itself if at all possible.

Compare that to

std::string line;
while (std::getline(fs, line)) {
  ...
}

This is the standard idiom for reading lines from a file. A very similar idiom exists in C. This idiom is very old, very widely used, and very widely viewed as the correct way to read lines from a file.

What if you come from a shop that bans conditionals with side-effects? (There are lots and lots of programming standards that do just that.) There is a way around this without resorting to the break in the middle of the loop approach:

std::string line;
for (std::getline(fs, line); fs.good(); std::getline(fs, line)) {
  ...
}

Not as ugly as the break approach, but most will agree that this isn't nearly as nice-looking as is the standard idiom.
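One caveat with the fs.good() test in the for loop above: good() is also false once eofbit is set, and std::getline sets eofbit when the last line has no trailing newline, even though that line was read successfully. The sketch below compares fs.good() with !fs.fail() as the loop condition. The helper names are hypothetical, and std::istringstream stands in for the file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the for loop from the answer, testing good().
std::vector<std::string> lines_checking_good(std::istream& in) {
    std::vector<std::string> out;
    std::string line;
    for (std::getline(in, line); in.good(); std::getline(in, line)) {
        out.push_back(line);
    }
    return out;
}

// Hypothetical variant: same shape, but testing !fail() instead, which
// ignores eofbit and only reacts to failbit or badbit.
std::vector<std::string> lines_checking_fail(std::istream& in) {
    std::vector<std::string> out;
    std::string line;
    for (std::getline(in, line); !in.fail(); std::getline(in, line)) {
        out.push_back(line);
    }
    return out;
}
```

For the input "a\nb" (no newline after b), the good() version yields only "a", while the !fail() version yields both lines. When every line ends in a newline, the two behave the same.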

My recommendation is to use the standard idiom unless some standards idiot has banned its use.

Addendum
Regarding for (std::getline(fs, line); fs.good(); std::getline(fs, line)): This is ugly for two reasons. One is that obvious chunk of replicated code.

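Less obvious is that calling getline and then good breaks atomicity. What if some other thread is also reading from the file? This isn't quite so important in practice, because C++ stream objects are not thread-safe: even in C++11 and later, concurrent access to the same stream object is a data race unless you synchronize it yourself (only the synchronized standard streams such as std::cin are exempted). Still, breaking atomicity just to keep the enforcers of the standards happy is a recipe for disaster.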