How to parse space-separated floats in C++ quickly?

I have a file with millions of lines, each line containing 3 floats separated by spaces. Reading the file takes a lot of time, so I tried reading it via memory-mapped files, only to find out that the problem is not the speed of the I/O but the speed of the parsing.

My current parsing is to take the stream (called file) and do the following

float x,y,z;
file >> x >> y >> z;

Someone on Stack Overflow recommended using Boost.Spirit, but I couldn't find any simple tutorial explaining how to use it.

I'm trying to find a simple and efficient way to parse a line that looks like this:

"134.32 3545.87 3425"

I would really appreciate some help. I wanted to use strtok to split each line, but I don't know how to convert the resulting strings to floats, and I'm not sure it's the best way anyway.

I don't mind whether the solution uses Boost or not, and it doesn't have to be the most efficient solution ever, but I'm sure it is possible to double the speed.

Thanks in advance.


UPDATE

Since Spirit X3 is available for testing, I've updated the benchmarks. Meanwhile I've used Nonius to get statistically sound benchmarks.

All charts below are available interactively online.

Benchmark CMake project + testdata used is on github: https://github.com/sehe/bench_float_parsing

(benchmark chart)

Summary:

Spirit parsers are the fastest. If you can use C++14, consider the experimental version, Spirit X3:

(chart: parsing with memory-mapped files)

The above was measured using memory-mapped files. Using IOstreams, it is slower across the board,

(chart: parsing with IOstreams)

but not as slow as scanf using C/POSIX FILE* function calls:

(chart: parsing with scanf / FILE*)


What follows are parts of the OLD answer


I implemented the Spirit version and ran a benchmark comparing it to the other suggested answers.

Here are my results; all tests were run on the same body of input (515 MB of input.txt). See below for the exact specs.


(wall clock time in seconds, average of 2+ runs)

To my own surprise, Boost Spirit turns out to be the fastest, and the most elegant:

  • handles/reports errors
  • supports +/-Inf and NaN and variable whitespace
  • no problems at all detecting the end of input (as opposed to the other mmap answer)
  • looks nice:

    bool ok = phrase_parse(f,l,               // source iterators
         (double_ > double_ > double_) % eol, // grammar
         blank,                               // skipper
         data);                               // output attribute
    

Note that boost::spirit::istreambuf_iterator was unspeakably slower (15 s+). I hope this helps!
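
For reference, here is a minimal, self-contained sketch of that Qi parser (the file name input.txt is only an example, and float_ stands in for double_ to match float3's float members; link with -lboost_iostreams, Spirit itself is header-only):

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <vector>

struct float3 { float x, y, z; };
BOOST_FUSION_ADAPT_STRUCT(float3, (float, x)(float, y)(float, z))

int main() {
    namespace qi = boost::spirit::qi;

    boost::iostreams::mapped_file_source mmap("input.txt");
    char const* f = mmap.data();
    char const* l = f + mmap.size();

    std::vector<float3> data;
    // The expectation operator > throws qi::expectation_failure on bad input.
    bool ok = qi::phrase_parse(f, l,                           // source iterators
         (qi::float_ > qi::float_ > qi::float_) % qi::eol,     // grammar
         qi::blank,                                            // skipper
         data);                                                // output attribute

    std::cout << (ok ? "parsed " : "failed after ") << data.size()
              << " rows, " << (f == l ? "all" : "not all")
              << " input consumed\n";
}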

Benchmark details

All parsing is done into a vector of struct float3 { float x, y, z; }.

Generate input file using

od -f -A none --width=12 /dev/urandom | head -n 11000000

This results in a 515 MB file containing data like

     -2627.0056   -1.967235e-12  -2.2784738e+33
  -1.0664798e-27  -4.6421956e-23   -6.917859e+20
  -1.1080849e+36   2.8909405e-33   1.7888695e-12
  -7.1663235e+33  -1.0840628e+36   1.5343362e-12
  -3.1773715e-17  -6.3655537e-22   -8.797282e+31
    9.781095e+19   1.7378472e-37        63825084
  -1.2139188e+09  -5.2464635e-05  -2.1235992e-38
   3.0109424e+08   5.3939846e+30  -6.6146894e-20

Compile the program using:

g++ -std=c++0x -g -O3 -march=native test.cpp -o test -lboost_filesystem -lboost_iostreams

Measure wall clock time using

time ./test < input.txt 

Environment:

  • Linux desktop 4.2.0-42-generic #49-Ubuntu SMP x86_64
  • Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
  • 32GiB RAM

Full Code

Full code for the old benchmark is in the edit history of this post; the newest version is on GitHub.


If the conversion is the bottleneck (which is quite possible), you should start by trying the different possibilities in the standard library. Logically, one would expect them to be very close, but in practice, they aren't always:

  • You've already determined that std::ifstream is too slow.

  • Converting your memory mapped data to an std::istringstream is almost certainly not a good solution; you'll first have to create a string, which will copy all of the data.

  • Writing your own streambuf to read directly from the memory, without copying (or using the deprecated std::istrstream), might be a solution, although if the problem really is the conversion... this still uses the same conversion routines (see the sketch after this list).

  • You can always try fscanf on the file, or sscanf on your memory-mapped data (also sketched after this list). Depending on the implementation, they might be faster than the various istream implementations.

  • Probably faster than any of these is to use strtod. No need to tokenize for this: strtod skips leading whitespace (including '\n'), and has an out parameter where it puts the address of the first character not read. The end condition is a bit tricky; your loop should probably look a bit like this:

    char* begin;    //  Set to point to the mmap'ed data...
                    //  You'll also have to arrange for a '\0'
                    //  to follow the data.  This is probably
                    //  the most difficult issue.
    char* end;
    errno = 0;
    double tmp = strtod( begin, &end );
    while ( errno == 0 && end != begin ) {
        //  do whatever with tmp...
        begin = end;
        tmp = strtod( begin, &end );
    }
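
Here is a rough sketch of the zero-copy streambuf mentioned in the third bullet above; the names membuf, mapped_begin and mapped_end are illustrative. Note that setg() wants non-const pointers, so a read-only mapping needs a const_cast even though the buffer is never written:

#include <istream>
#include <streambuf>

struct membuf : std::streambuf {
    // Point the get area directly at the mapped bytes: no copy is made.
    membuf(char* begin, char* end) { setg(begin, begin, end); }
};

// Usage (illustrative):
//   membuf buf(mapped_begin, mapped_end);
//   std::istream in(&buf);
//   float x, y, z;
//   while (in >> x >> y >> z) { /* ... */ }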
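
And a sketch of the sscanf variant over a NUL-terminated, in-memory buffer, using %n to track how many characters each call consumed (the function name is just an example). Beware that some C library implementations effectively determine the length of the whole remaining buffer on every sscanf call, which can dominate on very large inputs:

#include <cstdio>

void parse_with_sscanf(char const* p)   // p: NUL-terminated buffer
{
    float x, y, z;
    int consumed = 0;
    // The leading space in the format skips whitespace, including newlines;
    // %n stores how many characters were consumed, so we can advance p.
    while (std::sscanf(p, " %f %f %f%n", &x, &y, &z, &consumed) == 3) {
        // do whatever with x, y, z...
        p += consumed;
    }
}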

If none of these are fast enough, you'll have to consider the actual data. It probably has some additional constraints, which means you can potentially write a conversion routine that is faster than the more general ones; e.g. strtod has to handle both fixed and scientific notation, and it has to be 100% accurate even if there are 17 significant digits. It also has to be locale-aware. All of this is added complexity, which means added code to execute. But beware: writing an efficient and correct conversion routine, even for a restricted set of input, is non-trivial; you really do have to know what you are doing.

EDIT:

Just out of curiosity, I've run some tests. In addition to the aforementioned solutions, I wrote a simple custom converter which only handles fixed-point notation (no scientific), allows at most five digits after the decimal point, and requires the value before the decimal point to fit in an int:

double
convert( char const* source, char const** endPtr )
{
    //  strtol is declared in <cstdlib>.  This assumes non-negative values
    //  (as in the test data below) with at most five digits after the
    //  decimal point; anything beyond that is not handled here.
    char* end;
    int left = strtol( source, &end, 10 );
    double results = left;
    if ( *end == '.' ) {
        char* start = end + 1;
        int right = strtol( start, &end, 10 );
        static double const fracMult[]
            = { 0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
        results += right * fracMult[ end - start ];
    }
    if ( endPtr != nullptr ) {
        *endPtr = end;
    }
    return results;
}

(If you actually use this, you should definitely add some error handling. This was just knocked up quickly for experimental purposes, to read the test file I'd generated, and nothing else.)

The interface is exactly that of strtod, to simplify coding.
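
For illustration only, convert() drops into the same loop shape as the strtod version above (begin again points at NUL-terminated, mmap'ed data):

char const* begin;   //  Set to point to the mmap'ed, '\0'-terminated data
char const* end;
double tmp = convert( begin, &end );
while ( end != begin ) {
    //  do whatever with tmp...
    begin = end;
    tmp = convert( begin, &end );
}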

I ran the benchmarks in two environments (on different machines, so the absolute values of any times aren't relevant). I got the following results:

Under Windows 7, compiled with VC 11 (/O2):

Testing Using fstream directly (5 iterations)...
    6.3528e+006 microseconds per iteration
Testing Using fscan directly (5 iterations)...
    685800 microseconds per iteration
Testing Using strtod (5 iterations)...
    597000 microseconds per iteration
Testing Using manual (5 iterations)...
    269600 microseconds per iteration

Under Linux 2.6.18, compiled with g++ 4.4.2 (-O2, IIRC):

Testing Using fstream directly (5 iterations)...
    784000 microseconds per iteration
Testing Using fscanf directly (5 iterations)...
    526000 microseconds per iteration
Testing Using strtod (5 iterations)...
    382000 microseconds per iteration
Testing Using strtof (5 iterations)...
    360000 microseconds per iteration
Testing Using manual (5 iterations)...
    186000 microseconds per iteration

In all cases, I'm reading 554000 lines, each with 3 randomly generated floating-point values in the range [0...10000).

The most striking thing is the enormous difference between fstream and fscanf under Windows (and the relatively small difference between fscanf and strtod). The second thing is just how much the simple custom conversion function gains, on both platforms. The necessary error handling would slow it down a little, but the difference is still significant. I expected some improvement, since it doesn't handle a lot of things the standard conversion routines do (such as scientific notation, very, very small numbers, Inf and NaN, i18n, etc.), but not this much.