How do I read a large CSV file with Scala Stream class?

How do I read a large CSV file (> 1 GB) with a Scala Stream? Do you have a code example? Or would you use a different way to read a large CSV file without loading it all into memory first?


Solution 1:

Just use Source.fromFile(...).getLines as you already stated.

That returns an Iterator, which is already lazy. (You'd only use a Stream as a lazy collection if you wanted previously retrieved values to be memoized, so you could read them again.)
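For example, a minimal sketch (the file name data.csv is just a placeholder) that counts the lines of a large file without ever holding more than the current line in memory:

```scala
import scala.io.Source

object CountLines {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("data.csv") // placeholder path
    try {
      // getLines() returns an Iterator[String]; each line is read on demand
      // and discarded once consumed, so memory use stays constant.
      val n = source.getLines().size
      println(s"read $n lines")
    } finally {
      source.close() // release the underlying file handle
    }
  }
}
```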

If you're getting memory problems, the cause will lie in what you're doing after getLines. Any operation like toList, which forces a strict collection, will pull the whole file into memory and cause the problem.
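To make the contrast concrete, here is a sketch under the assumption of a simple, unquoted CSV whose first column is numeric; the strict variant is left commented out because it is exactly what exhausts the heap:

```scala
import scala.io.Source

object SumFirstColumn {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("data.csv") // placeholder path
    try {
      // Strict: materialises every line at once -- avoid on a 1 GB file.
      // val allLines = source.getLines().toList

      // Lazy: foldLeft pulls one line at a time through the Iterator and
      // keeps only the running total in memory.
      val total = source.getLines()
        .drop(1) // skip a header row, if there is one
        .foldLeft(0.0)((acc, line) => acc + line.split(',')(0).toDouble)
      println(s"sum of first column: $total")
    } finally {
      source.close()
    }
  }
}
```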

Solution 2:

I hope you don't mean Scala's collection.immutable.Stream when you say Stream. That is not what you want here: Stream is lazy, but it also memoizes every element it has already produced.
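A sketch of the pitfall (not from the original answer): hanging on to the head of a Stream built from getLines pins every line you have read so far, while a plain Iterator forgets them. (In Scala 2.13+, Stream is deprecated in favour of LazyList, which memoizes in the same way.)

```scala
import scala.io.Source

object StreamPitfall {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("data.csv") // placeholder path

    // Dangerous on a 1 GB file: toStream memoizes every element it produces,
    // and the `lines` reference keeps all of them reachable on the heap.
    // val lines: Stream[String] = source.getLines().toStream
    // println(lines.length)

    // Fine: an Iterator hands out each line once and then lets it be GC'd.
    var count = 0L
    source.getLines().foreach(_ => count += 1)
    println(s"saw $count lines")

    source.close()
  }
}
```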

I don't know what you plan to do, but just reading the file line by line should work very well without using large amounts of memory.
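For instance, a streaming filter that copies matching rows to a new file; the paths, the year check, and the naive comma split (no quoted-field handling) are all assumptions for illustration:

```scala
import scala.io.Source
import java.io.PrintWriter

object FilterCsv {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("big.csv")      // placeholder input path
    val out    = new PrintWriter("filtered.csv") // placeholder output path
    try {
      // One line in, at most one line out: memory use is independent of file size.
      source.getLines().foreach { line =>
        if (line.split(',').headOption.contains("2024")) out.println(line)
      }
    } finally {
      out.close()
      source.close()
    }
  }
}
```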

getLines should evaluate lazily and should not crash (as long as your file does not have more than 2³² lines, afaik). If it does crash, ask on #scala or file a bug ticket (or do both).