How to read huge file in c++
If I have a huge file (eg. 1TB, or any size that does not fit into RAM. The file is stored on the disk). It is delimited by space. And my RAM is only 8GB. Can I read that file in ifstream? If not, how to read a block of file (eg. 4GB)?
There are a couple of things that you can do.
First, there's no problem opening a file that is larger than the amount of RAM that you have. What you won't be able to do is copy the whole file live into your memory. The best thing would be for you to find a way to read just a few chunks at a time and process them. You can use ifstream
for that purpose (with ifstream.read
, for instance). Allocate, say, one megabyte of memory, read the first megabyte of that file into it, rinse and repeat:
ifstream bigFile("mybigfile.dat");
constexpr size_t bufferSize = 1024 * 1024;
unique_ptr<char[]> buffer(new char[bufferSize]);
while (bigFile)
{
bigFile.read(buffer.get(), bufferSize);
// process data in buffer
}
Another solution is to map the file to memory. Most operating systems will allow you to map a file to memory even if it is larger than the physical amount of memory that you have. This works because the operating system knows that each memory page associated with the file can be mapped and unmapped on-demand: when your program needs a specific page, the OS will read it from the file into your process's memory and swap out a page that hasn't been used in a while.
However, this can only work if the file is smaller than the maximum amount of memory that your process can theoretically use. This isn't an issue with a 1TB file in a 64-bit process, but it wouldn't work in a 32-bit process.
Also be aware of the spirits that you're summoning. Memory-mapping a file is not the same thing as reading from it. If the file is suddenly truncated from another program, your program is likely to crash. If you modify the data, it's possible that you will run out of memory if you can't save back to the disk. Also, your operating system's algorithm for paging in and out memory may not behave in a way that advantages you significantly. Because of these uncertainties, I would consider mapping the file only if reading it in chunks using the first solution cannot work.
On Linux/OS X, you would use mmap
for it. On Windows, you would open a file and then use CreateFileMapping
then MapViewOfFile
.
I am sure you don't have to keep all the file in memory. Typically one wants to read and process file by chunks. If you want to use ifstream
, you can do something like that:
ifstream is("/path/to/file");
char buf[4096];
do {
is.read(buf, sizeof(buf));
process_chunk(buf, is.gcount());
} while(is);
A more advances aproach is to instead of reading whole file or its chunks to memory you can map it to memory using platform specific apis:
Under windows: CreateFileMapping(), MapViewOfFile()
Under linux: open(2) / creat(2), shm_open, mmap
you will need to compile 64bit app to make it work.
for more details see here: CreateFileMapping, MapViewOfFile, how to avoid holding up the system memory
You can use fread
char buffer[size];
fread(buffer, size, sizeof(char), fp);
Or, if you want to use C++ fstreams you can use read as buratino said.
Also have in mind that you can open a file regardless of its size, the idea is to open it and read it in chucks that fit in your RAM.