C Programming: How to read the whole file contents into a buffer [duplicate]
Portability between Linux and Windows is a big headache, since Linux is a POSIX-conformant system with - generally - a proper, high quality toolchain for C, whereas Windows doesn't even provide a lot of functions in the C standard library.
However, if you want to stick to the standard, you can write something like this:
#include <stdio.h>
#include <stdlib.h>
FILE *f = fopen("textfile.txt", "rb");
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
fseek(f, 0, SEEK_SET); /* same as rewind(f); */
char *string = malloc(fsize + 1);
fread(string, fsize, 1, f);
fclose(f);
string[fsize] = 0;
Here string
will contain the contents of the text file as a properly 0-terminated C string. This code is just standard C, it's not POSIX-specific (although that it doesn't guarantee it will work/compile on Windows...)
Here is what I would recommend.
It should conform to C89, and be completely portable. In particular, it works also on pipes and sockets on POSIXy systems.
The idea is that we read the input in large-ish chunks (READALL_CHUNK
), dynamically reallocating the buffer as we need it. We only use realloc()
, fread()
, ferror()
, and free()
:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
/* Size of each input chunk to be
read and allocate for. */
#ifndef READALL_CHUNK
#define READALL_CHUNK 262144
#endif
#define READALL_OK 0 /* Success */
#define READALL_INVALID -1 /* Invalid parameters */
#define READALL_ERROR -2 /* Stream error */
#define READALL_TOOMUCH -3 /* Too much input */
#define READALL_NOMEM -4 /* Out of memory */
/* This function returns one of the READALL_ constants above.
If the return value is zero == READALL_OK, then:
(*dataptr) points to a dynamically allocated buffer, with
(*sizeptr) chars read from the file.
The buffer is allocated for one extra char, which is NUL,
and automatically appended after the data.
Initial values of (*dataptr) and (*sizeptr) are ignored.
*/
int readall(FILE *in, char **dataptr, size_t *sizeptr)
{
char *data = NULL, *temp;
size_t size = 0;
size_t used = 0;
size_t n;
/* None of the parameters can be NULL. */
if (in == NULL || dataptr == NULL || sizeptr == NULL)
return READALL_INVALID;
/* A read error already occurred? */
if (ferror(in))
return READALL_ERROR;
while (1) {
if (used + READALL_CHUNK + 1 > size) {
size = used + READALL_CHUNK + 1;
/* Overflow check. Some ANSI C compilers
may optimize this away, though. */
if (size <= used) {
free(data);
return READALL_TOOMUCH;
}
temp = realloc(data, size);
if (temp == NULL) {
free(data);
return READALL_NOMEM;
}
data = temp;
}
n = fread(data + used, 1, READALL_CHUNK, in);
if (n == 0)
break;
used += n;
}
if (ferror(in)) {
free(data);
return READALL_ERROR;
}
temp = realloc(data, used + 1);
if (temp == NULL) {
free(data);
return READALL_NOMEM;
}
data = temp;
data[used] = '\0';
*dataptr = data;
*sizeptr = used;
return READALL_OK;
}
Above, I've used a constant chunk size, READALL_CHUNK
== 262144 (256*1024
). This means that in the worst case, up to 262145 chars are wasted (allocated but not used), but only temporarily. At the end, the function reallocates the buffer to the optimal size. Also, this means that we do four reallocations per megabyte of data read.
The 262144-byte default in the code above is a conservative value; it works well for even old minilaptops and Raspberry Pis and most embedded devices with at least a few megabytes of RAM available for the process. Yet, it is not so small that it slows down the operation (due to many read calls, and many buffer reallocations) on most systems.
For desktop machines at this time (2017), I recommend a much larger READALL_CHUNK
, perhaps #define READALL_CHUNK 2097152
(2 MiB).
Because the definition of READALL_CHUNK
is guarded (i.e., it is defined only if it is at that point in the code still undefined), you can override the default value at compile time, by using (in most C compilers) -DREADALL_CHUNK=2097152
command-line option -- but do check your compiler options for defining a preprocessor macro using command-line options.