Buffered vs unbuffered IO
I learned that by default I/O in programs is buffered, i.e they are served from a temporary storage to the requesting program.
I understand that buffering improves IO performance (maybe by reducing system calls). I have seen examples of disabling buffering, like setvbuf
in C. What is the difference between the two modes and when should one be used over the other?
Solution 1:
You want unbuffered output whenever you want to ensure that the output has been written before continuing. One example is standard error under a C runtime library - this is usually unbuffered by default. Since errors are (hopefully) infrequent, you want to know about them immediately. On the other hand, standard output is buffered simply because it's assumed there will be far more data going through it.
Another example is a logging library. If your log messages are held within buffers in your process, and your process dumps core, there a very good chance that output will never be written.
In addition, it's not just system calls that are minimized but disk I/O as well. Let's say a program reads a file one byte at a time. With unbuffered input, you will go out to the (relatively very slow) disk for every byte even though it probably has to read in a whole block anyway (the disk hardware itself may have buffers but you're still going out to the disk controller which is going to be slower than in-memory access).
By buffering, the whole block is read in to the buffer at once then the individual bytes are delivered to you from the (in-memory, incredibly fast) buffer area.
Keep in mind that buffering can take many forms, such as in the following example:
+-------------------+-------------------+
| Process A | Process B |
+-------------------+-------------------+
| C runtime library | C runtime library | C RTL buffers
+-------------------+-------------------+
| OS caches | Operating system buffers
+---------------------------------------+
| Disk controller hardware cache | Disk hardware buffers
+---------------------------------------+
| Disk |
+---------------------------------------+
Solution 2:
You want unbuffered output when you already have large sequence of bytes ready to write to disk, and want to avoid an extra copy into a second buffer in the middle.
Buffered output streams will accumulate write results into an intermediate buffer, sending it to the OS file system only when enough data has accumulated (or flush()
is requested). This reduces the number of file system calls. Since file system calls can be expensive on most platforms (compared to short memcpy
), buffered output is a net win when performing a large number of small writes. Unbuffered output is generally better when you already have large buffers to send -- copying to an intermediate buffer will not reduce the number of OS calls further, and introduces additional work.
Unbuffered output has nothing to do with ensuring your data reaches the disk; that functionality is provided by flush()
, and works on both buffered and unbuffered streams. Unbuffered IO writes don't guarantee the data has reached the physical disk -- the OS file system is free to hold on to a copy of your data indefinitely, never writing it to disk, if it wants. It is only required to commit it to disk when you invoke flush()
. (Note that close()
will call flush()
on your behalf).