Understanding concurrent file writes from multiple processes

From here: Is file append atomic in UNIX?

Consider a case where multiple processes open the same file and append to it. O_APPEND guarantees that seeking to the end of the file and then beginning the write operation is atomic, so multiple processes can append to the same file and no process will overwrite another process's write, as long as each write size is <= PIPE_BUF.

I wrote a test program in which multiple processes open and write to the same file with write(2). I make sure each write size is > PIPE_BUF (4k). I was expecting to see instances where a process overwrites another's data, but that doesn't happen. I tested with different write sizes. Is that just luck, or is there a reason why that doesn't happen? My ultimate goal is to understand whether multiple processes appending to the same file need to coordinate their writes.

Here is the complete program. Every process creates an int buffer, fills all values with its rank, opens a file and writes to it.

Specs: Open MPI 1.4.3 on openSUSE 11.3, 64-bit

Compiled as: mpicc -O3 test.c, run as: mpirun -np 8 ./a.out

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int 
main(int argc, char** argv) {
    int rank, size, i, bufsize = 134217728, fd, status = 0;
    ssize_t bytes_written;
    int* buf;
    char* filename = "/tmp/testfile.out";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = (int*) malloc (bufsize * sizeof(int));   
    if(buf == NULL) {
        status = -1;
        perror("Could not malloc");
        goto finalize;
    }
    for(i=0; i<bufsize; i++) 
        buf[i] = rank;

    /* O_CREAT is needed for the mode argument to take effect; without it
     * the open fails when the file does not already exist. */
    if(-1 == (fd = open(filename, O_APPEND|O_WRONLY|O_CREAT, S_IRUSR|S_IWUSR))) {
        perror("Can't open file");
        status = -1;
        goto end;
    }

    /* write() takes a byte count, so the size of the int buffer is
     * bufsize * sizeof(int), not bufsize. */
    bytes_written = write(fd, buf, bufsize * sizeof(int));
    if(bytes_written != (ssize_t)(bufsize * sizeof(int))) {
        perror("Error during write");
        printf("ret value: %zd\n", bytes_written);
        status = -1;
    }

    if(-1 == close(fd)) {
        perror("Error during close");
        status = -1;
    }
end:
    free(buf);
finalize:
    MPI_Finalize();
    return status;
}

Solution 1:

Atomicity of writes less than PIPE_BUF applies only to pipes and FIFOs. For file writes, POSIX says:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.

...which means that you're on your own: different UNIX-likes will give different guarantees.
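If you do need that coordination, the usual form of concurrency control is an advisory lock held around each append. Below is a minimal sketch using flock(2); the function name locked_append is just illustrative, fcntl(F_SETLKW) would be the more portable POSIX alternative, and note that flock is advisory only and unreliable over NFS:

#include <sys/file.h>
#include <unistd.h>

/* Append one record under an exclusive advisory lock, so concurrent
 * writers serialise regardless of write size or filesystem. */
int locked_append(int fd, const void* buf, size_t len) {
    ssize_t n;
    if(-1 == flock(fd, LOCK_EX))        /* block until we hold the lock */
        return -1;
    n = write(fd, buf, len);            /* fd opened with O_APPEND */
    if(-1 == flock(fd, LOCK_UN))        /* release even after a short write */
        return -1;
    return (n == (ssize_t)len) ? 0 : -1;
}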

Solution 2:

Firstly, O_APPEND (or the equivalent FILE_APPEND_DATA on Windows) means that increments of the maximum file extent (the file "length") are atomic under concurrent writers, and that is by any amount, not just PIPE_BUF. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire format capability to append atomically. So if you open your file append-only, concurrent writes will not tear with respect to one another on any major OS, unless NFS is involved.
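Concretely, that guarantee covers a single write() call: open with O_APPEND and emit each record with exactly one write, because splitting a record across calls lets another process's append land in between. A minimal sketch (the path and record format here are just illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char record[64];
    int len, fd;

    /* Every writer opens the same file with O_APPEND; the kernel then
     * performs the seek-to-end and the write as one atomic step. */
    if(-1 == (fd = open("/tmp/shared.log", O_WRONLY|O_CREAT|O_APPEND, 0644))) {
        perror("open");
        return 1;
    }

    len = snprintf(record, sizeof(record), "hello from pid %ld\n", (long)getpid());

    /* One record per write(): the whole record lands at EOF in one piece. */
    if(len != write(fd, record, (size_t)len))
        perror("write");

    close(fd);
    return 0;
}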

This says nothing about whether reads will ever see a torn write, though. On that, POSIX says the following about the atomicity of read() and write() on regular files:

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]

and

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]

but conversely:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]

A safe interpretation of all three of these requirements would suggest that all writes overlapping an extent in the same file must be serialised with respect to one another and to reads such that torn writes never appear to readers.

A less safe, but still allowed, interpretation could be that reads and writes only serialise with each other between threads inside the same process, and that between processes, writes are serialised with respect to reads only (i.e. there is sequentially consistent I/O ordering between threads in a process, but between processes I/O is only acquire-release).

Of course, just because the standard requires these semantics doesn't mean implementations comply, though in fact FreeBSD with ZFS behaves perfectly, very recent Windows (10.0.14393) with NTFS behaves perfectly, and recent Linuxes with ext4 behave correctly if O_DIRECT is on. If you would like more detail on how well major OSes and filing systems comply with the standard, see this answer.
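As a footnote on that last point, O_DIRECT carries alignment requirements of its own: the buffer address and transfer size (and, for positioned writes, the file offset) generally must be multiples of the logical block size. A sketch of opening that way on Linux, assuming a 4096-byte block size and a path on a filesystem that actually supports O_DIRECT (tmpfs, for example, does not):

#define _GNU_SOURCE          /* exposes O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const size_t blksz = 4096;    /* assumed logical block size */
    void* buf;
    int fd, rc;

    /* O_DIRECT requires the buffer itself to be block-aligned;
     * posix_memalign returns an error number rather than setting errno. */
    if(0 != (rc = posix_memalign(&buf, blksz, blksz))) {
        fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
        return 1;
    }
    memset(buf, 'A', blksz);

    if(-1 == (fd = open("testfile.direct", O_WRONLY|O_CREAT|O_APPEND|O_DIRECT, 0644))) {
        perror("open");
        free(buf);
        return 1;
    }

    /* The transfer size must also be a multiple of the block size. */
    if((ssize_t)blksz != write(fd, buf, blksz))
        perror("write");

    close(fd);
    free(buf);
    return 0;
}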