Can I do a copy-on-write memcpy in Linux?

I have some code where I frequently copy a large block of memory, often after making only very small changes to it.

I have implemented a system which tracks the changes, but I thought it might be nice, if possible, to tell the OS to do a 'copy-on-write' of the memory and let it deal with copying only the parts which change. However, while Linux does copy-on-write, for example when fork()ing, I can't find a way of controlling it and doing it myself.


Solution 1:

Your best chance is probably to mmap() the original data to a file, and then mmap() the same file again using MAP_PRIVATE.
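
A minimal sketch of that idea, under a few assumptions: the helper names (backing_file_create, map_view, cow_mmap_demo), the /tmp path and the 1 MiB size are made up for illustration and are not part of any API. The original lives in a MAP_SHARED view of a temporary file, and the "copy" is a MAP_PRIVATE view of the same file, so the kernel duplicates only the pages the copy writes to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of len bytes; returns an fd or -1.
   The /tmp location is an assumption of this sketch. */
static int backing_file_create(size_t len) {
    char path[] = "/tmp/cow-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                       /* file lives on until fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }
    return fd;
}

/* Map the whole file with MAP_SHARED or MAP_PRIVATE; NULL on failure. */
static unsigned char *map_view(int fd, size_t len, int flags) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if the setup worked, -1 if a system call failed. */
int cow_mmap_demo(void) {
    size_t len = 1u << 20;
    int fd = backing_file_create(len);
    if (fd < 0) return -1;

    /* The "original" data: writes go through to the file. */
    unsigned char *orig = map_view(fd, len, MAP_SHARED);
    if (orig == NULL) { close(fd); return -1; }
    memset(orig, 0xAB, len);

    /* The "copy": no bytes are copied until a page is written. */
    unsigned char *copy = map_view(fd, len, MAP_PRIVATE);
    if (copy == NULL) { munmap(orig, len); close(fd); return -1; }

    assert(copy[12345] == 0xAB);        /* copy sees the original contents */
    copy[12345] = 0x01;                 /* kernel copies just this one page */
    assert(orig[12345] == 0xAB);        /* original is unaffected */
    assert(copy[12345] == 0x01);

    munmap(copy, len);
    munmap(orig, len);
    close(fd);
    return 0;
}
```

One caveat: this is not a true snapshot. Writes to the private copy never reach the original, but whether later writes to the original show up in pages of the copy that have not been touched yet is left unspecified by the mmap() documentation, so the original should be treated as frozen while copies are in use.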

Solution 2:

Depending on what exactly it is that you are copying, a persistent data structure might be a solution for your problem.
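
To make that suggestion a bit more concrete, here is a rough sketch of one possible persistent structure: an int array split into reference-counted chunks, where creating a modified version shares every chunk except the one that changed. All names (struct chunk, struct version, version_with and so on) and the chunk sizes are assumptions for illustration, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_INTS 1024   /* assumed chunk size */
#define NUM_CHUNKS 64     /* assumed number of chunks per version */

struct chunk { int refs; int v[CHUNK_INTS]; };
struct version { struct chunk *chunks[NUM_CHUNKS]; };

static void chunk_unref(struct chunk *c) {
    if (c != NULL && --c->refs == 0) free(c);
}

/* Drop a version; chunks still used by other versions survive. */
void version_free(struct version *ver) {
    if (ver == NULL) return;
    for (size_t i = 0; i < NUM_CHUNKS; i++) chunk_unref(ver->chunks[i]);
    free(ver);
}

/* Create a zero-filled first version; NULL on allocation failure. */
struct version *version_new(void) {
    struct version *ver = calloc(1, sizeof *ver);
    if (ver == NULL) return NULL;
    for (size_t i = 0; i < NUM_CHUNKS; i++) {
        ver->chunks[i] = calloc(1, sizeof(struct chunk));
        if (ver->chunks[i] == NULL) { version_free(ver); return NULL; }
        ver->chunks[i]->refs = 1;
    }
    return ver;
}

int version_get(const struct version *ver, size_t index) {
    assert(index < (size_t)NUM_CHUNKS * CHUNK_INTS);
    return ver->chunks[index / CHUNK_INTS]->v[index % CHUNK_INTS];
}

/* Return a new version equal to old except at index. Only the chunk
   containing index is copied; the old version stays valid and unchanged. */
struct version *version_with(const struct version *old, size_t index, int value) {
    assert(index < (size_t)NUM_CHUNKS * CHUNK_INTS);
    struct version *ver = malloc(sizeof *ver);
    struct chunk *fresh = malloc(sizeof *fresh);
    if (ver == NULL || fresh == NULL) { free(ver); free(fresh); return NULL; }

    size_t ci = index / CHUNK_INTS;
    memcpy(fresh, old->chunks[ci], sizeof *fresh);
    fresh->refs = 1;
    fresh->v[index % CHUNK_INTS] = value;

    for (size_t i = 0; i < NUM_CHUNKS; i++) {
        if (i == ci) {
            ver->chunks[i] = fresh;
        } else {
            ver->chunks[i] = old->chunks[i];   /* share, do not copy */
            ver->chunks[i]->refs++;
        }
    }
    return ver;
}
```

With these sizes, a "copy with one change" costs one 4 KiB chunk plus the pointer table instead of the full 256 KiB. Whether this pays off depends on how the data is accessed, which is why it only might be a solution.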

Solution 3:

It's easier to implement copy-on-write in an object-oriented language like C++. For example, most of the container classes in Qt are copy-on-write.

But of course you can do that in C too; it's just some more work. When you want to assign your data to a new data block, you don't do a copy; instead, you just copy a pointer in a wrapper struct around your data. You need to keep track of the status of the data in your data blocks. If you then change something in your new data block, you make a "real" copy and change the status. Of course, you can no longer use simple operators like "=" for assignment; instead you need to have functions (in C++ you would just use operator overloading).

A more robust implementation should use reference counters instead of a simple flag; I leave that up to you.

A quick and dirty example: If you have a

struct big {
    // lots of data
    int data[BIG_NUMBER];
};

you have to implement assign functions and getters/setters yourself.

// assume you want to implement COW for a struct big of some kind
// now instead of
struct big a, b;
a = b;
a.data[12345] = 6789;

// you need to use
struct cow_big a, b;
assign(&a, b);   // only pointers get copied
set_some_data(&a, 12345, 6789); // now the stuff really gets copied


// the basic implementation could look like this
// (needs <stdbool.h> for bool and <stdlib.h> for malloc)
struct cow_big {
    struct big *data;
    bool needs_copy;
};

// shallow copy, only sets a pointer
void assign(struct cow_big *dst, struct cow_big src) {
    dst->data = src.data;
    dst->needs_copy = true;
}

// change some data in struct big. if it hasn't made a deep copy yet, do it here.
// (error handling for malloc is omitted)
void set_some_data(struct cow_big *dst, int index, int data) {
    if (dst->needs_copy) {
        struct big *src = dst->data;
        dst->data = malloc(sizeof(struct big));
        *dst->data = *src;   // now here is the deep copy
        dst->needs_copy = false;
    }
    dst->data->data[index] = data;
}

You need to write constructors and destructors as well. I really recommend C++ for this.
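
For what it's worth, the reference-counted variant mentioned above could be sketched roughly as follows. This is only one possible shape: the names (struct cow_handle, cow_init, cow_assign, cow_set, cow_release) and the value of BIG_NUMBER are assumptions, and the counter is a plain int, so it is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BIG_NUMBER 100000   /* assumed size, for illustration only */

/* Shared payload plus the number of handles pointing at it. */
struct big_shared {
    int refs;
    int data[BIG_NUMBER];
};

/* User code only holds handles; start them out as {NULL}. */
struct cow_handle {
    struct big_shared *p;
};

/* "Constructor": allocate a zero-filled block owned by one handle. */
int cow_init(struct cow_handle *h) {
    h->p = calloc(1, sizeof *h->p);
    if (h->p == NULL) return -1;
    h->p->refs = 1;
    return 0;
}

/* "Destructor": drop one reference, free the block if it was the last. */
void cow_release(struct cow_handle *h) {
    if (h->p != NULL && --h->p->refs == 0)
        free(h->p);
    h->p = NULL;
}

/* Shallow copy: dst must be either {NULL} or a valid handle. */
void cow_assign(struct cow_handle *dst, const struct cow_handle *src) {
    if (dst->p == src->p) return;
    src->p->refs++;
    cow_release(dst);
    dst->p = src->p;
}

int cow_get(const struct cow_handle *h, int index) {
    assert(index >= 0 && index < BIG_NUMBER);
    return h->p->data[index];
}

/* Write through the handle; deep-copy first if the block is shared. */
int cow_set(struct cow_handle *h, int index, int value) {
    assert(index >= 0 && index < BIG_NUMBER);
    if (h->p->refs > 1) {
        struct big_shared *copy = malloc(sizeof *copy);
        if (copy == NULL) return -1;
        memcpy(copy->data, h->p->data, sizeof copy->data);
        copy->refs = 1;
        h->p->refs--;       /* this handle leaves the shared block */
        h->p = copy;
    }
    h->p->data[index] = value;
    return 0;
}
```

Unlike the flag version, a write through either handle triggers the copy, so changing the source after an assignment no longer leaks into the destination. The price is that every handle has to be released explicitly, which is exactly the bookkeeping C++ constructors and destructors would do for you.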

Solution 4:

The copy-on-write mechanism employed e.g. by fork() is a feature of the MMU (Memory Management Unit), which handles the memory paging for the kernel. Accessing the MMU is a privileged operation, i.e. it cannot be done by a userspace application. I am not aware of any copy-on-write API exported to user-space, either.

(Then again, I am not exactly a guru on the Linux API, so others might point out relevant API calls I have missed.)

Edit: And lo, MSalters rises to the occasion. ;-)

Solution 5:

You should be able to open your own memory via /proc/$PID/mem and then mmap() the interesting part of it with MAP_PRIVATE to some other place.