Possible to share in-memory data between 2 separate processes?

I have an xmlrpc server using Twisted. The server has a huge amount of data stored in-memory. Is it possible to have a secondary, separate xmlrpc server running which can access the object in-memory in the first server?

So, serverA starts up and creates an object. serverB starts up and can read from the object in serverA.

* EDIT *

The data to be shared is a list of 1 million tuples.


Without some deep and dark rewriting of the Python core runtime (to allow forcing of an allocator that uses a given segment of shared memory and ensures compatible addresses between disparate processes) there is no way to "share objects in memory" in any general sense. That list will hold a million addresses of tuples, each tuple made up of addresses of all of its items, and each of these addresses will have be assigned by pymalloc in a way that inevitably varies among processes and spreads all over the heap.

On just about every system except Windows, it's possible to spawn a subprocess that has essentially read-only access to objects in the parent process's space... as long as the parent process doesn't alter those objects, either. That's obtained with a call to os.fork(), that in practice "snapshots" all of the memory space of the current process and starts another simultaneous process on the copy/snapshot. On all modern operating systems, this is actually very fast thanks to a "copy on write" approach: the pages of virtual memory that are not altered by either process after the fork are not really copied (access to the same pages is instead shared); as soon as either process modifies any bit in a previously shared page, poof, that page is copied, and the page table modified, so the modifying process now has its own copy while the other process still sees the original one.

This extremely limited form of sharing can still be a lifesaver in some cases (although it's extremely limited: remember for example that adding a reference to a shared object counts as "altering" that object, due to reference counts, and so will force a page copy!)... except on Windows, of course, where it's not available. With this single exception (which I don't think will cover your use case), sharing of object graphs that include references/pointers to other objects is basically unfeasible -- and just about any objects set of interest in modern languages (including Python) falls under this classification.

In extreme (but sufficiently simple) cases one can obtain sharing by renouncing the native memory representation of such object graphs. For example, a list of a million tuples each with sixteen floats could actually be represented as a single block of 128 MB of shared memory -- all the 16M floats in double-precision IEEE representation laid end to end -- with a little shim on top to "make it look like" you're addressing things in the normal way (and, of course, the not-so-little-after-all shim would also have to take care of the extremely hairy inter-process synchronization problems that are certain to arise;-). It only gets hairier and more complicated from there.

Modern approaches to concurrency are more and more disdaining shared-anything approaches in favor of shared-nothing ones, where tasks communicate by message passing (even in multi-core systems using threading and shared address spaces, the synchronization issues and the performance hits the HW incurs in terms of caching, pipeline stalls, etc, when large areas of memory are actively modified by multiple cores at once, are pushing people away).

For example, the multiprocessing module in Python's standard library relies mostly on pickling and sending objects back and forth, not on sharing memory (surely not in a R/W way!-).

I realize this is not welcome news to the OP, but if he does need to put multiple processors to work, he'd better think in terms of having anything they must share reside in places where they can be accessed and modified by message passing -- a database, a memcache cluster, a dedicated process that does nothing but keep those data in memory and send and receive them on request, and other such message-passing-centric architectures.


mmap.mmap(0, 65536, 'GlobalSharedMemory')

I think the tag ("GlobalSharedMemory") must be the same for all processes wishing to share the same memory.

http://docs.python.org/library/mmap.html


There are a couple1 of third party libraries available for low-level shared memory manipulations in Python:

  • sysv_ipc
    • > For posix non-compliant systems
  • posix_ipc
    • > Works in Windows with cygwin

Both of which are available via pip

[1] Another package, shm, is available but deprecated. See this page for a comparison of the libraries.

Example Code for C to Python Communication c/o Martin O'Hanlon:

shmwriter.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(int argc, const char **argv)
{
   int shmid;
   // give your shared memory an id, anything will do
   key_t key = 123456;
   char *shared_memory;

   // Setup shared memory, 11 is the size
   if ((shmid = shmget(key, 11, IPC_CREAT | 0666)) < 0)
   {
      printf("Error getting shared memory id");
      exit(1);
   }
   // Attached shared memory
   if ((shared_memory = shmat(shmid, NULL, 0)) == (char *) -1)
   {
      printf("Error attaching shared memory id");
      exit(1);
   }
   // copy "hello world" to shared memory
   memcpy(shared_memory, "Hello World", sizeof("Hello World"));
   // sleep so there is enough time to run the reader!
   sleep(10);
   // Detach and remove shared memory
   shmdt(shmid);
   shmctl(shmid, IPC_RMID, NULL);
}

shmreader.py

import sysv_ipc

# Create shared memory object
memory = sysv_ipc.SharedMemory(123456)

# Read value from shared memory
memory_value = memory.read()

# Find the 'end' of the string and strip
i = memory_value.find('\0')
if i != -1:
    memory_value = memory_value[:i]

print memory_value

You could use shared_memory in 3.8.

https://docs.python.org/3.8/library/multiprocessing.shared_memory.html#module-multiprocessing.shared_memory


You could write a C library to create and manipulate shared-memory arrays for your specific purpose, and then use ctypes to access them from Python.

Or, put them on the filesystem in /dev/shm (which is tmpfs). You'd save a lot of development effort for very little performance overhead: reads/writes from a tmpfs filesystem are little more than a memcpy.