Avoid memory fragmentation when memory pools are a bad idea

If everything really works as you say it does and there is no bug you have not found yet, then try this:

malloc and other memory allocators usually hand out memory in multiples of 16 bytes anyway, even if the requested size is smaller than 16 bytes. So you only need 4000/16 - 30/16 ~ 250 different memory pools.

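const size_t chunk_size = 16;
const size_t pool_count = 250;

memory_pools pool[pool_count]; // pool[idx] manages chunks of '(idx+1)*chunk_size' bytes

char* reserve_mem(size_t sz)
{
    if (sz == 0 || sz > pool_count * chunk_size)
        return nullptr; // outside the range covered by the pools
    // Round up: 1..16 bytes -> pool[0], 17..32 bytes -> pool[1], ...
    size_t pool_idx_to_use = (sz - 1) / chunk_size;
    return pool[pool_idx_to_use].allocate();
}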

IOW, you have 250 memory pools. pool[0] allocates and manages chunks with a length of 16 bytes. pool[99] manages chunks of 1600 bytes, and so on.
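
As a rough illustration of the idea, here is a minimal, self-contained sketch of such size-class pools. The names `FixedPool`, `PoolSet`, `pool_index` and `release_mem` are made up for this example (the `memory_pools` type is not shown in this answer). The sketch rounds each request up to the next multiple of 16, so a 16-byte request lands in the 16-byte pool, and each pool simply falls back to `std::malloc` when its free list is empty.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kChunkSize = 16;   // granularity of the size classes
constexpr std::size_t kPoolCount = 250;  // covers requests of 1..4000 bytes

// Hypothetical fixed-size pool: every chunk it hands out has the same size.
// Released chunks are kept on a free list and reused before malloc is called.
class FixedPool {
public:
    FixedPool() = default;
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;
    // Only chunks sitting on the free list are returned to the C heap here.
    ~FixedPool() { for (char* p : free_list_) std::free(p); }

    void set_chunk_bytes(std::size_t n) { chunk_bytes_ = n; }
    std::size_t chunk_bytes() const { return chunk_bytes_; }

    char* allocate() {
        if (!free_list_.empty()) {
            char* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        return static_cast<char*>(std::malloc(chunk_bytes_));
    }
    void release(char* p) { free_list_.push_back(p); }

private:
    std::size_t chunk_bytes_ = 0;
    std::vector<char*> free_list_;
};

// Maps a request size to its pool: 1..16 -> 0, 17..32 -> 1, ...
inline std::size_t pool_index(std::size_t sz) { return (sz - 1) / kChunkSize; }

class PoolSet {
public:
    PoolSet() {
        for (std::size_t i = 0; i < kPoolCount; ++i)
            pools_[i].set_chunk_bytes((i + 1) * kChunkSize);
    }
    char* reserve_mem(std::size_t sz) {
        if (sz == 0 || sz > kPoolCount * kChunkSize) return nullptr;
        return pools_[pool_index(sz)].allocate();
    }
    // The caller must pass the same size it used for reserve_mem().
    void release_mem(char* p, std::size_t sz) { pools_[pool_index(sz)].release(p); }
    std::size_t chunk_bytes_for(std::size_t sz) const {
        return pools_[pool_index(sz)].chunk_bytes();
    }

private:
    std::array<FixedPool, kPoolCount> pools_;
};
```

Because a released chunk goes back to the pool it came from, a later request in the same size class (say 30 bytes, then 32 bytes) gets the very same block back instead of a fresh hole in the heap.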

If you know the length distribution of your strings in advance, you can reserve initial memory for the pools based on this knowledge. Otherwise I'd probably just reserve memory for the pools in increments of 4096 bytes.

The reason: while the C heap behind malloc usually hands out memory in multiples of 16 bytes, it in turn asks the OS for memory (at least under Windows, but I am guessing Linux is similar here), and the OS usually works with 4K pages. IOW, the "outer" memory heap managed by the operating system reserves and frees memory in units of 4096 bytes.

So growing your own internal memory pools in steps of 4096 bytes means no fragmentation in the OS app heap. This 4096-byte page size (or a multiple of it) comes from the processor architecture. Intel processors have a built-in page size of 4K (or multiples of it). I don't know about other processors, but I suspect similar architectures there.
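
To make the 4096-byte growth step concrete, here is a small sketch of a single pool that grabs memory one page-sized slab at a time and carves it into equally sized chunks. `SlabPool` and its members are hypothetical names. The sketch uses C++17 `std::aligned_alloc` to obtain page-aligned slabs, which is an assumption of this example (it is not available in every toolchain, so on Windows something like `VirtualAlloc` would take its place), and it assumes the chunk size is non-zero and no larger than 4096.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t kSlabBytes = 4096; // growth step: one typical OS page

// Hypothetical pool for one chunk size that grows by one 4096-byte slab
// whenever its free list runs dry.
class SlabPool {
public:
    explicit SlabPool(std::size_t chunk_bytes) : chunk_bytes_(chunk_bytes) {}
    SlabPool(const SlabPool&) = delete;
    SlabPool& operator=(const SlabPool&) = delete;
    ~SlabPool() { for (void* s : slabs_) std::free(s); }

    char* allocate() {
        if (free_list_.empty() && !grow()) return nullptr;
        char* p = free_list_.back();
        free_list_.pop_back();
        return p;
    }
    void release(char* p) { free_list_.push_back(p); }

    std::size_t slab_count() const { return slabs_.size(); }
    std::size_t chunks_per_slab() const { return kSlabBytes / chunk_bytes_; }

private:
    // Requests one page-aligned slab and puts all of its chunks on the free list.
    bool grow() {
        void* slab = std::aligned_alloc(kSlabBytes, kSlabBytes);
        if (slab == nullptr) return false;
        slabs_.push_back(slab);
        char* base = static_cast<char*>(slab);
        for (std::size_t i = 0; i < chunks_per_slab(); ++i)
            free_list_.push_back(base + i * chunk_bytes_);
        return true;
    }

    std::size_t chunk_bytes_;
    std::vector<void*> slabs_;
    std::vector<char*> free_list_;
};
```

A pool of 64-byte chunks gets 64 chunks out of each slab. For the larger size classes a single page holds only one or two chunks, so a real implementation would probably grow those pools by several pages at once.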

So, to sum it up:

Use chunk sizes that are multiples of 16 bytes for the strings in each memory pool. Grow each memory pool in multiples of 4K bytes.

That will align the memory use of your application with the memory management of the OS and avoid fragmentation as much as possible.

From the OS point of view, your application will only increment memory in 4K chunks. That's very easy to allocate and release. And there is no fragmentation.

From the internal (lib) C heap management point of view, your application will use memory pools and waste at most 15 bytes per string. Also all similar length allocations will be heaped together, so also no fragmentation.

Here is a solution that might help if your application has the following traits:

Runs on Windows
Has episodes where the working set is large, but all of that data is released at around the same point in time (for example, the data is processed in some kind of batch mode, and once the work is done the related data is freed).

This approach uses a (unique?) Windows feature called custom (private) heaps. Possibly you can build something similar yourself on other operating systems.

The functions you need are declared in a header called <heapapi.h>, which you get by including <windows.h>.

Here is what it would look like:

At the start of a memory intensive phase of your program, use HeapCreate() to have a heap, where all your data will go.
Perform the memory intensive tasks.
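At the end of the memory intensive phase, free your data and call HeapDestroy(). (HeapDestroy() releases everything still allocated from that heap, so individual HeapFree() calls are not strictly required.)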

Depending on the detailed behavior of your application (e.g. whether this memory intensive computation runs in one or multiple threads), you can configure the heap accordingly and possibly even gain a little speed: if only one thread uses the data, you can pass the HEAP_NO_SERIALIZE flag to HeapCreate(), so it does not take a lock. You can also give an upper bound for how much the heap is allowed to store.

And once your computation is complete, destroying the heap also prevents long-term fragmentation, because when the time comes for the next computation phase, you start with a fresh heap.

Here is the documentation of the heap api.

In fact, I found this feature so useful that I replicated it on FreeBSD and Linux for an application I ported, using virtual memory functions and my own heap implementation. That was a few years back and I do not have the rights to that piece of code, so I cannot show or share it.
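
For readers who are not on Windows, the following is a rough sketch of how a throwaway per-phase heap could be built from POSIX virtual memory functions. It is not the code mentioned above. `PhaseArena` is a hypothetical name, the design is a deliberately simple bump allocator with a fixed upper bound that never frees individual blocks, and it assumes a Linux or BSD system where `mmap` with `MAP_ANONYMOUS` is available.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical per-phase arena: reserves one anonymous mapping up front,
// hands out 16-byte-aligned blocks by bumping an offset, and gives all of
// the memory back to the OS in one munmap() call when it is destroyed.
class PhaseArena {
public:
    explicit PhaseArena(std::size_t max_bytes) : size_(max_bytes) {
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<char*>(p);
    }
    PhaseArena(const PhaseArena&) = delete;
    PhaseArena& operator=(const PhaseArena&) = delete;
    ~PhaseArena() {
        if (base_ != nullptr) munmap(base_, size_);
    }

    // Returns nullptr if the mapping failed or the upper bound is reached.
    void* allocate(std::size_t n) {
        if (base_ == nullptr) return nullptr;
        std::size_t rounded = (n + 15) & ~static_cast<std::size_t>(15);
        if (rounded < n || rounded > size_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }
    std::size_t used() const { return used_; }

private:
    char* base_ = nullptr;
    std::size_t size_;
    std::size_t used_ = 0;
};
```

The intended use mirrors the HeapCreate()/HeapDestroy() pattern: construct one arena at the start of a memory intensive phase, allocate from it during the phase, and let its destructor return the whole region at the end.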

You could also combine this approach with fixed element size pools, using one heap for one dedicated block size and then expect less/no fragmentation within each of those heaps (because all blocks have same size).

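#include <windows.h> // HeapCreate, HeapAlloc, HeapFree, HeapDestroy
#include <cassert>
#include <cstddef>
#include <vector>

struct EcoString {
  size_t heapIndex; // index of the bucket (heap) the data came from
  size_t capacity;  // number of bytes actually allocated for data
  char* data;
  char* get() { return data; }
};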

struct FixedSizeHeap {
  const size_t SIZE;
  HANDLE m_heap;
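  explicit FixedSizeHeap(size_t size)
    : SIZE(size)
    , m_heap(HeapCreate(0, 0, 0)) // default options, growable heap
  {
  }
  // The heap handle has a single owner: moving transfers it, copying is
  // disabled so that HeapDestroy() never runs twice on the same handle.
  FixedSizeHeap(FixedSizeHeap&& other) noexcept
    : SIZE(other.SIZE), m_heap(other.m_heap) { other.m_heap = nullptr; }
  FixedSizeHeap(const FixedSizeHeap&) = delete;
  ~FixedSizeHeap() {
     if (m_heap != nullptr) HeapDestroy(m_heap);
  }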
  bool allocString(size_t capacity, EcoString& str) {
    assert(capacity <= SIZE);
    str.capacity = SIZE; // we alloc SIZE bytes anyway...
    str.data = (char*)HeapAlloc(m_heap, 0, SIZE);
    if (nullptr != str.data)
      return true;
    str.capacity = 0;
    return false;
  }
  void freeString(EcoString& str) {
     HeapFree(m_heap, 0, str.data);
     str.data = nullptr;
     str.capacity = 0;
  }
};

struct BucketHeap {
  using Buckets = std::vector<FixedSizeHeap>; // or std::array
  Buckets buckets;
/*
(loop for i from 0
           for size = 40 then (+ size (ash size -1))
           while (< size 80000)
           collecting (cons i size))
((0 . 40) (1 . 60) (2 . 90) (3 . 135) (4 . 202) (5 . 303) (6 . 454) (7 . 681)
 (8 . 1021) (9 . 1531) (10 . 2296) (11 . 3444) (12 . 5166) (13 . 7749)
 (14 . 11623) (15 . 17434) (16 . 26151) (17 . 39226) (18 . 58839))
  Init Buckets with index (first item) with SIZE (rest item) 
in the above item list.
*/
 // allocate(nbytes) looks for the correct bucket (linear or binary 
 // search) and calls allocString() on that bucket. And stores the 
 // buckets index in the EcoString.heapIndex field (so free has an 
 // easier time).
 // free(EcoString& str) uses heapIndex to find the right bucket and
 // then calls free on that bucket.
};
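
The bucket lookup described in the comments can be sketched without any Windows API. The snippet below is only an illustration: `makeBucketSizes` and `findBucket` are hypothetical helper names, the sizes follow the same rule as the Lisp loop in the comment (start at 40, grow by half, stop below 80000), and the search uses `std::lower_bound` as one possible form of the binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reproduces the size table: 40, 60, 90, 135, 202, ... (size += size / 2).
std::vector<std::size_t> makeBucketSizes() {
    std::vector<std::size_t> sizes;
    for (std::size_t size = 40; size < 80000; size += size >> 1)
        sizes.push_back(size);
    return sizes;
}

// Returns the index of the smallest bucket whose SIZE is >= nbytes,
// or sizes.size() if the request is larger than the largest bucket.
std::size_t findBucket(const std::vector<std::size_t>& sizes, std::size_t nbytes) {
    auto it = std::lower_bound(sizes.begin(), sizes.end(), nbytes);
    return static_cast<std::size_t>(it - sizes.begin());
}
```

An allocate(nbytes) built on this would call findBucket(), treat an index equal to sizes.size() as "too large for any bucket", and otherwise store the index in EcoString.heapIndex before calling allocString() on that bucket.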
