fio 3.23 core dumps when benchmarking many small files

(TL;DR: setting --alloc-size to a large value helps)

I bet you can simplify this job down and still reproduce the problem (which will be helpful for whoever looks at this, because there are fewer places to look). I'd guess the crux is the opendir option and the fact that you say the directory contains "2^20 1MiB files"...
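Something like this is the sort of cut-down job I mean (a hypothetical, untested sketch; the path and options are placeholders, not your actual job):

    # Hypothetical minimal reproduction: random I/O over a directory of
    # many files, which (per below) forces fio to track state per file
    fio --name=repro --opendir=/tmp/manyfiles --rw=randread --bs=1M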

If you read the documentation of --alloc-size you will notice it mentions:

If running large jobs with randommap enabled, fio can run out of memory.

By default fio distributes random I/O evenly across a file (each block is written once per pass), but to do so it needs to keep track of the areas it has written, which means keeping a data structure per file. OK, you can see where this is going...

Memory pools are set aside for certain data structures (because they have to be shared between jobs). Initially there are 8 pools (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L22 ) and by default each pool is 16 megabytes in size (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L21 ).
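Putting numbers on those defaults (just shell arithmetic from the constants linked above):

    # Default smalloc budget before fio has to grow the number of pools:
    # 8 initial pools x 16 MiB each
    echo $(( 8 * 16 ))   # 128 MiB in total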

Each file that does random I/O requires a data structure to go with it. Based on your output, let's guess that each file forces the allocation of a 368-byte data structure plus a header (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L434 ), which combined comes to 388 bytes. Because the pool hands out allocations in 32-byte blocks (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L70 ), this means we actually take a bite of 13 blocks (416 bytes) out of a pool per file.
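As a sanity check of that rounding (assuming the header really is 388 - 368 = 20 bytes):

    # 388 bytes rounded up to whole 32-byte blocks
    echo $(( (388 + 31) / 32 ))          # 13 blocks
    echo $(( ((388 + 31) / 32) * 32 ))   # 416 bytes charged per file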

Out of curiosity I have the following questions:

  • Are you running this in a container?
  • What is the maximum size that your /tmp can be?

I don't think the above are germane to your issue, but it would be good to rule them out.

Update: by default, docker limits the amount of IPC shared memory (also see its --shm-size option). It's unclear whether that was a factor in this particular case, but see the "original job only stopped at 8 pools" comment below.
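If docker is in the picture, the knob looks something like this (the image and job here are placeholders, and whether this matters in your case is speculation):

    # docker gives containers a 64 MiB /dev/shm by default;
    # --shm-size raises that limit
    docker run --shm-size=1g <your-image> fio <your-job-options>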

So why didn't setting --alloc-size=776 help? Looking at what you wrote, it seems odd that your blocks per pool didn't increase, right? I notice your pools grew to the maximum of 16 (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L24 ) the second time around. The documentation for --alloc-size says this:

--alloc-size=kb Allocate additional internal smalloc pools of size kb in KiB. [...] The pool size defaults to 16MiB. [emphasis added]

You used --alloc-size=776... isn't 776 KiB smaller than 16 MiB? That would make each pool smaller than the default, which may explain why fio tried to grow the number of pools to the maximum of 16 before giving up in your second run.

(2 ** 20 * 416) / 8 / 1024 = 53248 KiB (but see the update below)

The above arithmetic suggests you want each pool to be approximately 52 MiB in size if you are going to have 8 of them, for a grand total of approximately 416 MiB of RAM. What happens when you use --alloc-size=53248?
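In other words, something along these lines (only --alloc-size is the point here; the remaining options are placeholders standing in for your real job):

    # size each pool so 8 pools cover 2^20 files at 416 bytes each;
    # --alloc-size takes a value in KiB
    echo $(( (1048576 * 416) / 8 / 1024 ))   # 53248
    fio --alloc-size=53248 --opendir=/tmp/manyfiles --rw=randwrite --bs=2M --name=job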

Update: the calculated number above was too low. In a comment the question asker reports that using a much higher setting of --alloc-size=1048576 was required.

(I'm a little concerned that the original job only stopped at 8 pools (128 MiB) though. Doesn't that suggest that trying to grow to a ninth 16 MiB pool was problematic?)

Finally, the fio documentation seems to be hinting that these data structures are only allocated when you ask for a particular distribution of random I/O. This suggests that if the I/O is sequential, or if the I/O uses random offsets but DOESN'T have to adhere to a distribution, then maybe those data structures don't have to be allocated... What happens if you use norandommap ?
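For example (again, everything apart from norandommap is a placeholder):

    # norandommap stops fio tracking which blocks it has already touched, so
    # the per-file structures discussed above should no longer be needed; the
    # trade-off is some blocks may be hit more than once and others not at all
    fio --norandommap --opendir=/tmp/manyfiles --rw=randwrite --bs=2M --name=job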

(Aside: blocksize=2M but your files are only 1 MiB in size - is that correct?)

This question feels too big and too specialized for a casual serverfault answer and may get a better answer from the fio project itself (see https://github.com/axboe/fio/blob/fio-3.23/REPORTING-BUGS , https://github.com/axboe/fio/blob/fio-3.23/README#L58 ).

Good luck!