Using tmpfs + a very large swap partition for /tmp instead of a regular filesystem?

I have a Linux server with a spare 500GB disk partition. I wanted to format it and use it for /tmp. The server occasionally runs large data processing tasks, so /tmp may at times hold several GB of temporary data.

Then I had the idea of adding it as a swap partition instead and mounting /tmp as tmpfs. Is this idea reasonable?

The server has 6GB of RAM, so in most cases the data on /tmp would be held only in RAM, with the obvious speed advantage. The question is: if there are, let's say, 10-20GB of data on /tmp, how will the system perform? How would the performance compare with simply mounting /tmp on an ext4 partition? Thanks for the help.

Edit: It is clear that the system will start swapping out memory when the usage of tmpfs hits the RAM limit. But is Linux smart enough to swap out tmpfs data and keep "regular" data in RAM? If yes, then I suppose it could behave reasonably. If not, then the whole system will be severely affected.

This is NOT A Good Idea^TM.

You could mount /tmp as a tmpfs like this (in your /etc/fstab):

tmpfs  /tmp  tmpfs  defaults,nosuid,nodev,noexec,noatime,nodiratime,size=6000M 0 0

And you could add your spare partition as a giant swap partition:

/dev/sdb1  swap  swap  defaults  0 0

Once the data in tmpfs no longer fits in RAM, your machine will start to swap pages from RAM to disk - at which point load averages will go through the roof and the machine will grind to a halt.
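If you want to check whether the machine really is swapping heavily, one option is to sample the pswpin/pswpout counters in /proc/vmstat. The following is only a rough sketch, assuming a Linux /proc filesystem; the function names are made up for illustration.

```python
def parse_vmstat(text):
    """Return {counter_name: int_value} parsed from /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line of /proc/vmstat is "<name> <integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    rate_in = (after["pswpin"] - before["pswpin"]) / seconds
    rate_out = (after["pswpout"] - before["pswpout"]) / seconds
    return rate_in, rate_out


def read_vmstat(path="/proc/vmstat"):
    """Take one snapshot of the kernel's VM counters."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Take two snapshots with read_vmstat() a few seconds apart and pass them to swap_rates(). Sustained high rates in both directions while /tmp is in use would suggest the kind of thrashing described above.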

It's a bad idea to rely on swap in any way. You'd be better off selling your 500GB drive and simply buying more RAM - it's cheap.

In summary

If you really want to use your 500GB disk, you could mount it on /tmp with a non-journaled filesystem (e.g. ext2) and with atime updates disabled (noatime,nodiratime). That would be substantially faster than dealing with a machine that is swapping.

This could be a reasonable idea.

Putting an actual filesystem on /tmp does incur overhead, because filesystems go to great lengths to make sure that the data on disk is not corrupted in case of system failure. For a /tmp that is cleaned at boot time, that is obviously wasted effort. Using a tmpfs would avoid that overhead.

On the other hand, filesystems also make sure that files are organised on the disk in a way that optimises access time, i.e. they try to avoid fragmentation. Typical sequential file accesses will (mostly) result in sequential disk accesses, which are more efficient than random accesses. This effect is more pronounced on spinning hard disks than on SSDs. The swap+tmpfs combination can't easily do this, because swap is not aware of which piece of memory belongs to which file, and tmpfs isn't aware of how pages are mapped to physical memory or to the disk. For large files, however, it should work well, since both tmpfs and swap try to keep things contiguous in that case. At least, that holds as long as there is a lot of free space on swap (otherwise fragmentation kicks in) and writes happen slowly enough that pages get a chance to be swapped out.

So the bottom line is: it depends, you should try both options to see which one works best.
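To compare the two setups, a crude timing script may be enough to get a first impression. This is a minimal sketch, not a proper benchmark: the function name and the scratch file name are made up, and the read-back on a disk filesystem may be served from the page cache unless the file is much larger than RAM.

```python
import os
import tempfile
import time


def bench_dir(directory=None, size_mb=1024, chunk_mb=1):
    """Write size_mb of data to a scratch file in directory, read it back,
    and return (write_seconds, read_seconds, bytes_read)."""
    if directory is None:
        directory = tempfile.gettempdir()  # usually /tmp
    path = os.path.join(directory, "tmp_bench.dat")
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force data to disk on a real filesystem
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(len(chunk))
                if not data:
                    break
                total += len(data)
        read_s = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        if os.path.exists(path):
            os.remove(path)
    return write_s, read_s, total
```

Run it once with /tmp on ext4 and once with /tmp on tmpfs plus swap, using a size_mb well above the 6GB of RAM (say 15000) so that the swap path is actually exercised.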

When you mount the tmpfs, do remember to set the size explicitly. The default is half the physical RAM, so just 3GB.
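As a quick illustration of that default, the sketch below works out half of MemTotal from /proc/meminfo and builds an fstab line with an explicit size. The helper names and the chosen mount options are just examples, not a recommendation.

```python
def default_tmpfs_size_kib(meminfo_text):
    """tmpfs without a size= option defaults to half of physical RAM."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:  6291456 kB".
            return int(line.split()[1]) // 2
    raise ValueError("MemTotal not found")


def tmpfs_fstab_line(size_mib, mount_point="/tmp"):
    """Build an /etc/fstab entry for a tmpfs with an explicit size limit."""
    return f"tmpfs  {mount_point}  tmpfs  defaults,nosuid,nodev,size={size_mib}M  0 0"
```

For example, tmpfs_fstab_line(20480) gives an entry with size=20480M, which lets /tmp grow to 20GB; anything beyond what fits in RAM then has to go to swap.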

This is actually a good idea when you usually don't have much data in /tmp but occasionally consume many gigabytes for a limited duration. The problem is that the Linux swap system doesn't know enough about your use case to do it right. It will generally prefer dropping or swapping out cache over program pages, but that doesn't really help here. It may be possible to use cgroups to achieve your goal (it is when the scratch data is held in program memory), but I'm not sure how to configure cgroups in this case (I suppose you could use a FUSE tmpfs...). Fortunately, that's not required. You can get the desired behaviour with zram and a backing device.

zram-init is a program that automates setting up zram, which is a compressed RAM block device. There is usually an example in the zram-init config for mounting /tmp as zram. It'll be something like the following:

type0=/tmp
flag0= 
size0=524288 # size in MB: 512G of logical space
mlim0=2G # 2G of memory
back0=/dev/loop0 # (or /dev/sdxN, your large slow drive)
notr0= 
maxs0=4 # maximum number of parallel processes for this device
algo0=zstd 
labl0=tmp # the label name
uuid0= 
args0=

This will compress and store in memory anything written to /tmp. Usual compression is somewhere around 50%. It will consume at most 2G of physical memory. If it runs low on physical memory, it will take the oldest files and push them into the backing device, still compressed. Note that it does incur some CPU overhead to compress and decompress the files, but this is usually offset by the reduced IO.
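To get a feel for how far a given memory limit stretches, you can measure the compression ratio of a sample of your own temporary data. The sketch below is only an approximation: it uses zlib from the Python standard library as a stand-in for zstd, and compresses 4096-byte pages independently on the assumption that zram works page by page. The function names are made up.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size; zram compresses page by page


def page_compression_ratio(sample, level=3):
    """Compress the sample one page at a time and return raw/compressed size."""
    if not sample:
        raise ValueError("sample is empty")
    compressed = 0
    for offset in range(0, len(sample), PAGE_SIZE):
        page = sample[offset:offset + PAGE_SIZE]
        compressed += len(zlib.compress(page, level))
    return len(sample) / compressed


def logical_capacity_gib(mem_limit_gib, ratio):
    """Rough amount of uncompressed data that fits under the memory limit."""
    return mem_limit_gib * ratio
```

With a ratio of 2.0 (the roughly 50% mentioned above), logical_capacity_gib(2, 2.0) comes out at 4.0, i.e. about 4G of files before anything has to go to the backing device. Data that is already compressed will do much worse.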

A similar setup can be used in conjunction with cgroups to let certain processes swap without adversely affecting overall system performance.
