e2fsck on a low-memory machine: can I get more out of scratch_files or swap?

Solution 1:

I am not an e2fsck expert. I assume that e2fsck does care whether the memory it sees is real RAM or swap, since pages can be locked into memory. I also assume that the amount of locked memory is available via /proc or tools such as ps and top, so you could monitor this value.

Obviously the only good solution would be to connect the disk to better hardware. That is difficult in your case, but it may help to make the connection over the network rather than physically. If another Linux system with more RAM has a suitable LAN connection to yours, you could export the device to be checked as a network block device and run e2fsck on that system. This is probably still faster than my next idea.

If the problem is that e2fsck requires "real" RAM, then you could create a virtual machine with a tiny Linux installation (nothing more is needed than e2fsck). This VM could be configured with 2, 4 or 16 GiB of "RAM". The device to be checked can be exported to it as a block device (appearing as a disk in the VM). It probably makes sense to use the scratch_files feature anyway. This would obviously be a performance nightmare, but I guess you have already accepted that any possible solution is in this category.
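
For reference, scratch_files is switched on through a stanza in e2fsck.conf. Here is a minimal sketch, assuming the config file is /etc/e2fsck.conf and using /var/cache/e2fsck as an example scratch directory. write_scratch_conf is a made-up helper name, not something shipped with e2fsprogs.

```shell
# write_scratch_conf CONF DIR: append a [scratch_files] stanza to CONF.
# Hypothetical helper; CONF would normally be /etc/e2fsck.conf.
write_scratch_conf() {
    conf=$1
    dir=$2
    # e2fsck only uses the scratch directory if it exists and is writable
    mkdir -p "$dir" || return 1
    cat >> "$conf" <<EOF
[scratch_files]
directory = $dir
EOF
}
```

As root this would be called as write_scratch_conf /etc/e2fsck.conf /var/cache/e2fsck. The scratch directory has to be on a filesystem other than the one being checked.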

Edit 1

You can see the amount of virtual memory a process has locked into RAM with:

grep VmLck /proc/$PID/status
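
To watch this over time instead of sampling once, a small helper can print the related fields for a given PID. This is only a sketch: show_mem is a made-up name, and it assumes a Linux /proc where the status file carries the VmLck, VmRSS and VmSwap fields (VmSwap is missing on older kernels).

```shell
# show_mem PID: print locked, resident and swapped-out memory of one process.
# show_mem is a hypothetical helper name, not part of any package.
show_mem() {
    # VmLck = locked into RAM, VmRSS = currently resident, VmSwap = swapped out
    grep -E '^Vm(Lck|RSS|Swap):' "/proc/$1/status"
}
```

For example: while sleep 10; do show_mem "$PID"; done. If VmLck stays at 0 kB while VmSwap grows, e2fsck is not locking its memory and is happily being swapped.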

Edit 2

Here's everything from dmesg related to device sdb. The errors for EXT4-fs are the reason I was running e2fsck in the first place.

sd 0:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 0:0:0:0: [sdb] Write Protect is off
sd 0:0:0:0: [sdb] Mode Sense: 28 00 00 00
sd 0:0:0:0: [sdb] Assuming drive cache: write through
sd 0:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1
sd 0:0:0:0: [sdb] Assuming drive cache: write through
sd 0:0:0:0: [sdb] Attached SCSI disk
EXT4-fs (sdb1): barriers disabled
EXT4-fs (sdb1): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sdb1): recovery complete
EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
SELinux: initialized (dev sdb1, type ext4), uses xattr
EXT4-fs error (device sdb1): ext4_lookup: deleted inode referenced: 46006273
EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 0 failed (2332!=0)
EXT4-fs (sdb1): group descriptors corrupted!
EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 0 failed (34754!=0)
EXT4-fs (sdb1): group descriptors corrupted!
EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 0 failed (34754!=0)
EXT4-fs (sdb1): group descriptors corrupted!

Solution 2:

We can tell that e2fsck is trying to allocate 2.5 GB for whatever table it is trying to build, and even though you do have enough (virtual) RAM available, you don't have the address space for that in a 32-bit process.

Well, you do, but you're asking for 5/6 of it at once (a 32-bit process with the usual kernel split gets about 3 GB). Odds are that other mappings and allocations have already taken pieces of the remaining 500 MB of address space, so the kernel can't find a contiguous 2.5 GB region to satisfy that mmap2.

My advice: boot a 64-bit Linux from USB and make use of that same 20 GB swap file you already have (or have at least 4 GB of swap handy). You're already aware that this might take ages to complete.
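
Once the live system is up, it is worth confirming that the userland really is 64-bit before starting a run that takes days. A quick sketch using getconf, with userland_bits as a made-up helper name:

```shell
# userland_bits: print the width of a C long in the current userland,
# i.e. 32 or 64 (hypothetical helper name).
userland_bits() {
    getconf LONG_BIT
}

if [ "$(userland_bits)" = "64" ]; then
    echo "64-bit userland: a 2.5 GB mapping fits in the address space"
else
    echo "32-bit userland: large single allocations are likely to fail"
fi
```

This only reports on the userland you are running in. A 64-bit kernel with a 32-bit e2fsck binary would still hit the same limit.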

On a side note: I downloaded the e2fsprogs source code to determine whether e2fsck could be requesting real RAM by calling mlock() or mlockall(), but grepping recursively for "mlock" yielded no results, so this path seems unlikely.

Last but not least: you can trace all memory-related system calls with strace -e trace=memory e2fsck ...
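
If you write the trace to a file with strace's -o option, the failing allocation is easy to pick out afterwards. A rough sketch, where failed_allocs is a made-up helper name and the log is assumed to be in strace's default output format:

```shell
# failed_allocs LOG: list mmap/mmap2 calls in an strace log that
# returned ENOMEM (hypothetical helper name).
failed_allocs() {
    grep -E 'mmap2?\(.*= -1 ENOMEM' "$1"
}
```

For example, run strace -e trace=memory -o e2fsck.trace e2fsck ... and then failed_allocs e2fsck.trace. The second argument of each matching mmap2 call is the size in bytes that was requested.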