kernel reported iSCSI connection 1:0 error (1022-Invalid or unknown error code) state (3) Fed 25/VessRaid/rsnapshot

I got a couple of replies from one of the Linux kernel maintainers. This does seem to be an issue with 32-bit kernels and lowmem pressure. Here are some of the comments (made after I provided more logs and tracepoints):

all those are from the kswapd (background memory reclaim). Which means that it doesn't catch any allocation which can stall for too long. Anyway the above tracepoint show that we are able to make some progress during the reclaim (nr_reclaimed > 0). So I suspect that this is indeed a large lowmem pressure and I do not see what we can do about that.

as well as:

and this one is hitting the min watermark while there is not really much to reclaim. Only the page cache which might be pinned and not reclaimable from this context because this is GFP_NOFS request. It is not all that surprising the reclaim context fights to get some memory. There is a huge amount of the reclaimable slab which probably just makes a slow progress.

That is not something completely surprising on 32b system I am afraid.

It seems this is not something that will get fixed any time soon, if ever. FWIW, I have these settings in /etc/sysctl.conf:

# Disable TCP window scaling
net.ipv4.tcp_window_scaling=0
# tip from http://serverfault.com/questions/235965/why-would-a-server-not-send-a-syn-ack-packet-in-response-to-a-syn-packet
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_tw_recycle=0
# vm.zone_reclaim_mode=1 CONFIG_NUMA needs to be enabled in kernel configuration for this setting
vm.min_free_kbytes=131072

# tip from https://www.spinics.net/lists/kernel/msg2403670.html
# write-cache, foreground/background flushing
vm.dirty_ratio = 3

# vm.dirty_background_ratio = 5 (% of RAM)
vm.dirty_background_ratio = 1

# make it 10 sec
vm.dirty_writeback_centisecs = 1000

# http://serverfault.com/questions/696156/kswapd-often-uses-100-cpu-when-swap-is-in-use
vm.swappiness=25
vm.vfs_cache_pressure=1000
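
In case it helps anyone watching for the same problem: here is a rough sketch of how I would keep an eye on lowmem. It assumes a 32-bit kernel built with CONFIG_HIGHMEM, since only those expose the LowTotal and LowFree fields in /proc/meminfo. The function name lowmem_report is just something I made up for this example, not a standard tool.

```shell
#!/bin/sh
# lowmem_report [FILE]: print LowTotal, LowFree and the percentage still free.
# FILE defaults to /proc/meminfo. The LowTotal/LowFree fields are only present
# on 32-bit kernels built with CONFIG_HIGHMEM (assumption for this sketch).
lowmem_report() {
    awk '
        $1 == "LowTotal:" { total = $2 }   # size of the lowmem zone in kB
        $1 == "LowFree:"  { free = $2 }    # free lowmem in kB
        END {
            if (total > 0)
                printf "LowTotal=%d kB LowFree=%d kB (%.1f%% free)\n", total, free, 100 * free / total
            else
                print "no LowTotal/LowFree fields (not a highmem kernel?)"
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling lowmem_report with no argument reads the live /proc/meminfo. Running it from cron or under watch while rsnapshot is busy should show whether LowFree gets close to vm.min_free_kbytes around the time the iSCSI errors appear. The settings above can be reloaded without a reboot with "sysctl -p /etc/sysctl.conf" (as root).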