Xen DomU root filesystems becoming read-only on iSCSI virtual IP failover

I eventually solved this by using the following advice and settings from the open-iscsi documentation:

8.2 iSCSI settings for iSCSI root
---------------------------------

When accessing the root partition directly through an iSCSI disk, the
iSCSI timers should be set so that iSCSI layer has several chances to try to
re-establish a session and so that commands are not quickly requeued to
the SCSI layer. Basically you want the opposite of when using dm-multipath.

For this setup, you can turn off iSCSI pings by setting:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

And you can set the replacement timeout to a very long value:

node.session.timeo.replacement_timeout = 86400

After setting up the connection to each LUN as described above, the failover works like a charm, even if it takes several minutes to happen.
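
For anyone who wants to apply this to several LUNs, here is a minimal sketch that prints one `iscsiadm` update command per setting for a given node record. The helper name `update_commands`, the target IQN, and the portal address are placeholders I made up for illustration; substitute your own. As far as I know, updated node settings only apply the next time the session logs in, so treat this as a starting point rather than a complete procedure.

```python
import shlex

# Settings quoted above from the open-iscsi documentation (section 8.2).
ISCSI_ROOT_SETTINGS = {
    "node.conn[0].timeo.noop_out_interval": "0",
    "node.conn[0].timeo.noop_out_timeout": "0",
    "node.session.timeo.replacement_timeout": "86400",
}


def update_commands(target, portal, settings=ISCSI_ROOT_SETTINGS):
    """Return one shell-quoted 'iscsiadm -o update' command per setting."""
    return [
        shlex.join([
            "iscsiadm", "-m", "node",
            "-T", target, "-p", portal,
            "-o", "update", "-n", name, "-v", value,
        ])
        for name, value in settings.items()
    ]


if __name__ == "__main__":
    # Hypothetical target IQN and portal, for illustration only.
    for cmd in update_commands("iqn.2001-04.com.example:storage.lun1",
                               "192.168.0.10:3260"):
        print(cmd)
```

The commands are only printed, not executed, so they can be reviewed before being run as root on the dom0. `shlex.join` quotes the `node.conn[0]` names so the shell does not treat the brackets as a glob pattern.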


This sounds like a problem with the iSCSI initiator running on the dom0. The initiator should not be sending SCSI failures up the stack that quickly. You'll probably want to set ConnFailTimeout in iscsi.conf; this setting determines how long the initiator waits before it treats a connection failure as an error and sends that error up the SCSI stack.

I'd also look into how long the failover actually takes; it may be taking longer than you expect. If so, the VIP failover may be slow because of ARP-related issues.
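
One rough way to time the failover from the dom0 is to poll the iSCSI portal on the virtual IP and measure how long TCP connections fail. This is only a sketch under some assumptions: the function names are mine, the portal is assumed to listen on the default port 3260, and a successful TCP connect does not prove the target is fully ready, so the number is an approximation with a resolution of roughly the polling interval plus the connect timeout.

```python
import socket
import time


def portal_reachable(host, port=3260, timeout=1.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(probe, interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Wait until probe() fails, then return the seconds until it succeeds again."""
    while probe():        # wait for the outage to begin
        sleep(interval)
    started = clock()
    while not probe():    # wait for the portal to answer again
        sleep(interval)
    return clock() - started


# Example (hypothetical VIP), started on the dom0 before triggering a failover:
#   print(measure_outage(lambda: portal_reachable("192.168.0.10")))
```

If the measured outage is much longer than the cluster software reports for moving the VIP, a stale ARP entry on the dom0 or on a switch in between is a likely suspect.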