Still getting aacraid: Host adapter abort request errors after following recommended steps

Solution 1:

In case you haven't resolved this yet: I recently wrestled with the same issue, which quickly escalated to the array hanging every 5 minutes for a couple of minutes at a time as the I/O load increased. Ubuntu uses the CFQ scheduler by default, which isn't optimal for hardware RAID. Switch the scheduler to noop with:

echo noop > /sys/block/<blockdevice>/queue/scheduler

Personally I'm stuck with an old kernel, but I've been told that upgrading to the latest aacraid driver should also fix the issue. I can't verify that, though. Even so, switch to noop. Settings written to sysfs don't survive a reboot, so you might want to set the scheduler in /etc/rc.local or use the elevator=noop boot parameter.
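As a rough sketch of the switch described above, the following helpers read the active scheduler (the name the kernel shows in square brackets) and change it only if the requested one is listed as available. The function names and the SYSFS_ROOT variable are my own inventions for illustration; SYSFS_ROOT defaults to /sys and exists only so the functions can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: inspect and switch the I/O scheduler for one block device.
# SYSFS_ROOT is a hypothetical override for dry runs; on a real system
# leave it unset so it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the active scheduler, i.e. the name shown in [brackets].
# Usage: active_scheduler sda
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$SYSFS_ROOT/block/$1/queue/scheduler"
}

# Switch to the given scheduler only if the kernel lists it for this device.
# Usage: set_scheduler sda noop
set_scheduler() {
    dev="$1"
    want="$2"
    file="$SYSFS_ROOT/block/$dev/queue/scheduler"
    # Strip the brackets, split into one name per line, look for an exact match.
    if tr -d '[]' < "$file" | tr ' ' '\n' | grep -qxF "$want"; then
        echo "$want" > "$file"
    else
        echo "scheduler '$want' is not offered for $dev" >&2
        return 1
    fi
}
```

Run as root, something like `set_scheduler sda noop && active_scheduler sda` should print noop (sda is just a placeholder for your RAID block device). The availability check is there because the list of schedulers differs between kernels, so it is worth looking at the file before writing to it.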

I'd review the other kernel parameters as well. Ubuntu's settings are reasonable defaults for the most common hardware, but servers usually need special attention regardless of the distro you're on.

Solution 2:

If your Adaptec RAID controller has its own firmware/BIOS, you may need to update that. We had issues during high I/O and got "aacraid: Host adapter abort request" as well, and then found a firmware release newer than the one we were running whose release notes said "Fixed an issue where the firmware could hang during high I/O stress." See http://download.adaptec.com/pdfs/readme/relnotes_arc_fw-b18937_asm-18837.pdf

The above release notes list the following Adaptec models: 2045, 2405, 2405Q, 2805, 5085, 5405, 5405Z, 5445, 5445Z, 5805, 5805Q, 5805Z, 5805ZQ, 51245, 51645, 52445.

We also got log lines like:

sd 0:0:0:0: timing out command, waited 360s

and

Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
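To check whether you are seeing the same symptoms, a small sketch like this can count the three messages quoted above in a saved kernel log. The function name is made up for illustration, and the log path is whatever your system uses (on Ubuntu that is typically /var/log/kern.log, but treat that as an assumption).

```shell
#!/bin/sh
# Sketch: count log lines matching the symptoms quoted in this answer.
# The patterns are taken verbatim from the messages above.
AACRAID_PATTERN='aacraid: Host adapter abort request|timing out command|DRIVER_TIMEOUT'

# Print how many lines in the given log file match any of the patterns.
# Usage: count_aacraid_symptoms /path/to/kernel.log
count_aacraid_symptoms() {
    grep -cE "$AACRAID_PATTERN" "$1"
}
```

For example, `count_aacraid_symptoms /var/log/kern.log` before and after a firmware update gives a crude way to see whether the aborts and timeouts have stopped.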

While searching online for other people with a similar issue, we found that another line of cards has had the following issues fixed in firmware, which could be relevant:

  • "Resolved an issue that could result in Host IO errors, RAID Volume state changes, non-responsive systems, and system reboots or resets in rare cases where extremely high IO loads are served almost entirely from controller cache" http://download.adaptec.com/pdfs/readme/relnotes_arc_fw-b30862_msm-20942.pdf
  • "Resolved an issue where I/O would slow and eventually result in a controller reset" http://download.adaptec.com/pdfs/readme/relnotes_arc_fw-b30612_msm-20618.pdf

The above two apply to Adaptec models 7805, 7805Q, 78165, 71605E, 71605, 71605Q, 71685, 72405, 8805, 8885, 8885Q, and 81605ZQ.