Calling sync/fsync slows IO after 30 minutes uptime

Solution 1:

This was caused by SMART data being enabled for the drive in question.

Disabling SMART on the drive solved it:

sudo smartctl --smart=off /dev/sda

Interestingly, re-enabling SMART for the drive does not bring the issue back. This suggests to me that SMART was in an inconsistent state (possibly a crash whilst the self-tests were running?) and that switching it off and then on again reset that state.

Presumably it kept rerunning some kind of internal self-test 30 minutes after the disk spun up and got stuck in a loop. As this happened at the hardware layer, the rest of the computer was unaware of it, which is why I could see no particular process responsible for the blocked IO and no process hogging resources.

I'd run the SMART self-tests whilst trying to work out what was wrong, but even that didn't reset the state; it had to be switched off and then on explicitly.
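
The off-then-on cycle described above could be scripted roughly as follows. This is only a sketch: the function name smart_cycle and the SMARTCTL override variable are made up for illustration, and /dev/sda is assumed to be the affected drive. The status lines it greps for are what smartctl prints for ATA drives and may differ for other device types.

```shell
# Command used to invoke smartctl; override it if sudo is not needed (illustrative variable).
SMARTCTL="${SMARTCTL:-sudo smartctl}"

# smart_cycle is a hypothetical helper: switch SMART off, then on again, then report state.
smart_cycle() {
    dev="$1"
    $SMARTCTL --smart=off "$dev" || return 1
    $SMARTCTL --smart=on "$dev" || return 1
    # Show whether SMART is enabled again.
    $SMARTCTL -i "$dev" | grep -i 'SMART support'
    # Show the drive's current self-test status (ATA drives).
    $SMARTCTL -c "$dev" | grep -i -A1 'Self-test execution status'
}

# Example (assumes the affected drive is /dev/sda):
# smart_cycle /dev/sda
```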

Solution 2:

This issue persists across reboots. For example, if I wait 30 minutes for the slowdown and then reboot, the slowdown is still there. If I power down and then boot again, the issue disappears until 30 minutes later.

This indicates a firmware bug in the SSD itself that appears once the drive has been powered on for 30 minutes.
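
To check the 30-minute pattern on your own drive, one option is to time a small synced write at intervals and watch for the jump. The sketch below assumes GNU dd and GNU date; the function name fsync_ms and the scratch file path are hypothetical, and the scratch file needs to live on the affected drive for the measurement to mean anything.

```shell
# fsync_ms is a hypothetical helper: write one 4k block, fsync it, print elapsed milliseconds.
fsync_ms() {
    target="${1:-./fsync-probe.tmp}"   # scratch file; put it on the drive being tested
    start=$(date +%s%N)                # nanoseconds since epoch (GNU date)
    dd if=/dev/zero of="$target" bs=4k count=1 conv=fsync 2>/dev/null
    end=$(date +%s%N)
    rm -f "$target"
    echo $(( (end - start) / 1000000 ))
}

# Example: log uptime in seconds next to the fsync latency once a minute.
# while true; do
#     echo "$(cut -d' ' -f1 /proc/uptime) $(fsync_ms /path/on/drive/probe.tmp)"
#     sleep 60
# done
```

If the latency jumps about 30 minutes after a cold boot and stays high across a warm reboot, that matches the behaviour described above.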