ZFS checksum errors on LSI 9207-8i (SAS2308) with Samsung 850 Pro SSDs
I am testing an LSI 9207-8i controller with 8x Samsung 850 Pro 256GB SSDs attached. The SSDs are running the latest firmware (EXM02B6Q); the controller is running P17 firmware and showed the same issue with P19. The server has ECC RAM, and I have been testing with the pool in mirrored mode.
I have tested with ZFS-On-Linux and FreeBSD, and have tried LSI's driver on both operating systems.
The disks otherwise behave as expected, but during heavy IO they appear to be writing bad blocks: a subsequent scrub reports checksum errors. To simulate heavy IO, I use a recordsize of 16k with primarycache=metadata and secondarycache=none, generate a 4 GB random file, and dd it to another file in 4 threads. Looping this a few times is enough for a scrub to show checksum errors.
I have yet to confirm whether this is an issue with the controller, the SSDs, or the cables. I suspect the SSDs, but will test with a 9211-8i at the next opportunity.
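For anyone who wants to reproduce the load without dd, here is a rough Python sketch of the same idea: write one random file, then copy it in 4 parallel threads, a few times over. The function names and the target directory are made up for illustration, and the hash comparison is only a quick user-level sanity check; a scrub is still what actually reveals the checksum errors.

```python
import hashlib
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def make_random_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def stress_copy(directory, size_bytes, threads=4, loops=3):
    """Copy one random file to `threads` destinations in parallel, `loops` times.

    Destinations are overwritten on every loop. Returns the list of
    destination paths whose contents did not match the source.
    """
    src = os.path.join(directory, "random.src")
    make_random_file(src, size_bytes)
    expected = sha256_of(src)
    dests = [os.path.join(directory, f"copy.{i}") for i in range(threads)]
    mismatches = []
    for _ in range(loops):
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # list() forces completion and re-raises any copy error
            list(pool.map(lambda d: shutil.copyfile(src, d), dests))
        mismatches += [d for d in dests if sha256_of(d) != expected]
    return mismatches
```

On a test dataset this would be called as something like stress_copy("/tank/test", 4 * 1024**3), where /tank/test is a hypothetical mount point, followed by a scrub of the pool.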
Has anyone experienced a similar issue, or does anyone have any suggestions on what to do next - beyond replacing controller/SSDs?
Update: I have tested another Samsung 850 Pro 256GB, with EXM01B6Q firmware, on an entirely different server using the onboard SATA controller. The same checksum errors occur.
I've had this problem in the past with Samsung 850 Evos as well. The drives present themselves as 512-byte aligned in OmniOS/OpenSolaris, and because those platforms lack the ashift parameter, you get this issue. It appears to be some kind of garbage collection issue on the disks themselves: I'd write a ton of data, scrub, and see errors.
We ended up forcing the disks to present as 4K aligned in sd.conf, and ZFS then started behaving properly.
I thought I'd bring this up in case someone else hits the same problem.
I have managed to resolve the problem by setting ashift=12 (4k alignment) when creating the pool.
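For reference, ashift is the base-2 logarithm of the sector size ZFS assumes, so 512-byte sectors correspond to ashift=9 and 4K sectors to ashift=12. A small sketch of that arithmetic follows, together with the argument list for a pool-creation command. The helper names, the pool name and the device names are hypothetical, and the -o ashift option is the ZFS on Linux form; other platforms may need a different mechanism.

```python
def ashift_for(sector_bytes):
    """Return the ashift value (log2 of the sector size) for a power-of-two size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def zpool_create_args(pool, devices, sector_bytes=4096):
    """Build (but do not run) a mirrored 'zpool create' command line.

    Uses the ZFS on Linux '-o ashift=N' syntax; pool and device
    names are supplied by the caller.
    """
    return ["zpool", "create", "-o", f"ashift={ashift_for(sector_bytes)}",
            pool, "mirror", *devices]


if __name__ == "__main__":
    print(ashift_for(512), ashift_for(4096))
    # "tank", "sda" and "sdb" are placeholder names
    print(" ".join(zpool_create_args("tank", ["sda", "sdb"])))
```

The command is only assembled, not executed, so it can be checked before it is run against real disks.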