Supermicro GPU temperature-based fan control

Solution 1:

I can confirm the issue on a Supermicro 2029GP-TR with Nvidia A100 GPUs.

Initially, with the fan mode set to "Optimal", the "Sensors reading" page of the IPMI web GUI showed only a "GPU1 Temp" sensor, even though 2 GPUs were installed. When I stress-tested the GPUs with gpu-burn, the platform fans changed speed according to the temperature of GPU1 only; they did not respond to GPU2 and stayed at 3000 RPM.
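The same check can be done from the command line instead of the web GUI. The following is a minimal sketch, assuming `ipmitool` is installed on the host and that `ipmitool sensor` prints its usual pipe-separated table; the function names are hypothetical, and the "GPU... Temp" naming is taken from the sensor name mentioned above.

```python
import subprocess


def parse_sensor_table(text):
    """Parse pipe-separated `ipmitool sensor` output into {name: reading}."""
    readings = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        try:
            readings[fields[0]] = float(fields[1])
        except ValueError:
            # "na" or a discrete (non-numeric) value: sensor listed, no reading
            readings[fields[0]] = None
    return readings


def gpu_temp_sensors(readings):
    """Keep only sensors whose name looks like 'GPU<n> Temp' (assumed naming)."""
    return {
        name: value
        for name, value in readings.items()
        if name.upper().startswith("GPU") and "TEMP" in name.upper()
    }


def read_bmc_sensors():
    """Query the local BMC; needs ipmitool and usually root privileges."""
    result = subprocess.run(
        ["ipmitool", "sensor"], capture_output=True, text=True, check=True
    )
    return parse_sensor_table(result.stdout)
```

Calling `gpu_temp_sensors(read_bmc_sensors())` on an affected system would be expected to list fewer GPU temperature sensors than there are GPUs installed.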

The problem was solved by updating the IPMI firmware with both 'Preserve configuration' and 'Preserve SDR' unchecked.

Solution 2:

I got it working!

In the past, I tried it on Optimal and thought I heard a bit of fan speed ratcheting, but it wasn't enough, so I started looking for answers. Along the way I had updated IPMI but not the BIOS. Today I kicked it into Optimal via IPMI raw commands and tested it again and ... now it works! It stabilizes right at 60C +/- 1C.
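The post does not say which raw commands were used. As a rough sketch only: the bytes below (`0x30 0x45 0x01 <mode>`) are the OEM fan-mode command commonly reported by users of Supermicro boards, not something confirmed in this thread, so treat both the bytes and the mode table as assumptions and check them against your own board before running anything. The function names are hypothetical.

```python
import subprocess

# ASSUMPTION: community-reported Supermicro OEM fan-mode values,
# not taken from the post above. Verify for your board/BMC firmware.
FAN_MODES = {
    "standard": 0x00,
    "full": 0x01,
    "optimal": 0x02,
    "heavy_io": 0x04,
}


def fan_mode_command(mode):
    """Build the ipmitool argument list that would request the given fan mode."""
    code = FAN_MODES[mode]
    return ["ipmitool", "raw", "0x30", "0x45", "0x01", f"0x{code:02x}"]


def set_fan_mode(mode):
    """Send the raw command to the local BMC (usually requires root)."""
    subprocess.run(fan_mode_command(mode), check=True)
```

Under those assumptions, `set_fan_mode("optimal")` would run `ipmitool raw 0x30 0x45 0x01 0x02`.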

I had noticed in the past that without the Nvidia drivers loaded, GPU1 Temp did not seem to engage.
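One way to confirm that the driver is loaded and reporting temperatures is to query `nvidia-smi`, which only works when the driver is present. This is a small sketch using its `--query-gpu` CSV output; the function names are hypothetical, and whether the BMC actually depends on the driver for its GPU readings is the poster's guess, not something this code can prove.

```python
import subprocess


def parse_gpu_temps(csv_text):
    """Parse 'index, temperature' CSV lines into {gpu_index: temp_in_C}."""
    temps = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, temp = [part.strip() for part in line.split(",")]
        temps[int(index)] = int(temp)
    return temps


def query_gpu_temps():
    """Ask the Nvidia driver for per-GPU temperatures; fails if it is not loaded."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_temps(result.stdout)
```

Comparing these values with the BMC's GPU sensor readings during a gpu-burn run would show whether the fan controller is seeing every GPU.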

So, I'm going to go with the combination of the IPMI update and the Nvidia drivers as the probable reason it now works. I'm pleased to see the system handle this on its own, without my needing to resort to a manual script or the like to control the fans.