GPU server freezes during GPU idling

Solution 1:

It should not be necessary to start a GUI session (or even have one installed!) to change a setting like this; nvidia-settings should work fine from the framebuffer console, or from a script that runs at startup.

Check to be sure:

# nvidia-settings -q GpuPowerMizerMode

  Attribute 'GPUPowerMizerMode' (blacktemple:1[gpu:0]): 1.
    Valid values for 'GPUPowerMizerMode' are: 0, 1 and 2.
    'GPUPowerMizerMode' can use the following target types: GPU.

For eight GPUs just write a simple script, something like:

for n in $(seq 0 7); do
    nvidia-settings -a "[gpu:$n]/GpuPowerMizerMode=1"
done

and run it at startup in whatever manner you find convenient.
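
As a rough sketch of what such a startup script might look like, here is the same loop wrapped in a function that reports which GPU (if any) rejected the setting. The names set_all_gpus and NVIDIA_SETTINGS are my own, purely for illustration; they are not part of nvidia-settings. I'm assuming eight GPUs numbered 0 through 7 and a POSIX shell.

```shell
#!/bin/sh
# Sketch of a startup script; set_all_gpus and NVIDIA_SETTINGS are
# illustrative names, not part of nvidia-settings itself.

# Command to run; override with NVIDIA_SETTINGS=echo for a dry run.
NVIDIA_SETTINGS="${NVIDIA_SETTINGS:-nvidia-settings}"

# set_all_gpus COUNT MODE
# Assign GpuPowerMizerMode=MODE on gpu:0 .. gpu:(COUNT-1).
# Returns non-zero if any assignment failed.
set_all_gpus() {
    count="$1"
    mode="$2"
    failed=0
    n=0
    while [ "$n" -lt "$count" ]; do
        if ! $NVIDIA_SETTINGS -a "[gpu:$n]/GpuPowerMizerMode=$mode"; then
            echo "could not set GpuPowerMizerMode on gpu:$n" >&2
            failed=1
        fi
        n=$((n + 1))
    done
    return "$failed"
}

# Eight GPUs, mode 1, as in the loop above; skipped when the tool is absent.
if command -v "$NVIDIA_SETTINGS" >/dev/null 2>&1; then
    set_all_gpus 8 1
fi
```

Running it once by hand with NVIDIA_SETTINGS=echo in the environment prints the commands instead of executing them, which is a cheap way to check the script before hooking it into whatever startup mechanism your distribution uses.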


I can't say whether your crashes are caused by running with GpuPowerMizerMode set to something other than 1. If they are, you probably have defective hardware that you should track down and replace.