rcu_sched self-detected stall on CPU + watchdog: BUG: soft lockup - CPU#3 stuck for 22s

Being unable to ssh into a machine I connected it to a monitor and found the following:

enter image description here

The machine is running Ubuntu Server 18.04 LTS and is a first generation 8 core Ryzen 1700. I've restarted the machine since and it works fine but am not sure what caused this in the first place and want to avoid it happening again.

enter image description here

enter image description here


Solution 1:

This is a random issue with 1st and 2nd gen Ryzens (at least). You'll find several reports and no real solution. I have a Ryzen 2700U and in forums people always suggest to try a newer kernel. I've tried kernel 5.0, 5.4, 5.6 and also gave 5.8 a run. Had the issue with all of them.

I've recently increased the kernel.watchdog_thresh from 10s (default) to 60s (max).

sudo sysctl -w kernel.watchdog_thresh=60

to make it permanent add the following to /etc/sysctl.conf:

kernel.watchdog_thresh=60

It's still too early to say if it worked but I have a good feeling.