CPU temperature spikes to 90°C+ only when plugged in
My Asus Vivobook K571GT, dual-booting Ubuntu 20.04, has recently started shutting down due to high temperature (reaching 99°C+). These temperatures are reached only when the laptop is plugged in.
The BIOS is updated to the latest version, and Ubuntu is on the latest kernel. I've read that this might be caused by an improperly installed NVIDIA driver, so I tried several different NVIDIA drivers (460, 470 and 495). I also tried disabling the NVIDIA GPU altogether and running only on the integrated GPU. The results were always the same: when plugged in, the temperature spikes from a respectable 40-45°C to 95°C in a second, without much CPU load (e.g. running the apt update command makes the CPU temperature rise to 90°C+). If I don't stop what I'm doing, or a command is running and I can't stop it in time, the CPU hits the 100°C mark, which triggers the shutdown. Interestingly, if I unplug the laptop while I'm getting a high temperature warning, the temperature drops back to 45-50°C in a second.
Has anyone experienced something similar? The only explanation I can think of for the rapid CPU temperature spike when plugged in but not on battery is that the CPU is somehow getting "overclocked". I'm not sure how to verify this or, if that is the case, how to prevent it. Could it be a hardware issue, such as the AC adapter providing too much power?
Edit
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu10/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu11/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu8/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu9/cpufreq/scaling_driver:intel_pstate
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu10/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu11/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu9/cpufreq/scaling_governor:powersave
grep "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
cat /sys/devices/system/cpu/intel_pstate/no_turbo
0
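One rough way to check the "overclocking" suspicion is to compare the turbo state and the core frequencies on AC and on battery. The following is only a sketch under assumptions, not output from this machine: it reads the standard sysfs files (power_supply/*/type and online, cpufreq/scaling_cur_freq, and the intel_pstate/no_turbo file shown above). The SYSFS variable and the function names are made up for this example.

```shell
#!/bin/sh
# Root of sysfs; overridable (hypothetical knob) so the functions can be
# tried against a copy of the tree instead of the live system.
SYSFS="${SYSFS:-/sys}"

# Print "ac" if a power supply of type Mains reports online=1, else "battery".
power_source() {
    for d in "$SYSFS"/class/power_supply/*; do
        [ -r "$d/type" ] && [ -r "$d/online" ] || continue
        if [ "$(cat "$d/type")" = "Mains" ] && [ "$(cat "$d/online")" = "1" ]; then
            echo ac
            return
        fi
    done
    echo battery
}

# Print "enabled" when no_turbo is 0, "disabled" when it is 1,
# and "unknown" when the file is missing or holds something else.
turbo_state() {
    f="$SYSFS/devices/system/cpu/intel_pstate/no_turbo"
    [ -r "$f" ] || { echo unknown; return; }
    case "$(cat "$f")" in
        0) echo enabled ;;
        1) echo disabled ;;
        *) echo unknown ;;
    esac
}

# Print the highest current core frequency in MHz
# (scaling_cur_freq reports kHz). Prints 0 if nothing is readable.
max_freq_mhz() {
    max=0
    for f in "$SYSFS"/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        khz=$(cat "$f" 2>/dev/null) || continue
        if [ "$khz" -gt "$max" ]; then
            max=$khz
        fi
    done
    echo $((max / 1000))
}

echo "power=$(power_source) turbo=$(turbo_state) max_mhz=$(max_freq_mhz)"
```

Run it under the same light load once plugged in and once on battery. If max_mhz sits well above the 2600 MHz base clock only on AC, turbo boost is a plausible suspect. Writing 1 to no_turbo (echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo) should disable turbo until it is set back to 0 or the machine reboots, which makes for a quick test.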
Edit
ps auxc | grep -i therm
root 167 0.0 0.0 0 0 ? I< 10:18 0:00 acpi_thermal_pm
root 1049 0.0 0.0 128808 9456 ? Ssl 10:18 0:00 thermald
sudo dmidecode -s bios-version
X571GT.311
ls -al /etc/thermald
total 28
drwxr-xr-x 2 root root 4096 Sep 8 13:48 .
drwxr-xr-x 148 root root 12288 Nov 2 12:01 ..
-rw-r--r-- 1 root root 4605 Jan 14 2019 thermal-conf.xml
-rw-r--r-- 1 root root 508 Jan 14 2019 thermal-cpu-cdev-order.xml
The laptop is just a year or two old. The latest BIOS update was released just a couple of weeks ago.
cat /etc/thermald/thermal-conf.xml
<?xml version="1.0"?>
<!--
use "man thermal-conf.xml" for details
-->
<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
<Name>Generic X86 Laptop Device</Name>
<ProductName>EXAMPLE_SYSTEM</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
<ThermalSensor>
<Type>TSKN</Type>
<AsyncCapable>1</AsyncCapable>
</ThermalSensor>
</ThermalSensors>
<ThermalZones>
<ThermalZone>
<Type>SKIN</Type>
<TripPoints>
<TripPoint>
<SensorType>TSKN</SensorType>
<Temperature>55000</Temperature>
<type>passive</type>
<ControlType>SEQUENTIAL</ControlType>
<CoolingDevice>
<index>1</index>
<type>rapl_controller</type>
<influence> 100 </influence>
<SamplingPeriod> 16 </SamplingPeriod>
</CoolingDevice>
<CoolingDevice>
<index>2</index>
<type>intel_powerclamp</type>
<influence> 100 </influence>
<SamplingPeriod> 12 </SamplingPeriod>
</CoolingDevice>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
</Platform>
<!-- Thermal configuration example only -->
<Platform>
<Name>Example Platform Name</Name>
<!--UUID is optional, if present this will be matched -->
<!-- Both product name and UUID can contain
wild card "*", which matches any platform
-->
<UUID>Example UUID</UUID>
<ProductName>Example Product Name</ProductName>
<Preference>QUIET</Preference>
<ThermalSensors>
<ThermalSensor>
<!-- New Sensor with a type and path -->
<Type>example_sensor_1</Type>
<Path>/some_path</Path>
<AsyncCapable>0</AsyncCapable>
</ThermalSensor>
<ThermalSensor>
<!-- Already present in thermal sysfs,
enable this or add/change config
For example, here we are indicating that
sensor can do async events to avoid polling
-->
<Type>example_thermal_sysfs_sensor</Type>
<!-- If async capable, then we don't need to poll -->
<AsyncCapable>1</AsyncCapable>
</ThermalSensor>
<ThermalSensor>
<!-- Examle of a virtual sensor. This sensor
depends on other real sensor or
virtual sensor.
E.g. here the temp will be
temp of example_sensor_1 * 0.5 + 10
-->
<Type>example_virtual_sensor</Type>
<Virtual>1</Virtual>
<SensorLink>
<SensorType>example_sensor_1</SensorType>
<Multiplier> 0.5 </Multiplier>
<Offset> 10 </Offset>
</SensorLink>
</ThermalSensor>
</ThermalSensors>
<ThermalZones>
<ThermalZone>
<Type>Example Zone type</Type>
<TripPoints>
<TripPoint>
<SensorType>example_sensor_1</SensorType>
<!-- Temperature at which to take action -->
<Temperature> 75000 </Temperature>
<!-- max/passive/active
If a MAX type is specified, then
daemon will use PID control
to aggresively throttle to avoid
reaching this temp.
-->
<type>max</type>
<!-- SEQUENTIAL | PARALLEL
When a trip point temp is violated, then
number of cooling device can be activated.
If control type is SEQUENTIAL then
It will exhaust first cooling device before trying
next.
-->
<ControlType>SEQUENTIAL</ControlType>
<CoolingDevice>
<index>1</index>
<type>example_cooling_device</type>
<!-- Influence will be used order cooling devices.
First cooling device will be used, which has
highest influence.
-->
<influence> 100 </influence>
<!-- Delay in using this cdev, this takes some time
too actually cool a zone
-->
<SamplingPeriod> 12 </SamplingPeriod>
</CoolingDevice>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
<CoolingDevices>
<CoolingDevice>
<!--
Cooling device can be specified
by a type and optionally a sysfs path
If the type already present in thermal sysfs
no need of a path.
Compensation can use min/max and step size
to increasing cool the system.
Debounce period can be used to force
a waiting period for action
-->
<Type>example_cooling_device</Type>
<MinState>0</MinState>
<IncDecStep>10</IncDecStep>
<ReadBack> 0 </ReadBack>
<MaxState>50</MaxState>
<DebouncePeriod>5000</DebouncePeriod>
<!--
If there are no PID parameter
compensation increase step wise and exponentaially
if single step is not able to change trend.
Alternatively a PID parameters can be specified
then next step will use PID calculation using
provided PID constants.
-->>
<PidControl>
<kp>0.001</kp>
<kd>0.0001</kd>
<ki>0.0001</ki>
</PidControl>
</CoolingDevice>
</CoolingDevices>
</Platform>
</ThermalConfiguration>
<!-- END -->
top
top - 13:16:27 up 1:37, 1 user, load average: 0.85, 1.32, 1.11
Tasks: 487 total, 2 running, 484 sleeping, 1 stopped, 0 zombie
%Cpu(s): 5.1 us, 2.0 sy, 1.5 ni, 90.6 id, 0.1 wa, 0.0 hi, 0.7 si, 0.0 st
GiB Mem : 15.5 total, 4.5 free, 5.0 used, 5.9 buff/cache
GiB Swap: 2.0 total, 2.0 free, 0.0 used. 10.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
35883 root 39 19 84636 68132 12616 R 19.8 0.4 0:00.60 apt-check
4842 haleks 20 0 4487900 483220 120988 S 2.6 3.0 1:49.49 gnome-shell
7291 haleks 20 0 923372 60172 45804 S 2.3 0.4 1:34.25 psensor
32705 haleks 20 0 24.5g 130676 77652 S 2.3 0.8 0:14.20 brave
975 message+ 20 0 40380 34872 4068 S 1.0 0.2 0:31.14 dbus-daemon
1002 root 20 0 2332860 32620 16456 S 1.0 0.2 0:05.98 snapd
4555 haleks 20 0 24.7g 147872 79744 S 1.0 0.9 1:10.25 Xorg
5229 haleks 20 0 2258744 131912 45796 S 1.0 0.8 1:16.97 keybase
35782 root 20 0 287276 16044 14104 S 1.0 0.1 0:00.03 packagekitd
663 root -51 0 0 0 0 S 0.7 0.0 0:38.09 irq/152-nvidia
21473 haleks 20 0 819496 53768 39012 S 0.7 0.3 0:07.86 gnome-terminal-
32564 haleks 20 0 16.6g 410380 190120 S 0.7 2.5 0:42.65 brave
32596 haleks 20 0 16.6g 182632 87372 S 0.7 1.1 0:47.20 brave
34076 root 20 0 25368 13280 7900 S 0.7 0.1 0:00.16 apt
357 root 19 -1 68944 30764 29000 S 0.3 0.2 0:01.12 systemd-journal
387 root 20 0 24164 7796 4236 S 0.3 0.0 0:02.20 systemd-udevd
517 root -51 0 0 0 0 S 0.3 0.0 0:00.73 irq/148-iwlwifi
992 root 20 0 235188 10276 6928 S 0.3 0.1 0:02.17 polkitd
1065 root 20 0 716580 12360 9072 S 0.3 0.1 0:01.60 canonical-livep
1349 gdm 20 0 317300 9004 7968 S 0.3 0.1 0:00.28 goa-identity-se
1864 root 20 0 2432052 150584 31964 S 0.3 0.9 0:07.40 lxd
4545 haleks 20 0 8748 5860 4012 S 0.3 0.0 0:01.37 dbus-daemon
5448 haleks 20 0 2370936 172572 33964 S 0.3 1.1 0:27.26 kbfsfuse
7473 haleks 20 0 503408 143448 66476 S 0.3 0.9 0:35.84 Keybase
7575 haleks 20 0 463344 40076 32528 S 0.3 0.2 0:00.39 update-notifier
10111 haleks 20 0 582224 166968 80480 S 0.3 1.0 0:37.21 gitkraken
32662 haleks 20 0 24.4g 121680 81520 S 0.3 0.7 0:03.68 brave
35783 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
35784 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
35786 root 20 0 24164 5228 1652 S 0.3 0.0 0:00.01 systemd-udevd
1 root 20 0 168176 12092 8296 S 0.0 0.1 0:08.88 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:00.11 ksoftirqd/0
11 root 20 0 0 0 0 I 0.0 0.0 0:09.66 rcu_sched
12 root rt 0 0 0 0 S 0.0 0.0 0:00.02 migration/0
13 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/0
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
16 root -51 0 0 0 0 S 0.0 0.0 0:00.00 idle_inject/1
17 root rt 0 0 0 0 S 0.0 0.0 0:00.18 migration/1
18 root 20 0 0 0 0 S 0.0 0.0 0:00.06 ksoftirqd/1
Solution 1:
Your /etc/thermald/thermal-conf.xml is incorrect. It's two example files tacked together.
Try the somewhat generic .xml file shown below.
Note: You may need to adjust the following line, which sets the passive trip point in millidegrees Celsius (60000 = 60°C):
<Temperature>60000</Temperature>
Then restart thermald with:
sudo systemctl restart thermald
<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
<Name>Override CPU default passive</Name>
<ProductName>*</ProductName>
<Preference>QUIET</Preference>
<ThermalZones>
<ThermalZone>
<Type>cpu</Type>
<TripPoints>
<TripPoint>
<Temperature>60000</Temperature>
<type>passive</type>
</TripPoint>
</TripPoints>
</ThermalZone>
</ThermalZones>
</Platform>
</ThermalConfiguration>
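To see whether the new trip point has any effect, it can help to watch the kernel's thermal zones while repeating the load that triggered the spike. The following is a minimal sketch, assuming the usual /sys/class/thermal/thermal_zone*/type and temp files are present. The SYSFS and TRIP_MILLIC variables and the function names are hypothetical, chosen only for this example. TRIP_MILLIC uses the same millidegree unit as the Temperature element above.

```shell
#!/bin/sh
# Root of sysfs; overridable (hypothetical knob) for trying against a copy.
SYSFS="${SYSFS:-/sys}"
# Threshold in millidegrees Celsius, same unit as <Temperature> in the config.
TRIP_MILLIC="${TRIP_MILLIC:-60000}"

# Convert millidegrees Celsius to whole degrees Celsius.
millic_to_c() {
    echo $(( $1 / 1000 ))
}

# Print one line per readable thermal zone: "<type> <temp>C <ok|over>".
# "over" means the zone is at or above TRIP_MILLIC.
report_zones() {
    for z in "$SYSFS"/class/thermal/thermal_zone*; do
        [ -r "$z/type" ] && [ -r "$z/temp" ] || continue
        # Some zones refuse reads (e.g. an idle device); skip those.
        t=$(cat "$z/temp" 2>/dev/null) || continue
        if [ "$t" -ge "$TRIP_MILLIC" ]; then
            state=over
        else
            state=ok
        fi
        echo "$(cat "$z/type") $(millic_to_c "$t")C $state"
    done
}

report_zones
```

Saved under a name of your choice, for example zones.sh, it can be refreshed every second with watch -n 1 sh zones.sh. If the CPU zone keeps climbing far past the trip point while plugged in, the limit is probably not being applied, and systemctl status thermald is worth checking.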