Ubuntu 19: kidle_inject processes
A friend of mine has a brand new ASUS laptop. We tried Ubuntu 18, which works fine on my own ASUS laptop, but on his laptop the screen kept freezing, possibly because the graphics drivers are not supported (I suspect this but am not sure).
I read on the ASUS forums that installing Ubuntu 19 could make things better, and it did. We installed it and it is better now, except for one thing.
There are 4 kidle_inject processes, each using 50% of the CPU, which makes the computer really slow. How can I disable them or reduce their impact?
Thanks
The kidle_inj processes, one per CPU, are one way of achieving thermal throttling to help keep your processor cooler. They are typically invoked by thermald, and the method actually used depends on the CPU frequency scaling driver, with possible overrides via the /etc/thermald/thermal-cpu-cdev-order.xml file.
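To see which thermal cooling devices the kernel has registered (on a machine using idle injection you may see one of type intel_powerclamp), a small sketch like the following lists them from sysfs. The function name list_cooling_devices is hypothetical, and the optional argument exists only so it can be pointed at a test directory instead of /sys.

```shell
# Hypothetical helper: list thermal cooling devices with type and cur/max state.
# $1 is an optional sysfs root (default /sys), used only for testing.
list_cooling_devices() {
    local root="${1:-/sys}"
    local dev
    for dev in "$root"/class/thermal/cooling_device*; do
        # Skip the unexpanded glob when no cooling devices exist.
        [ -r "$dev/type" ] || continue
        printf '%s %s %s/%s\n' "$(basename "$dev")" "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state" 2>/dev/null || echo '?')" \
            "$(cat "$dev/max_state" 2>/dev/null || echo '?')"
    done
}
```

Usage: run `list_cooling_devices` and look for the device types; a changing cur_state on one of them while the machine is hot suggests that device is doing the throttling.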
Let's work through a couple of examples. Even under 100% utilization on all CPUs my test server doesn't overheat, so for these tests I set a low thermal trip point of 55 degrees C.
First, using the intel_pstate CPU frequency scaling driver and the powersave governor:
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
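Rather than reading one identical line per CPU, the driver and governor can be collapsed into counts. A minimal sketch; the function name cpufreq_summary is hypothetical, and the optional argument is a sysfs root (default /sys) used only for testing.

```shell
# Hypothetical helper: count CPUs per scaling driver and per scaling governor.
# $1 is an optional sysfs root (default /sys), used only for testing.
cpufreq_summary() {
    local root="${1:-/sys}"
    local f
    for f in scaling_driver scaling_governor; do
        # uniq -c prints "<count> <value>"; reformat as "<file>: <value> (<count> CPUs)".
        cat "$root"/devices/system/cpu/cpu*/cpufreq/"$f" 2>/dev/null \
            | sort | uniq -c \
            | awk -v k="$f" '{ print k ": " $2 " (" $1 " CPUs)" }'
    done
}
```

With the eight CPUs shown above, this would print `scaling_driver: intel_pstate (8 CPUs)` and `scaling_governor: powersave (8 CPUs)`.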
Now use turbostat to monitor things, and also watch the maximum performance setting (max_perf_pct, the maximum allowed CPU frequency as a percentage). It starts unthrottled:
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
doug@s15:~$ sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt --interval 5
Busy% Bzy_MHz PkgTmp PkgWatt
0.02 1600 26 3.70
0.03 1600 26 3.70
2.21 3737 38 6.87
38.89 3564 48 42.70
94.64 3500 50 58.54 <<< Load being ramped up.
100.00 3500 52 58.49 <<< Processor package temperature going up.
100.00 3500 53 58.78
100.00 3500 56 59.04
100.00 3500 56 59.27
100.00 3123 53 51.18 <<< Notice throttling via clock frequency
100.00 2969 56 47.32
100.00 2693 52 41.90
100.00 2009 53 28.98
100.00 2489 55 37.90
100.00 2431 54 36.82
100.00 2620 54 40.50
100.00 2409 55 36.39
100.00 2511 54 38.47
100.00 2569 57 39.61
100.00 2301 53 34.57
100.00 1682 53 23.64
100.00 2089 54 30.52
100.00 2569 56 39.59
100.00 2301 52 34.55
87.08 1671 53 22.98
48.70 2037 52 24.04
5.58 2318 44 7.50
0.02 1603 40 3.88
0.03 1600 40 3.87
0.02 1600 39 3.85
^C0.04 1600 38 3.86
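When a capture like this is long, one quick way to see the frequency throttling is to compare Bzy_MHz across the samples taken at full load. This awk-based sketch reads turbostat --Summary output on stdin, finds the Busy% and Bzy_MHz columns from the header line, and reports the highest and lowest Bzy_MHz among samples with Busy% of at least 99. The function name and the 99% cutoff are arbitrary choices for illustration.

```shell
# Hypothetical helper: summarize Bzy_MHz over full-load turbostat samples.
# Reads turbostat --Summary output on stdin; trailing "<<<" notes are ignored.
full_load_mhz_range() {
    awk '
        # Locate the Busy% and Bzy_MHz columns from the header line.
        /Busy%/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "Busy%") b = i
                if ($i == "Bzy_MHz") m = i
            }
            next
        }
        # Track min/max busy frequency for samples at (nearly) 100% Busy%.
        b && m && ($b + 0) >= 99 {
            mhz = $m + 0
            if (n == 0 || mhz > hi) hi = mhz
            if (n == 0 || mhz < lo) lo = mhz
            n++
        }
        END { if (n) printf "full-load samples: %d, Bzy_MHz max %d, min %d\n", n, hi, lo }
    '
}
```

Fed the summary rows from the run above (for example `full_load_mhz_range < turbostat.log`, where turbostat.log is a hypothetical saved capture), it would report 19 full-load samples with Bzy_MHz ranging from 3500 down to 1682.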
During the run above, max_perf_pct was reduced, and it only returned to 100 after the load was removed and the processor temperature dropped:
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
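Instead of re-running cat by hand, the value can be polled and printed only when it changes. A rough sketch, assuming the intel_pstate driver is in use so that max_perf_pct exists; the function name, the default 5-second interval and the optional sample count are illustrative choices, and the first argument is a sysfs root used only for testing.

```shell
# Hypothetical helper: print max_perf_pct with a timestamp whenever it changes.
# Args: [sysfs root, default /sys] [interval seconds, default 5] [samples, 0 = forever]
watch_max_perf_pct() {
    local root="${1:-/sys}" interval="${2:-5}" count="${3:-0}"
    local f="$root/devices/system/cpu/intel_pstate/max_perf_pct"
    local last="" cur i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        cur=$(cat "$f") || return 1
        if [ "$cur" != "$last" ]; then
            printf '%s max_perf_pct=%s\n' "$(date +%T)" "$cur"
            last=$cur
        fi
        i=$((i + 1))
        # Do not sleep after the final sample when a count was given.
        [ "$count" -ne 0 ] && [ "$i" -ge "$count" ] && break
        sleep "$interval"
    done
}
```

Run `watch_max_perf_pct` in a second terminal while the load test runs and stop it with Ctrl-C; a drop from 100 to 60 and the later recovery would each show up as a single line.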
Second, using the acpi-cpufreq CPU frequency scaling driver and the ondemand governor:
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
acpi-cpufreq
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
Now use turbostat to monitor things again, and also watch the kidle_inj threads. Notice that the turbostat output has an added column, CPU%c6, which is the deepest idle state my processor enters. (The columns were selected with --hide instead of --show because the --show method didn't work here.)
doug@s15:~$ sudo turbostat --Summary --quiet --hide IRQ,Avg_MHz,SMI,GFXMHz,TSC_MHz,GFXWatt,CorWatt,POLL%,CPU%c1,CPU%c3,CPU%c7,CoreTmp,GFX%rc6,Pkg%pc2,Pkg%pc3,Pkg%pc6,POLL,C1,C1E,C3,C6,C1%,C1E%,C3%,C6% --interval 5
Busy% Bzy_MHz CPU%c6 PkgTmp PkgWatt
0.05 1602 99.83 26 3.71
0.04 1600 99.87 28 3.71
0.05 1600 99.83 26 3.71
0.05 1601 99.84 26 3.71
24.67 3591 52.79 45 30.24
93.87 3500 0.00 47 58.30 <<< Load ramped up
100.00 3500 0.00 50 58.42
100.00 3500 0.00 53 58.70
100.00 3500 0.00 55 58.99
100.00 3500 0.00 56 59.23
93.72 3424 6.18 56 54.44 <<< Now some C6 idle time is forced via kidle_inj
81.41 3223 18.32 54 44.49
77.81 3179 21.82 56 42.02
83.82 3348 15.90 57 48.14
78.87 3278 20.78 54 44.52
66.34 3061 33.15 54 35.02
62.65 2898 36.78 54 30.80
61.20 2856 38.20 53 29.63
63.71 3051 35.73 54 33.36
61.67 2938 37.76 54 30.90
61.92 2929 37.53 52 30.95
63.47 3039 35.97 55 33.17
60.87 2862 38.52 56 29.60
62.90 3073 36.56 53 33.40
62.36 2964 37.09 55 31.61
61.16 2866 38.24 53 29.78
63.98 3099 35.43 55 34.28
56.37 2708 43.01 52 25.80
52.01 2616 47.29 53 23.07
58.24 2738 41.15 53 26.86
65.39 3143 34.05 56 35.60
68.01 3209 31.50 56 38.09
58.62 2949 40.79 53 29.83
58.43 2730 40.95 53 26.88
48.87 3158 36.84 53 33.68
14.74 2642 70.22 43 14.77
0.37 1602 99.10 42 4.02
0.29 1601 99.30 40 3.97
0.23 1602 99.43 40 3.94
0.17 1601 99.58 39 3.91
0.17 1686 99.56 38 3.91
0.06 1601 99.79 38 3.85
0.04 1602 99.87 36 3.83
0.04 1600 99.88 36 3.83
0.09 1750 99.75 35 3.85
0.04 1600 99.89 35 3.82
0.04 1600 99.85 36 3.82
0.04 1600 99.88 34 3.81
0.04 1600 99.86 35 3.80
^C0.04 1600 99.87 33 3.80
During throttling, the kidle_inj threads force the deep idle state, and the processor power goes down:
doug@s15:~$ ps aux | grep kidle
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3005 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/0]
root 3006 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/1]
root 3007 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/2]
root 3008 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/3]
root 3009 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/4]
root 3010 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/5]
root 3011 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/6]
root 3012 33.2 0.0 0 0 ? S 00:19 0:32 [kidle_inj/7]
doug@s15:~$ ps aux | grep kidle
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3005 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/0]
root 3006 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/1]
root 3007 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/2]
root 3008 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/3]
root 3009 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/4]
root 3010 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/5]
root 3011 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/6]
root 3012 33.5 0.0 0 0 ? S 00:19 0:35 [kidle_inj/7]
doug@s15:~$ ps aux | grep kidle
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3005 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/0]
root 3006 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/1]
root 3007 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/2]
root 3008 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/3]
root 3009 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/4]
root 3010 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/5]
root 3011 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/6]
root 3012 29.1 0.0 0 0 ? S 00:19 0:44 [kidle_inj/7]
doug@s15:~$ ps aux | grep kidle
... throttling over, processes gone ...
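To condense a listing like the ones above into one line, the kidle_inj threads can be counted and their %CPU column totalled. A minimal sketch with a hypothetical function name; it matches on the bracketed command name, so the grep process itself is not counted. Keep in mind that ps reports %CPU averaged over each thread's lifetime, not an instantaneous value, and that the CPU time charged to these threads reflects forced idle rather than useful work.

```shell
# Hypothetical helper: read `ps aux`-style lines on stdin and print
# "<number of kidle_inj threads> <sum of their %CPU>".
kidle_summary() {
    awk '$NF ~ /^\[kidle_inj\// { n++; cpu += $3 }
         END { printf "%d %.1f\n", n, cpu }'
}
```

Usage: `ps aux | kidle_summary`. Once throttling is over, it prints `0 0.0`.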
You should not disable whichever thermal protection method you are using, but you could clean the fans and the like to help keep the processor cooler. Also, if you have the option of using the pstate throttling method, the remaining performance is typically higher than with the kidle_inj method. For example, for the workload used above, the pstate method outperforms the intel_powerclamp kidle_inj method by 33%.
Now, if for whatever reason your processor is capable of using the intel_pstate CPU frequency scaling driver but you have chosen not to, then the suggestion is to use the intel_cpufreq driver (which is just the intel_pstate driver in passive mode) and the ondemand governor. Why? Because then the pstate throttling method will be used. On my system, this resulted in about a 28% performance improvement over the kidle_inj method under the same throttling conditions.
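The driver and governor combinations tested in this answer can be summarized as a lookup. This only records what was observed on this one machine with its thermald configuration; it is not a documented kernel or thermald rule. The function name is hypothetical, and any other combination is reported as untested.

```shell
# Hypothetical helper: throttling method observed in this answer for a
# given scaling driver ($1) and governor ($2). Not an authoritative mapping.
expected_throttle_method() {
    case "$1/$2" in
        intel_pstate/powersave) echo "pstate (max_perf_pct)" ;;
        intel_cpufreq/ondemand) echo "pstate" ;;
        acpi-cpufreq/ondemand)  echo "kidle_inj" ;;
        *)                      echo "untested" ;;
    esac
}
```

Usage: `expected_throttle_method "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver)" "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"`.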
How do you change from intel_pstate to intel_cpufreq?
doug@s15:~$ cat /sys/devices/system/cpu/intel_pstate/status
active
doug@s15:~$ echo passive | sudo tee /sys/devices/system/cpu/intel_pstate/status
passive
And set the governor:
doug@s15:~$ sudo su
root@s15:/home/doug# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "ondemand" > "$file"; done
root@s15:/home/doug# exit
exit
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
ondemand
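The manual steps for switching to intel_cpufreq with the ondemand governor can be wrapped in a small function that only changes the mode when needed and then shows the result. This is a sketch, not a tested tool. The function name is hypothetical, it must run as root (for example from a root shell), and the optional argument is a sysfs root used only for testing. As with the manual commands, the change does not persist across a reboot.

```shell
# Hypothetical helper: put intel_pstate in passive mode (driver becomes
# intel_cpufreq), set ondemand on every CPU, then summarize the result.
# $1 is an optional sysfs root (default /sys), used only for testing.
use_intel_cpufreq_ondemand() {
    local root="${1:-/sys}"
    local cpu="$root/devices/system/cpu"
    local f
    if [ ! -e "$cpu/intel_pstate/status" ]; then
        echo "intel_pstate status file not found" >&2
        return 1
    fi
    if [ "$(cat "$cpu/intel_pstate/status")" != passive ]; then
        echo passive > "$cpu/intel_pstate/status" || return 1
    fi
    # Set the governor after the mode switch, once the cpufreq policies exist.
    for f in "$cpu"/cpu*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue
        echo ondemand > "$f" || return 1
    done
    cat "$cpu"/cpu*/cpufreq/scaling_driver | sort | uniq -c
    cat "$cpu"/cpu*/cpufreq/scaling_governor | sort | uniq -c
}
```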
Why is there such a significant difference in performance? Because the kidle_inj method wastes a lot of time and energy entering and exiting the deep idle state, whereas the pstate method does not.
For users seeing "idle_inject" instead of, or in addition to, "kidle_inj":
doug@s15:~$ ps aux | grep idle
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 12 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/0]
root 16 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/1]
root 22 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/2]
root 28 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/3]
root 34 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/4]
root 40 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/5]
root 46 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/6]
root 52 0.0 0.0 0 0 ? S 09:07 0:00 [idle_inject/7]
These threads are relatively recent, appearing as of kernel 4.19. The corresponding kernel configuration option is CONFIG_IDLE_INJECT, which is enabled in Ubuntu kernels, but I don't yet know what they are used for.
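To check this option on your own kernel, the installed kernel config file can be searched. A minimal sketch: the function name is hypothetical, and the default path /boot/config-$(uname -r) is where Ubuntu normally installs the running kernel's config, so pass a different file if yours lives elsewhere.

```shell
# Hypothetical helper: show whether CONFIG_IDLE_INJECT is set in a kernel config.
# Matches both "CONFIG_IDLE_INJECT=y" and "# CONFIG_IDLE_INJECT is not set".
idle_inject_config() {
    local cfg="${1:-/boot/config-$(uname -r)}"
    grep -E '^(# )?CONFIG_IDLE_INJECT[= ]' "$cfg"
}
```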
EDIT (Aug 9, 2019):
Readers, please be aware that for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), and for the acpi-cpufreq driver and schedutil governor, some bugs may prevent thermald from working correctly: a reduction in the maximum allowed CPU frequency might not be honored by the system. Patches to fix these issues are in progress, but will take a while to propagate.