My CPU slows down after a while and does not recover

Occasionally my CPU slows down below its prescribed minimum frequency. I can't reproduce it on demand, but it happens often enough: at least a few times a week. This is example cpufreq-info output from a minute ago:

    cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
    Report errors and bugs to cpufreq@vger.kernel.org, please.
    analyzing CPU 0:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency: 0.97 ms.
      hardware limits: 800 MHz - 3.30 GHz
      available cpufreq governors: performance, powersave
      current policy: frequency should be within 800 MHz and 3.30 GHz.
                      The governor "powersave" may decide which speed to use
                      within this range.
      current CPU frequency is 610 MHz.
    analyzing CPU 1:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 1
      CPUs which need to have their frequency coordinated by software: 1
      maximum transition latency: 0.97 ms.
      hardware limits: 800 MHz - 3.30 GHz
      available cpufreq governors: performance, powersave
      current policy: frequency should be within 800 MHz and 3.30 GHz.
                      The governor "powersave" may decide which speed to use
                      within this range.
      current CPU frequency is 615 MHz.
    analyzing CPU 2:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 2
      CPUs which need to have their frequency coordinated by software: 2
      maximum transition latency: 0.97 ms.
      hardware limits: 800 MHz - 3.30 GHz
      available cpufreq governors: performance, powersave
      current policy: frequency should be within 800 MHz and 3.30 GHz.
                      The governor "powersave" may decide which speed to use
                      within this range.
      current CPU frequency is 590 MHz.
    analyzing CPU 3:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 3
      CPUs which need to have their frequency coordinated by software: 3
      maximum transition latency: 0.97 ms.
      hardware limits: 800 MHz - 3.30 GHz
      available cpufreq governors: performance, powersave
      current policy: frequency should be within 800 MHz and 3.30 GHz.
                      The governor "powersave" may decide which speed to use
                      within this range.
      current CPU frequency is 589 MHz.

This slows everything down noticeably: Firefox becomes sluggish, vim's startup time grows from 150-250 ms to over 700 ms, g++ compilations take three times as long, etc.

A restart fixes everything.

Some error lines from my syslog over the past couple of hours:

    May 17 16:10:53 lati kernel: [ 1421.872755] ACPI Error: Index value 0x0000000000000083 overflows field width 0x7 (20140424/exfldio-343)
    May 17 16:10:53 lati kernel: [ 1421.872758] ACPI Error: Method parse/execution failed [\NEVT] (Node ffff88040e047258), AE_AML_REGISTER_LIMIT (20140424/psparse-536)
    May 17 16:10:53 lati kernel: [ 1421.872761] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.ECDV._Q66] (Node ffff88040e044b90), AE_AML_REGISTER_LIMIT (20140424/psparse-536)
    May 17 16:10:56 lati kernel: [ 1425.907749] ACPI Error: Index value 0x0000000000000083 overflows field width 0x7 (20140424/exfldio-343)
    May 17 16:10:56 lati kernel: [ 1425.907765] ACPI Error: Method parse/execution failed [\NEVT] (Node ffff88040e047258), AE_AML_REGISTER_LIMIT (20140424/psparse-536)
    May 17 16:10:56 lati kernel: [ 1425.907794] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.ECDV._Q66] (Node ffff88040e044b90), AE_AML_REGISTER_LIMIT (20140424/psparse-536)
    May 17 16:12:09 lati kernel: [    1.925333] EXT4-fs (sda5): re-mounted. Opts: errors=remount-ro
    May 17 16:12:09 lati kernel: [    2.421037] systemd-udevd[331]: Error calling EVIOCSKEYCODE: Invalid argument
    May 17 16:12:21 lati gnome-session[2251]: WARNING: Could not parse desktop file tracker-store.desktop or it references a not found TryExec binary
    May 17 16:12:21 lati gnome-session[2251]: WARNING: Could not parse desktop file tracker-miner-fs.desktop or it references a not found TryExec binary
    May 17 16:12:51 lati gnome-session[2251]: GLib-CRITICAL: g_environ_setenv: assertion 'value != NULL' failed
    May 17 17:48:19 lati kernel: [ 5769.576717] systemd-hostnamed[6983]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!

I am using a fresh install of 64-bit Ubuntu 14.04.2 on a Dell E7440 with BIOS version A14.

By the way, even running lsb_release takes about 400 ms while the machine is in this state.

Extra info

  • My processor model name: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
  • My processor model number: 69
  • It probably occurs only after a suspend, but most suspends don't trigger it (for example, it has happened only once since I asked this question).

Extra info (2)

Output from grep -r . * in /sys/class/thermal:

    cooling_device0/type:Processor
    cooling_device0/power/control:auto
    cooling_device0/power/async:disabled
    cooling_device0/power/runtime_enabled:disabled
    cooling_device0/power/runtime_active_kids:0
    cooling_device0/power/runtime_active_time:0
    cooling_device0/power/runtime_status:unsupported
    cooling_device0/power/runtime_usage:0
    cooling_device0/power/runtime_suspended_time:0
    cooling_device0/cur_state:0
    cooling_device0/max_state:3
    cooling_device1/type:Processor
    cooling_device1/power/control:auto
    cooling_device1/power/async:disabled
    cooling_device1/power/runtime_enabled:disabled
    cooling_device1/power/runtime_active_kids:0
    cooling_device1/power/runtime_active_time:0
    cooling_device1/power/runtime_status:unsupported
    cooling_device1/power/runtime_usage:0
    cooling_device1/power/runtime_suspended_time:0
    cooling_device1/cur_state:0
    cooling_device1/max_state:3
    cooling_device2/type:Processor
    cooling_device2/power/control:auto
    cooling_device2/power/async:disabled
    cooling_device2/power/runtime_enabled:disabled
    cooling_device2/power/runtime_active_kids:0
    cooling_device2/power/runtime_active_time:0
    cooling_device2/power/runtime_status:unsupported
    cooling_device2/power/runtime_usage:0
    cooling_device2/power/runtime_suspended_time:0
    cooling_device2/cur_state:0
    cooling_device2/max_state:3
    cooling_device3/type:Processor
    cooling_device3/power/control:auto
    cooling_device3/power/async:disabled
    cooling_device3/power/runtime_enabled:disabled
    cooling_device3/power/runtime_active_kids:0
    cooling_device3/power/runtime_active_time:0
    cooling_device3/power/runtime_status:unsupported
    cooling_device3/power/runtime_usage:0
    cooling_device3/power/runtime_suspended_time:0
    cooling_device3/cur_state:0
    cooling_device3/max_state:3
    cooling_device4/type:intel_powerclamp
    cooling_device4/power/control:auto
    cooling_device4/power/async:disabled
    cooling_device4/power/runtime_enabled:disabled
    cooling_device4/power/runtime_active_kids:0
    cooling_device4/power/runtime_active_time:0
    cooling_device4/power/runtime_status:unsupported
    cooling_device4/power/runtime_usage:0
    cooling_device4/power/runtime_suspended_time:0
    cooling_device4/cur_state:-1
    cooling_device4/max_state:50
    thermal_zone0/mode:enabled
    thermal_zone0/temp:25000
    thermal_zone0/type:acpitz
    thermal_zone0/power/control:auto
    thermal_zone0/power/async:disabled
    thermal_zone0/power/runtime_enabled:disabled
    thermal_zone0/power/runtime_active_kids:0
    thermal_zone0/power/runtime_active_time:0
    thermal_zone0/power/runtime_status:unsupported
    thermal_zone0/power/runtime_usage:0
    thermal_zone0/power/runtime_suspended_time:0
    thermal_zone0/trip_point_0_temp:107000
    thermal_zone0/trip_point_0_type:critical
    thermal_zone0/policy:step_wise
    thermal_zone0/passive:0
    thermal_zone1/temp:47000
    thermal_zone1/type:x86_pkg_temp
    thermal_zone1/power/control:auto
    thermal_zone1/power/async:disabled
    thermal_zone1/power/runtime_enabled:disabled
    thermal_zone1/power/runtime_active_kids:0
    thermal_zone1/power/runtime_active_time:0
    thermal_zone1/power/runtime_status:unsupported
    thermal_zone1/power/runtime_usage:0
    thermal_zone1/power/runtime_suspended_time:0
    thermal_zone1/trip_point_0_temp:0
    thermal_zone1/trip_point_0_type:passive
    thermal_zone1/trip_point_1_temp:0
    thermal_zone1/trip_point_1_type:passive
    thermal_zone1/policy:step_wise

Solution 1:

Does the problem still occur? I am eagerly looking for confirmation or denial of what I think is happening.

The theory is that somehow (a BIOS issue is suspected) Clock Modulation gets enabled after a suspend. The current version of the intel_pstate driver is incompatible with any use of Clock Modulation: it always ends up driving the target pstate to the minimum, regardless of load. The result is an apparent CPU frequency stuck at minimum * modulation percent. The acpi-cpufreq driver works fine with Clock Modulation, giving desired frequency * modulation percent (i.e. the issue is less obvious with the acpi-cpufreq driver).
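
To make the arithmetic behind this theory concrete, here is a minimal shell sketch that decodes a raw value of MSR 0x19a (IA32_CLOCK_MODULATION). It assumes the bit layout documented in the Intel SDM: bit 4 enables modulation and bits 3:0 give the duty cycle in 1/16 steps (bit 0 only counts on CPUs with extended clock modulation). The function names and the sample value 0x1c are made up for illustration; 0x1c is not a value reported in this question.

```shell
#!/bin/sh
# Decode a raw IA32_CLOCK_MODULATION (MSR 0x19a) value.
# Assumed layout: bit 4 = enable, bits 3:0 = duty cycle in 1/16 steps.
# With bit 0 clear this is the same as the classic 12.5% steps in bits 3:1.
decode_clock_mod() {   # $1 = raw value, e.g. 0x1c
    cm_val=$(( $1 ))
    cm_enabled=$(( ($cm_val >> 4) & 1 ))
    cm_steps=$(( $cm_val & 0xF ))
    if [ "$cm_enabled" -eq 0 ]; then
        echo "disabled"
    elif [ "$cm_steps" -eq 0 ]; then
        # An all-zero duty field with the enable bit set is a reserved encoding
        echo "enabled duty=reserved"
    else
        # 1/16 = 6.25%, so work in hundredths of a percent
        cm_pct=$(( $cm_steps * 625 ))
        printf 'enabled duty=%d.%02d%%\n' $(( $cm_pct / 100 )) $(( $cm_pct % 100 ))
    fi
}

# Apparent frequency = pstate frequency * duty cycle.
# The reserved encoding is not handled here; it would yield 0.
effective_mhz() {      # $1 = pstate frequency in MHz, $2 = raw MSR value
    ef_val=$(( $2 ))
    if [ $(( ($ef_val >> 4) & 1 )) -eq 0 ]; then
        echo "$1"
    else
        echo $(( $1 * ($ef_val & 0xF) / 16 ))
    fi
}

decode_clock_mod 0x1c     # hypothetical value; prints: enabled duty=75.00%
effective_mhz 800 0x1c    # prints: 600
```

With that hypothetical value, an 800 MHz minimum pstate would come out at 600 MHz, which is in the neighbourhood of the frequencies shown in the question.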

Please do the following tests:

1.) Load the msr module (needed once per boot):

    sudo modprobe msr

2.) Before any suspend:

    sudo rdmsr -a 0x19a

3.) After a suspend that results in the low CPU frequencies:

    sudo rdmsr -a 0x19a

4.) If the result from step 3 is not 0, then:

    sudo wrmsr -a 0x19a 0x0

and check it:

    sudo rdmsr -a 0x19a

5.) Are the CPU frequencies O.K. now?

Please post all the outputs back here.
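
If it helps, steps 3 and 4 can be wrapped in a small helper. This is only a sketch: count_nonzero_msr is a hypothetical name, and it assumes that rdmsr prints one hexadecimal value per line without a 0x prefix, which is its default output format as far as I know. The sample input is made up.

```shell
#!/bin/sh
# Count how many lines of `rdmsr -a` output hold a non-zero value.
# Assumption: one hexadecimal value per line, no 0x prefix.
count_nonzero_msr() {
    nz_count=0
    while read -r nz_line; do
        [ -n "$nz_line" ] || continue          # skip blank lines
        if [ $(( 0x$nz_line )) -ne 0 ]; then
            nz_count=$(( $nz_count + 1 ))
        fi
    done
    echo "$nz_count"
}

# Made-up sample: two CPUs clean, two with modulation bits set
printf '0\n0\n1c\n1c\n' | count_nonzero_msr    # prints: 2
```

On the real machine this would be fed with `sudo rdmsr -a 0x19a | count_nonzero_msr`; any result other than 0 means step 4 applies.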

Note: rdmsr and wrmsr are part of the msr-tools package; I do not recall whether it is installed by default.

EDIT:

If you can, the Intel subject matter expert on thermal interactions and pstates would also like the output from:

    cd /sys/class/thermal
    grep -r . *
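
That output is long, so here is a rough sketch of one way to skim it for cooling devices that are actively throttling. It is my own illustration rather than something the expert asked for: active_cooling is a hypothetical helper, it assumes the path:value line format that grep -r produces, and the sample lines are made up.

```shell
#!/bin/sh
# Print only the cooling devices whose cur_state is above 0.
# Input: lines in the "path:value" form produced by `grep -r . *`.
active_cooling() {
    while IFS=: read -r ac_path ac_value; do
        case "$ac_path" in
            cooling_device*/cur_state)
                if [ "$ac_value" -gt 0 ]; then
                    echo "$ac_path:$ac_value"
                fi
                ;;
        esac
    done
}

# Made-up sample input; only cooling_device1 is active here
printf '%s\n' \
    'cooling_device0/cur_state:0' \
    'cooling_device1/cur_state:2' \
    'cooling_device4/cur_state:-1' \
    'thermal_zone1/temp:47000' | active_cooling   # prints: cooling_device1/cur_state:2
```

On the real machine it would be run as `grep -r . * | active_cooling` from /sys/class/thermal. An empty result suggests that no cooling device is engaged.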