Serious Performance Problems using Expose in Catalina
Solution 1:
It sounds very likely that you have a hardware fault, such as fans not working, thermal paste not properly applied, blocked vents or similar.
A load average of 3-4 when the machine is idle is definitely not normal. Your machine is a dual-core machine; HyperThreading doesn’t really make it sensible to count it as 4 cores (a hyperthread is not nearly as good as a separate core). A load average of 3-4 therefore means the computer is heavily loaded.
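As a rough illustration of the arithmetic above, the sketch below divides the 1-minute load average by the number of physical cores. It assumes a Unix-like system where Python's `os.getloadavg()` is available, and it takes the physical core count as a parameter (2 here, matching the dual-core machine discussed) instead of using `os.cpu_count()`, which reports logical (HyperThreading) cores. The helper name `load_per_core` is hypothetical.

```python
import os


def load_per_core(load_average: float, physical_cores: int) -> float:
    """Return the load average normalised per physical core.

    Around 1.0 means each core is busy on average; well above 1.0
    means runnable work is queuing for CPU time.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return load_average / physical_cores


if __name__ == "__main__":
    # Assumption: a dual-core machine. os.cpu_count() would return the
    # logical core count instead, which double-counts hyperthreads.
    PHYSICAL_CORES = 2
    one_minute, _, _ = os.getloadavg()  # Unix-only
    per_core = load_per_core(one_minute, PHYSICAL_CORES)
    print(f"1-min load {one_minute:.2f} -> {per_core:.2f} per physical core")
```

Plugging in the idle figures from this answer, a load of 3-4 on two physical cores works out to 1.5-2.0 per core, i.e. more runnable work than the cores can keep up with.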
The high load average could come from thermal throttling (either via kernel_task forcing the CPU to sleep, or simply frequency throttling). It could also come from other sources, such as a malfunctioning disk, although then you would most likely be experiencing other problems as well.
I recommend booting into Internet Recovery mode to check whether you still see a high load average and slow performance there. If you do, then it’s not a software problem.
Solution 2:
Actions Taken
My solution involved the following:
- Cleaning the fans and airways
- Replacing the thermal paste
- Adding monitoring for temp and fan speed
- Increasing the overall fan speeds using custom rules over the system defaults
Cleaning the Hardware
I took my MacBook Pro apart and found a bit of dust accumulated in the fan itself, and about 10% of the heat sink's fins were blocked with linty dust. There were also some dust bunnies and random spots on the board and in pockets of the chassis. It didn't seem like a lot of dust, but I blasted it with canned air nonetheless.
I was also delighted to find that I still had a new tube of thermal paste in my computer toolkit. After unscrewing the heat-sink plate cinch from the CPU core, I saw that the old paste was all but dried up and very anemic. I used a dry cloth to wipe both surfaces clean, applied fresh paste and cinched it back down. However, I may have used too much paste, since some excess squeezed out the sides. The cinch was as tight as it would go without over-tightening and risking stripping the screw heads, shearing the screws or ripping out the threading. The goal was to leave no room for air bubbles trapped in the paste, which could heat up and reduce the area where the paste contacts both surfaces.
Software
Before I addressed the cooling, I added and configured three utilities:
Fan/Temp Readings and Controls: iStat Menus / Macs Fan Control
Both of these utilities can measure fan speed and CPU temperature, and both have a trial version, but iStat Menus (iSM) reads about two dozen other temperature sensors throughout the system, while Macs Fan Control (MFC) only reads the CPU core temperature. Although I first used MFC, I later found iSM to be the better overall choice, since its utility goes far beyond temperature and fan management, so I paid the $10 to use it as an overall metrics display.
MFC has limited fan control in trial mode, while the iStat Menus trial is fully functional, including custom fan speed rules, so you can try it before you buy it to see if it suits you. To be fair, iSM and MFC are not really comparable in scope, since iSM lets you build extensively customizable graphs for nearly every system metric imaginable.
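A custom fan rule of this kind essentially maps a temperature reading to a target fan speed. The sketch below shows one common way to express such a rule, a piecewise-linear fan curve. It is not how iStat Menus or Macs Fan Control implement their rules internally, and the temperature/RPM breakpoints in `FAN_CURVE` are hypothetical values chosen for illustration, not recommended settings for any particular MacBook Pro.

```python
from bisect import bisect_right

# Hypothetical fan curve: (temperature in °C, target fan speed in RPM),
# sorted by temperature. These numbers are illustrative only.
FAN_CURVE = [(50.0, 2000), (70.0, 4000), (90.0, 6200)]


def target_rpm(temp_c: float, curve=FAN_CURVE) -> int:
    """Linearly interpolate a target fan speed for temp_c.

    Below the first breakpoint the fan stays at the curve's minimum;
    above the last breakpoint it is pinned at the curve's maximum.
    """
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]
    if temp_c >= temps[-1]:
        return curve[-1][1]
    # Index of the first breakpoint strictly above temp_c.
    i = bisect_right(temps, temp_c)
    (t0, r0), (t1, r1) = curve[i - 1], curve[i]
    fraction = (temp_c - t0) / (t1 - t0)
    return round(r0 + fraction * (r1 - r0))


if __name__ == "__main__":
    for temp in (40, 60, 80, 100):
        print(temp, target_rpm(temp))
```

With these breakpoints, `target_rpm(80)` returns 5100, halfway between 4000 and 6200 RPM. A more aggressive rule simply moves the breakpoints to lower temperatures or raises the RPM values, so the fans spin up sooner than the system defaults would.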
Furthermore, iSM appears to read stats that the system already reports, much like /proc in Linux. To test this, I ran with and without iSM and saw little variance in performance (measured with Activity Monitor, which I don't recommend running indefinitely because it uses a lot of resources, relatively speaking). Even the memory footprint of iSM's UI elements is only 35 MB, less than a quarter of Activity Monitor's, and Activity Monitor can also spike CPU load, which makes it unsuitable as a full-time monitor the way iSM is.
While I still have MFC installed, I rarely use it, preferring iSM. I keep it in case I need a potentially more lightweight fan controller/speedometer than iSM, but I have not needed it yet.
CPU Clock Measurement: Intel Power Gadget
In addition to iSM, Intel Power Gadget provides visibility into the variable CPU clock speed, which fluctuates constantly depending on system demand. I am not certain that this tool would register "clipping" caused by overheating, but I can't imagine why it wouldn't. Like iSM, it provides extensive graphing features. Better still, it exposes CPU speed as a new data point, which iSM now lists in its CPU metrics right next to the CPU core temperature for easy tracking! Power Gadget doesn't even need to be running for this: the metric is added to the rest of the system metrics that iStat can read. This was a great find and added a crucial metric I lacked before.
Results
To establish a speed benchmark, I used Geekbench 5 to get a baseline before and after the cooling cleanup and tuning. I ran it in Safe Mode as well as in normal mode (with as little running as possible, though this was not a pure test, since processes like photoanalysisd were often chugging along in the background regardless). Still, what I found was very surprising: while performance improved drastically once the heat was tamed, measurements after cleaning actually showed the CPU hitting hotter peak temperatures. I do have a hypothesis for why, however.
Performance Improvements
Bench tests showed single-/multi-core scores of around 700/1775 at start, both before and after I cleaned the cooling system and added fresh paste. Running the fans at max at all times versus cleaning the cooling didn't change these scores measurably. That score is actually just above the Geekbench-stated average for my machine. (Safe Mode tests were about 5-10% slower.)
While I waited for my machine to grow sluggish from either heat or bloating swap usage and grinding page-ins and page-outs, the more aggressive fan speed rules I had imposed with iSM seemed to keep heat spikes in check on their own. Furthermore, after cleaning, even with the fan speed rules returned to the system defaults, I couldn't recreate the lockup problems, even though the CPU temperature was reported at 100°C under sustained high load. It seems the CPU can get hotter than I previously thought before it gets clipped, because I could see it running in "turbo" mode at a zesty 3.4 GHz during a Geekbench CPU test. Even under extreme duress, with fan settings that let the CPU exceed 100°C for more than 30 seconds, the overall performance improvements were profound, even judged purely by anecdotal usability. The nasty problems with Expose/Mission Control did not recur.
Apparent High Temperature Spikes
Admittedly, there is one anomaly that initially made no sense. Before I cleaned the machine, when I just ran the fan at max speed, CPU temperatures had a high floor of 65-75°C even at rest, but the maximum measured temperature never seemed to exceed 90°C. After cleaning, the measurements behaved very differently. While the idle floor temperature was lower with default fan settings (sometimes as low as 40°C), the CPU core temperature now fluctuated wildly with load (and CPU speed), whereas before the readings had changed much more gradually, even though the polling and refresh rate was the same. Besides the sensors heating and cooling faster, the maximum readings occasionally peaked above 100°C, a temperature that had never been reported before. With fan rules set more aggressively, so that higher temperatures trigger faster fan speeds, the CPU floor temperature was about 15-20°C lower at a load of 3-4, and often down in the 40s when load was around 1. (More on system load and temperature volatility in a bit.)
Conclusions
- It is clear that poor cooling was the primary factor in my machine's poor performance, though saying just that grossly oversimplifies the results I saw. For one, I believe the measurements I took before cleaning and re-pasting were neither accurate nor precise because of the dust buildup. It is possible that the wildly spiking temperatures simply reflect CPU speeds changing with the machine's workload, but the performance improvements don't track with that explanation. Since the temperature sensors are what the machine uses to regulate the CPU, the disparity may have been causing some problems. Once the sensors could report accurately and precisely, all-around performance was closer to the intended design.
- Beyond the physical cleaning, running faster fan speeds did wonders for cooling even before cleaning, just as one might expect; with a cleaned cooling channel, however, the effects were that much more evident. Before cleaning, running the fans at full speed nearly constantly appeared to keep heat-related problems at bay; after cleaning, I could relax the rules substantially, and taming the heat was much easier, knowing that actual and measured temperatures were no longer at odds.
- Therefore, my working hypothesis for the temperature volatility is that the CPU temperature sensors sit outside the CPU core and their readings had been muddied by dust accumulation. Cleaning may have allowed them to read more precisely and accurately. It may be that the higher temperature spikes simply were not sustained long enough to be registered beforehand, because the dust insulated the probes. Not knowing where the probes that measure CPU temperature physically are, I cannot say whether this holds water.
- Finally, after cleaning and tuning the fan rules, system load, while still often spiking as high as 30 or 40, was not so bad that it ground the machine to a halt. High load spikes are just a fact of life on older Macs, but I now know they don't have to be crippling. Before I cleaned and cooled, the Expose/Mission Control animation problems seemed to kick in when load exceeded 100. Now, load can reach 40 if I really push it and it's paging like crazy, but even when the machine is at 3.4 GHz with fans maxed out under heavy CPU and I/O load and paging at 8 MB/second, my machine remains usable, if a little sluggish in the UI.
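For anyone who wants a paging figure like the one above without a GUI tool, it can be derived from the cumulative counters that macOS's `vm_stat` prints. The sketch below is one way to do it, assuming `vm_stat`'s usual summary format (a header stating the page size, and `Pageins:`/`Pageouts:` lines). The two sample snapshots are made-up numbers for illustration, not data captured from this machine, and the function names are hypothetical.

```python
import re

# Assumed vm_stat summary format: a header such as
# "Mach Virtual Memory Statistics: (page size of 4096 bytes)"
# followed by "Name:   <count>." lines with cumulative counts.
PAGE_SIZE_RE = re.compile(r"page size of (\d+) bytes")
COUNTER_RE = re.compile(r"^(Pageins|Pageouts):\s+(\d+)", re.MULTILINE)


def parse_vm_stat(text: str) -> dict:
    """Return the page size and the cumulative Pageins/Pageouts counters."""
    size_match = PAGE_SIZE_RE.search(text)
    if size_match is None:
        raise ValueError("page size not found in vm_stat output")
    stats = {"page_size": int(size_match.group(1))}
    for name, count in COUNTER_RE.findall(text):
        stats[name] = int(count)
    return stats


def paging_rate_mb_per_s(before: dict, after: dict, interval_s: float) -> float:
    """Combined page-in + page-out rate between two snapshots, in MB/s.

    Assumes 1 MB = 1,000,000 bytes; a tool reporting MiB would show a
    slightly smaller number for the same traffic.
    """
    pages = sum(after[key] - before[key] for key in ("Pageins", "Pageouts"))
    return pages * after["page_size"] / interval_s / 1_000_000


# Illustrative snapshots taken 5 seconds apart (made-up numbers, not real data).
SAMPLE_T0 = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               20000.
Pageins:                                1000000.
Pageouts:                                 50000.
"""
SAMPLE_T1 = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                               18000.
Pageins:                                1010000.
Pageouts:                                 50000.
"""

if __name__ == "__main__":
    rate = paging_rate_mb_per_s(parse_vm_stat(SAMPLE_T0), parse_vm_stat(SAMPLE_T1), 5.0)
    print(f"{rate:.2f} MB/s")  # 10,000 pages * 4096 B / 5 s -> 8.19 MB/s
```

On a real Mac, the two snapshots would come from running `vm_stat` a few seconds apart, for example via `subprocess.run(["vm_stat"], capture_output=True, text=True).stdout`. Because the counters are cumulative, only the difference between snapshots is meaningful.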
TL;DR: Dusting the cooling system and upping the fan speeds at specific temperature thresholds is what it took to get my machine back to its healthy self. Also, the i7 CPU's variable clock rate is a power-saving feature, not just a throttle for when the heat spikes, at least not under normal operating conditions. Adding good metrics that don't tax the system is crucial for seeing something more than an anecdotal improvement, and iStat Menus seems like a great choice if you're data hungry. There are more lightweight open-source and command-line tools out there for purists.
I hope this lengthy breakdown was helpful. I found it worth going into all the details, even if there was some redundancy.