On a dual-CPU server, is it normal for one CPU to run hotter than the other?

Solution 1:

The problem ended up being a poorly fitted heatsink. Maybe "poorly fitted" isn't the right description. It turns out you have to put the thermal paste on the heatsink itself, not on the plastic cover that comes over the heatsink.


After removing the plastic cover, the CPU is nice and cool. Thanks, everyone!

Solution 2:

In my experience, it is normal for paired components in a case to run at different temperatures, because airflow is not the same everywhere. Here's a graph of HDD temperatures from my colo box. The drives are mirrored, so their workloads are nearly identical.

[Munin graph of HDD temperatures over the past year]

As you can see, they track each other, but they're not the same; on average, they're only 6 °C apart. Whether your sensors report absolute temperature or overtemperature, a difference of 55 °C under load seems very badly wrong. If you're confident the data are right, then given that the quiescent difference drops to 10 °C, which is the sort of difference I see due to airflow, I'd suspect a poorly fitted heatsink.
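To put numbers on "they track each other, but they're not the same", here is a minimal Python sketch that averages the gap between two paired temperature series and encodes the reasoning above: an idle gap around the airflow-sized 10 °C that grows far beyond it under load points at the heatsink. The function names, the 10 °C default and the assumption that both series are sampled at the same instants are illustrative choices, not anything Munin itself provides.

```python
from statistics import mean


def mean_gap(series_a, series_b):
    """Average absolute difference (°C) between two paired temperature series.

    Assumes both lists were sampled at the same instants, e.g. two mirrored
    drives or two CPU packages logged by the same monitoring tool.
    """
    if not series_a or len(series_a) != len(series_b):
        raise ValueError("series must be non-empty and of equal length")
    return mean(abs(a - b) for a, b in zip(series_a, series_b))


def suspect_heatsink(idle_gap, load_gap, airflow_gap=10.0):
    """Hypothetical heuristic: an idle gap no bigger than a typical airflow
    difference, combined with a load gap larger than that, points at the
    heatsink rather than at airflow."""
    return idle_gap <= airflow_gap < load_gap


# Example with made-up readings: per-sample gaps of 6, 5 and 7 °C.
print(mean_gap([38, 40, 41], [32, 35, 34]))        # 6
print(suspect_heatsink(idle_gap=10, load_gap=55))  # True
```

A pair of drives that sits a steady few degrees apart, like the mirrored disks in the graph, would not trip `suspect_heatsink`.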

Solution 3:

It is not, unless you have some serious airflow issues or one of the coolers is bad. Temperatures WILL vary, but not that much (70 °C vs. 15 °C).

Given how low 15 °C is, I would first assume your sensor is off (do you really keep the server in a room that cool?).

I would also assume that one of the CPUs is simply doing no work at all, for whatever reason.

Small differences are normal, and somewhat larger ones can be too (airflow comes to mind), but here we are talking about one CPU being COLD.

Solution 4:

This could be either cooling or uneven loading (given the size of the temperature difference, your situation is probably uneven loading). Use something like Prime95 to load all the cores evenly and see whether the temperatures still differ. If they don't, then you need to balance the VMs and check that your apps are multithreaded and busy. How to do that depends on your software and workload, so it is really beyond the scope of this question. Bear in mind there is no real advantage to doing this if you don't have enough load to max out a single CPU/core; in fact, your VM may deliberately avoid using a second CPU so that it can enter power-saving modes on multi-CPU systems.
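If the server runs Linux, one way to watch both packages while Prime95 (or any other even load) is running is to read the kernel's hwmon sysfs interface directly. This is a sketch under stated assumptions: it expects Intel CPUs handled by the coretemp driver, which exposes a hwmon device per package with a `temp*_label` such as "Package id 0" and a matching `temp*_input` in millidegrees Celsius. AMD's k10temp driver and IPMI/BMC sensors use different names, so check the paths and labels on your own box. `read_package_temps` and `package_spread` are hypothetical helper names.

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a small sysfs file."""
    with open(path) as f:
        return f.read().strip()


def read_package_temps(hwmon_root="/sys/class/hwmon"):
    """Return {label: °C} for each CPU package reported by the coretemp driver.

    Assumes the Linux sysfs hwmon layout: hwmonN/name, hwmonN/tempK_label and
    hwmonN/tempK_input, where the input value is in millidegrees Celsius.
    """
    temps = {}
    for dev in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            if _read(os.path.join(dev, "name")) != "coretemp":
                continue  # skip disks, chipset sensors, etc.
        except OSError:
            continue
        for label_path in sorted(glob.glob(os.path.join(dev, "temp*_label"))):
            label = _read(label_path)
            if not label.startswith("Package id"):
                continue  # per-core sensors are ignored in this sketch
            input_path = label_path[: -len("_label")] + "_input"
            temps[label] = int(_read(input_path)) / 1000.0  # millidegrees -> °C
    return temps


def package_spread(temps):
    """Difference between the hottest and coolest package, in °C."""
    return max(temps.values()) - min(temps.values()) if temps else 0.0
```

Call `read_package_temps()` every few seconds while all cores are loaded. If `package_spread` stays large with every core busy, uneven loading is ruled out and cooling becomes the suspect.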

If you have narrowed it down to cooling, a small difference of up to 10 °C could come from too little (or too much!) thermal paste. A bigger difference indicates a significant problem with, or difference between, the CPU coolers: one may have blocked airflow, a heatsink may have been knocked loose, and so on.
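The triage in this answer (rule out uneven load first, then judge the size of the gap) can be written down as a small decision function. This is a hedged Python sketch: the 10 °C cut-off comes from the answer, while the utilization tolerance, the parameter names and the returned category strings are hypothetical choices. The utilization percentages are assumed to come from whatever monitoring tool you already use.

```python
def triage(temp_a, temp_b, util_a, util_b, util_tolerance=10.0, paste_gap=10.0):
    """Rough classification of a temperature gap between two CPU packages.

    temp_* are package temperatures in °C; util_* are utilization percentages
    sampled at the same time. The thresholds are rules of thumb, not specs.
    """
    if abs(util_a - util_b) > util_tolerance:
        # Loads differ: balance them (e.g. run Prime95 on every core) first.
        return "uneven_load"
    gap = abs(temp_a - temp_b)
    if gap <= paste_gap:
        # Small gap under equal load: airflow or amount of thermal paste.
        return "minor"
    # Large gap under equal load: blocked airflow, loose heatsink, bad cooler.
    return "cooler_problem"


print(triage(70, 15, util_a=100, util_b=100))  # cooler_problem
print(triage(70, 15, util_a=100, util_b=3))    # uneven_load
print(triage(62, 55, util_a=99, util_b=97))    # minor
```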