How do I disable a specific CPU core at boot?
I have a 16 core Intel Xeon processor that successfully boots into the BIOS and GRUB, but fails to load Ubuntu (or any operating system). It turns out that core #14 is the cause of all the issues (discovered after testing each individual core with memtest86). In the BIOS, I can set the system to run with just 2 cores, and the system works in this configuration. But I would like to be able to use 15 out of 16 of the cores. Is there a way to disable only core #14 at boot?
Solution 1:
You can use the kernel's CPU hotplug support to achieve your objective: boot with only CPUs 0-13, then bring the remaining good CPUs on-line afterwards (CPUs 15-29 and 31, assuming core 14 maps to CPUs 14 and 30).
Most Xeon processors support hyper-threading, so I assume you mean 16 cores at 2 threads per core, for a total of 32 CPUs. This answer was written and tested on a processor with 4 cores and 2 threads per core, where core 2 is the bad one.
First, as sudo, edit /etc/default/grub and add the maximum boot time CPUs, maxcpus=, to your GRUB_CMDLINE_LINUX_DEFAULT line. Example for my system:
Was:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 consoleblank=300 cpuidle_sysfs_switch cpuidle.governor=teo"
Now:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 consoleblank=300 cpuidle_sysfs_switch cpuidle.governor=teo maxcpus=2"
Where I used maxcpus=2, you would use maxcpus=14.
Save a copy of /etc/default/grub first, and run sudo update-grub after.
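The edit above can also be scripted. The following is only a minimal sketch: the function name add_kernel_param is hypothetical (it is not part of any GRUB tooling), it assumes GRUB_CMDLINE_LINUX_DEFAULT sits on a single double-quoted line, it assumes GNU sed (as shipped with Ubuntu), and it assumes the parameter contains no characters that are special to sed, such as / or &.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# Usage: add_kernel_param FILE PARAM
# Assumes the variable is defined on one line and wrapped in double quotes.
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter already appears on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*${param}" "$file"; then
        return 0
    fi
    # Keep a backup copy before touching the file.
    cp "$file" "$file.bak"
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"/\1 ${param}\"/" "$file"
}
```

From a root shell this would be called as add_kernel_param /etc/default/grub maxcpus=14, followed by update-grub as described above.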
Thus, the system boots using only cores 0 and 1, with only CPUs 0 and 1 on-line:
doug@s15:~$ grep . /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:1
/sys/devices/system/cpu/cpu2/online:0
/sys/devices/system/cpu/cpu3/online:0
/sys/devices/system/cpu/cpu4/online:0
/sys/devices/system/cpu/cpu5/online:0
/sys/devices/system/cpu/cpu6/online:0
/sys/devices/system/cpu/cpu7/online:0
Note: With the default Ubuntu kernel configuration, CPU 0 is always on-line, so it has no online file:
doug@s15:~$ grep . /sys/devices/system/cpu/cpu0/online
grep: /sys/devices/system/cpu/cpu0/online: No such file or directory
O.K. so now, bring the other desired cores and CPUs on-line:
doug@s15:~$ echo 1 | sudo tee /sys/devices/system/cpu/cpu3/online
1
doug@s15:~$ echo 1 | sudo tee /sys/devices/system/cpu/cpu4/online
1
doug@s15:~$ echo 1 | sudo tee /sys/devices/system/cpu/cpu5/online
1
doug@s15:~$ echo 1 | sudo tee /sys/devices/system/cpu/cpu7/online
1
And check:
doug@s15:~$ grep . /sys/devices/system/cpu/cpu*/online
/sys/devices/system/cpu/cpu1/online:1
/sys/devices/system/cpu/cpu2/online:0
/sys/devices/system/cpu/cpu3/online:1
/sys/devices/system/cpu/cpu4/online:1
/sys/devices/system/cpu/cpu5/online:1
/sys/devices/system/cpu/cpu6/online:0
/sys/devices/system/cpu/cpu7/online:1
So now I have cores 0, 1, and 3 on-line, core 2 off-line, and 6 CPUs available. Note that core 0 = CPUs 0 and 4, core 1 = CPUs 1 and 5, and so on.
EDIT 1: For 32 CPUs, perhaps you have multiple nodes (processors), so the core to CPU mapping might be different.
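To check the mapping on a given machine, sysfs exposes a core_id and physical_package_id per CPU under the topology directory. Below is a minimal sketch; the function name list_cpu_cores is hypothetical, and the optional directory argument exists only so the function can be pointed at a test tree. One caveat, which I have not verified across kernel versions: the topology directory may be absent for CPUs that are off-line, so such CPUs are simply skipped.

```shell
# Hypothetical helper: print the core and package id of each CPU.
# Usage: list_cpu_cores [SYSFS_CPU_DIR]
list_cpu_cores() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        # Skip CPUs without topology data (this may include off-line CPUs).
        [ -r "$dir/topology/core_id" ] || continue
        core=$(cat "$dir/topology/core_id")
        pkg=$(cat "$dir/topology/physical_package_id" 2>/dev/null || echo "?")
        # ${dir##*/} strips the leading path, leaving e.g. "cpu4".
        echo "${dir##*/} core=$core package=$pkg"
    done
}
```

Two CPUs that report the same core and package values are hyper-threading siblings of one physical core, so both need to stay off-line to keep that core out of use.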
EDIT 2: CPUs that are brought on-line after boot may default to the performance governor of the intel_pstate CPU frequency scaling driver, which is the kernel configuration default. (For the CPUs enabled at boot, it gets changed to powersave 1 minute after boot.) You might want to check all CPU governors and set them to your preference, typically the powersave governor. To check, do:
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
To change governors do, for example (note: run as root):
# for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "powersave" > "$file"; done
Once you have things working the way you want, you can automate the step of bringing the additional CPUs on-line after boot (see other questions and answers for "how to").
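As a starting point for that automation, the manual echo commands above can be folded into one loop. This is a minimal sketch, not a tested boot-time service: the function name online_all_except is hypothetical, the skip list is whatever CPUs belong to the bad core on your machine, and the optional directory argument exists only so the function can be tried against a test tree.

```shell
# Hypothetical helper: bring every CPU on-line except those in SKIP_LIST.
# Usage: online_all_except "SKIP_LIST" [SYSFS_CPU_DIR]
# SKIP_LIST is a space-separated list of CPU numbers, e.g. "2 6".
# Must run as root when used against the real /sys tree.
online_all_except() {
    skip=" $1 "
    base=${2:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue      # glob matched nothing
        n=${f%/online}               # strip the trailing /online
        n=${n##*/cpu}                # keep only the CPU number
        case "$skip" in
            *" $n "*) continue ;;    # leave the bad core's CPUs off-line
        esac
        echo 1 > "$f"
    done
}
```

On my test system the call would be online_all_except "2 6". CPU 0 is never touched, because it has no online file.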
Note: It seems to me that you should be able to achieve your objective in one boot step by manipulating "cpu_possible_mask" via "possible_cpus=n", but I couldn't get it to work. Someone else might know.