Is the amount of NUMA nodes always equal to sockets?
Why are you wondering about the number of NUMA nodes? The important part is the NUMA topology, which describes how those "nodes" are connected.
I have checked a few systems, including an 8-socket (10-core CPUs) system consisting of 4 interconnected 2-socket blades (Hitachi Compute Node 2000). Here, too, the number of NUMA nodes equals the number of CPU sockets (8). Whether that holds depends on the CPU architecture, mainly its memory bus design.
NUMA (non-uniform memory access) as a whole defines how each logical CPU can access each part of memory. In a 2-socket system, each CPU (socket) has its own memory, which it can access directly. But it must also be able to access memory in the other socket, and this of course takes more CPU cycles than accessing local memory. NUMA nodes specify which part of system memory is local to which CPU. You can have more layers of topology: for example, on an HP Superdome system (which uses Intel Itanium2 CPUs), you have local CPU socket memory, then memory on a different socket inside the same cell, and then memory in other cells (which has the highest latency).
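On Linux you can also see this layout directly in sysfs, where every NUMA node has its own directory with a CPU list and memory statistics. A minimal sketch, assuming the standard sysfs paths of a reasonably recent kernel:

ls -d /sys/devices/system/node/node*          # one directory per NUMA node
cat /sys/devices/system/node/node0/cpulist    # logical CPUs that belong to node 0
cat /sys/devices/system/node/node0/meminfo    # memory present/used on node 0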
You can configure NUMA on your system to give the best possible performance for your workload. For example, you can allow all CPUs to access all memory, or restrict them to local memory only, which changes how the Linux scheduler distributes processes among the available logical CPUs. If you have many processes that need little memory, using only local memory can be a benefit, but if you have large processes (an Oracle database with its shared memory, for example), using all memory across all CPUs may be better.
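As a rough illustration of such policies, numactl can restrict a process to one node or interleave its allocations across all of them (./myapp below is just a placeholder for your own workload):

numactl --cpunodebind=0 --membind=0 ./myapp   # run on node 0 CPUs and allocate only node 0 memory
numactl --interleave=all ./myapp              # spread allocations round-robin over all nodes, e.g. for one big shared-memory process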
You can use commands such as numastat or numactl --hardware to check the NUMA status on your system. Here is info from that 8-socket machine:
hana2:~ # lscpu
Architecture: x86_64
CPU(s): 160
Thread(s) per core: 2
Core(s) per socket: 10
CPU socket(s): 8
NUMA node(s): 8
NUMA node0 CPU(s): 0-19
NUMA node1 CPU(s): 20-39
NUMA node2 CPU(s): 40-59
NUMA node3 CPU(s): 60-79
NUMA node4 CPU(s): 80-99
NUMA node5 CPU(s): 100-119
NUMA node6 CPU(s): 120-139
NUMA node7 CPU(s): 140-159
hana2:~ # numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 130961 MB
node 0 free: 66647 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 131072 MB
node 1 free: 38705 MB
node 2 cpus: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 2 size: 131072 MB
node 2 free: 71668 MB
node 3 cpus: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 3 size: 131072 MB
node 3 free: 47432 MB
node 4 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 4 size: 131072 MB
node 4 free: 68458 MB
node 5 cpus: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
node 5 size: 131072 MB
node 5 free: 62218 MB
node 6 cpus: 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
node 6 size: 131072 MB
node 6 free: 68071 MB
node 7 cpus: 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 7 size: 131008 MB
node 7 free: 47306 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 21 21 21 21 21 21 21
1: 21 10 21 21 21 21 21 21
2: 21 21 10 21 21 21 21 21
3: 21 21 21 10 21 21 21 21
4: 21 21 21 21 10 21 21 21
5: 21 21 21 21 21 10 21 21
6: 21 21 21 21 21 21 10 21
7: 21 21 21 21 21 21 21 10
There you can see the amount of memory present in each NUMA node (CPU socket) and how much of it is used and free.
The last section shows the NUMA topology: the "distances" between individual nodes in terms of memory access latency (the numbers are relative; they don't represent time in milliseconds or anything like that). Here you can see that the latency to local memory (node 0 accessing memory in node 0, node 1 in node 1, ...) is 10, while the remote latency (a node accessing memory on another node) is 21. Although this system consists of 4 individual blades, the distance is the same whether the remote socket is on the same blade or on another blade.
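If you want to check whether a running process really stays on local memory, numastat can break allocations down per node, and the same distance matrix is readable from sysfs. A small sketch (replace <pid> with an actual process ID):

numastat                                      # system-wide numa_hit / numa_miss counters per node
numastat -p <pid>                             # per-node memory usage of a single process (<pid> is a placeholder)
cat /sys/devices/system/node/node0/distance   # the distance row for node 0, the same numbers shown above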
There is also an interesting document about NUMA on the Red Hat portal.
No. The number of NUMA nodes does not always equal the number of sockets. For example, an AMD Threadripper 1950X has 1 socket and 2 NUMA nodes, while a dual Intel Xeon E5310 system can show 2 sockets and 1 NUMA node.
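A quick way to compare the two counts on any machine is simply to filter the lscpu output:

lscpu | grep -E 'Socket\(s\)|NUMA node'       # prints the socket count, node count and per-node CPU lists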
Actually no. On my server:
➜ lscpu
Socket(s): 2
NUMA node(s): 4
NUMA node0 CPU(s): 0-31,128-159
NUMA node1 CPU(s): 32-63,160-191
NUMA node2 CPU(s): 64-95,192-223
NUMA node3 CPU(s): 96-127,224-255
NUMA node0 and node1 are located on socket 0, and the remaining two are on socket 1 on my server.
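If you want to verify that mapping yourself, one way (a sketch assuming the standard sysfs layout; lscpu -e=CPU,NODE,SOCKET gives a similar view) is to look up the physical package ID of the first CPU in each node:

for n in /sys/devices/system/node/node[0-9]*; do
  cpu=$(cut -d, -f1 "$n/cpulist" | cut -d- -f1)   # first logical CPU of this node
  pkg=$(cat "/sys/devices/system/cpu/cpu$cpu/topology/physical_package_id")
  echo "$(basename "$n") -> socket $pkg"
done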