Server Motherboards Memory Per CPU
I noticed that on the new dual-socket 1366 server-type motherboards there are two banks of RAM. Does this mean that if I have 72GB of RAM installed, Windows will only allow 36GB per processor, or will one processor have access to all 72GB?
Solution 1:
A dual-socket board is laid out as two CPU sockets, each with its own set of memory slots. If there are two memory banks, each bank is wired to one CPU socket; that bank is not directly available to the other socket.
That implies a motherboard with 72GB total capacity has 36GB of capacity per CPU socket.
However, if your DIMMs are set up asymmetrically, as on this Intel board,
I suspect you will end up with 24GB on one CPU and 48GB on the other... that would need to be confirmed.
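One way to confirm that on a Linux box is to ask the kernel how much memory actually ended up attached to each socket (each NUMA node), rather than guessing from the slot layout. A minimal sketch using libnuma; the file name and output format are just for illustration:

    /* Print how much RAM is attached to each NUMA node (i.e. each CPU socket).
     * Build: gcc shownodes.c -lnuma -o shownodes   (Linux with libnuma) */
    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        int max_node = numa_max_node();
        for (int node = 0; node <= max_node; node++) {
            long long free_bytes = 0;
            long long total_bytes = numa_node_size64(node, &free_bytes);
            if (total_bytes <= 0)
                continue;   /* node not present or has no memory configured */
            printf("node %d: %lld MB total, %lld MB free\n",
                   node, total_bytes >> 20, free_bytes >> 20);
        }
        return 0;
    }

On a 72GB board split evenly you would expect two nodes of roughly 36GB each; numactl --hardware reports the same information without writing any code.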
If you are referring to a Nehalem-based 1366 board, you get three memory channels (groups of three DIMM slots) per CPU socket, and you populate DDR3 DIMMs in sets of three to get your per-CPU memory.
The Nehalem architecture handles access to the other socket's memory bank better by using Non-Uniform Memory Access (NUMA).
NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).
Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between banks. This operation has the effect of slowing down the processors attached to those banks, so the overall speed increase due to NUMA will depend heavily on the exact nature of the tasks run on the system at any given time.
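In practice, keeping a task's data in the bank next to the CPU that works on it is something you can do explicitly. A hedged sketch with libnuma on Linux; the buffer size and node number are arbitrary examples:

    /* Place a buffer on a chosen NUMA node so the task that uses it keeps its
     * working set local to that socket's memory bank.
     * Build: gcc onnode.c -lnuma -o onnode   (Linux with libnuma) */
    #include <stdio.h>
    #include <string.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        size_t size = 64UL << 20;    /* 64 MB, arbitrary example size */
        int node = 0;                /* put the data next to CPU socket 0 */

        void *buf = numa_alloc_onnode(size, node);
        if (buf == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }

        memset(buf, 0, size);        /* touch the pages so they are faulted in on that node */
        /* ... hand the buffer to work running on node 0's cores ... */

        numa_free(buf, size);
        return 0;
    }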
When you are not using Nehalem's NUMA, the older scheme works differently; the difference is shown visually in this Ars Technica article. Basically, you get the worst-case access time for everything (multi-socket memory access with the full cost of multi-way access).
The NUMA approach allows better access times across banks. The net result is better memory throughput, particularly when each processor socket has its data localized in its own bank.
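If you want to see the local versus remote difference on your own box, something along these lines works on Linux with libnuma, assuming at least two memory nodes; the buffer size, pass count and node numbers are arbitrary:

    /* Rough local vs. remote read comparison on a two-node NUMA system.
     * Bind the current thread to node 0's CPUs, then walk a buffer that lives
     * on node 0 and one that lives on node 1.
     * Build: gcc -O2 numacmp.c -lnuma -o numacmp   (Linux with libnuma) */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <time.h>
    #include <numa.h>

    static double walk(volatile uint8_t *buf, size_t size, int passes)
    {
        struct timespec t0, t1;
        uint64_t sum = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < size; i += 64)   /* step one cache line at a time */
                sum += buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        (void)sum;
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        if (numa_available() < 0 || numa_max_node() < 1) {
            fprintf(stderr, "need a NUMA system with at least two nodes\n");
            return 1;
        }

        size_t size = 128UL << 20;                  /* 128 MB per buffer */
        numa_run_on_node(0);                        /* run only on node 0's cores */

        uint8_t *local  = numa_alloc_onnode(size, 0);
        uint8_t *remote = numa_alloc_onnode(size, 1);
        if (local == NULL || remote == NULL)
            return 1;
        memset(local, 1, size);                     /* fault the pages in on their nodes */
        memset(remote, 1, size);

        printf("local  (node 0): %.3f s\n", walk(local,  size, 4));
        printf("remote (node 1): %.3f s\n", walk(remote, size, 4));

        numa_free(local, size);
        numa_free(remote, size);
        return 0;
    }

The absolute numbers depend heavily on the machine; what matters is the ratio between the two.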
I am not yet confident about all points of this answer and invite other opinions.
Solution 2:
From the operating system's logical view, the full RAM is accessible to every core.
From a performance standpoint there are differences depending on where the memory is and on the physical layout of the chips: accesses to memory attached to another socket are routed over the inter-socket link, which costs some performance depending on the location.
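On Linux you can ask the kernel where a given page physically ended up, which makes the "same address space, different physical location" point visible. A rough sketch with libnuma; the buffer size is arbitrary and error handling is kept minimal:

    /* Ask the kernel which NUMA node a page of an ordinary malloc'd buffer
     * actually resides on. Any core can read it either way; only the latency
     * of the access differs.
     * Build: gcc wherepage.c -lnuma -o wherepage   (Linux with libnuma) */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        size_t size = 4UL << 20;               /* 4 MB, arbitrary example size */
        char *buf = malloc(size);
        if (buf == NULL)
            return 1;
        memset(buf, 0, size);                  /* fault the pages in */

        /* Query mode: with nodes == NULL, numa_move_pages() moves nothing and
         * fills status[] with the node each page currently resides on. */
        long pagesize = sysconf(_SC_PAGESIZE);
        void *page = (void *)((uintptr_t)buf & ~((uintptr_t)pagesize - 1));
        int status = -1;

        if (numa_move_pages(0 /* this process */, 1, &page, NULL, &status, 0) == 0
            && status >= 0)
            printf("page %p resides on node %d\n", page, status);
        else
            fprintf(stderr, "could not determine the page's node\n");

        free(buf);
        return 0;
    }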
Looking at Nehalem-type boards, packs of 3 RAM slots (or multiples thereof) are attached to the individual CPU chips. The QuickPath Interconnect allows the other CPU to access that memory.
So there will be some numactl trickery involved to get optimum performance. For instance, memory shared by some task sits in one physical place, so threads running on different sockets see different access speeds to it.
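As a rough illustration: numactl --cpunodebind=0 --membind=0 ./app pins a whole process and its allocations to node 0. The same kind of binding can be done from inside a program with libnuma; a sketch, assuming Linux and that node 0 exists (the size is an arbitrary example):

    /* Bind the current thread to node 0's CPUs and keep its allocations on
     * node 0 -- roughly what numactl --cpunodebind=0 --membind=0 does for a
     * whole process.
     * Build: gcc bindnode.c -lnuma -o bindnode   (Linux with libnuma) */
    #include <stdio.h>
    #include <string.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        if (numa_run_on_node(0) != 0) {     /* restrict this thread to node 0's cores */
            perror("numa_run_on_node");
            return 1;
        }
        numa_set_localalloc();              /* default policy: fault new pages on the local node */

        size_t size = 16UL << 20;           /* 16 MB, arbitrary example size */
        char *buf = numa_alloc_local(size); /* explicitly node-local allocation */
        if (buf == NULL)
            return 1;
        memset(buf, 0, size);

        /* ... do the node-local work here ... */

        numa_free(buf, size);
        return 0;
    }

With multiple worker threads, the usual pattern is one such binding per thread, each thread allocating the data it touches most on its own node.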
The number of RAM slots has little to do with this.
Intel has great documentation if you want to get into it; see http://www.intel.com/technology/quickpath/index.htm and so on.
Solution 3:
For a definitive answer you should consult the motherboard documentation, or the manufacturer if the documentation doesn't make it clear. Knowing how it works for other motherboards is of no value whatsoever.