How to check if server is NUMA enabled?
My boss wants to know if the HP DL320 G5 is NUMA enabled. I've tried scouring the internet, but can't find any specific information.
Does anyone know off hand if this server is suitable for running NUMA applications?
Solution 1:
Opterons and Nehalem (55xx and later) Xeons have a NUMA architecture - each socket has its own memory bus and there is a link between the sockets. This link is called HyperTransport on Opteron systems and QuickPath on Xeons. The G5 predates Nehalem and still uses the older front-side bus, which is not a NUMA architecture.
Opterons and 35xx/55xx or later Xeons can use a pure NUMA addressing mode, where each socket's memory lives in a contiguous section of the physical address space. If you want to run an application that is NUMA aware (e.g. one that supports processor affinity) then you can set up the system to run in this mode.
Systems of this type also have a legacy mode (often called node interleaving in the BIOS) where individual 4K pages alternate across sockets, so memory access is finely mixed between the sockets. This has a slight performance overhead, as half of all memory accesses have to go across the HyperTransport link to the other socket (QuickPath in the case of Xeons). However, most accesses will be cached, so the performance overhead is relatively small.
This mode allows the systems to run non-NUMA-aware applications efficiently, and is typically the default mode that the system boots up in. Normally you can configure this in the BIOS.
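If you want to experiment with interleaving from software rather than the BIOS, libnuma on Linux exposes a per-allocation equivalent. A minimal sketch (assuming Linux with libnuma installed; compile with -lnuma):

    #include <stdio.h>
    #include <numa.h>   /* libnuma; link with -lnuma */

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "kernel reports no NUMA support\n");
            return 1;
        }

        /* Allocate 64 MB spread round-robin across all NUMA nodes -
         * a per-allocation analogue of what BIOS node interleaving
         * does for the whole physical address space. */
        size_t len = 64UL * 1024 * 1024;
        void *buf = numa_alloc_interleaved(len);
        if (buf == NULL) {
            perror("numa_alloc_interleaved");
            return 1;
        }

        /* ... use buf ... */
        numa_free(buf, len);
        return 0;
    }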
Your G5 will not run in a NUMA mode because it has a front-side bus architecture. The FSB is a single bus shared by the memory and all the processor sockets, so it has uniform memory access characteristics, i.e. not NUMA. I'm not aware of any Wintel or Lintel applications that depend on a NUMA architecture; chances are that the application does not need NUMA but will support it if present. You can probably still run the application on your older G5 system. Whether this is relevant depends on the application and what you want to achieve.
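If you want to confirm this empirically on a Linux box, numactl --hardware lists the nodes the kernel sees (a single node means no NUMA). Programmatically you can ask libnuma the same question; a minimal sketch (again assuming Linux with libnuma installed; compile with -lnuma):

    #include <stdio.h>
    #include <numa.h>   /* libnuma; link with -lnuma */

    int main(void)
    {
        if (numa_available() < 0) {
            printf("kernel reports no NUMA support\n");
            return 0;
        }

        int max = numa_max_node();
        printf("%d NUMA node(s)\n", max + 1);

        /* Report the memory attached to each node. */
        for (int node = 0; node <= max; node++) {
            long long free_mem;
            long long size = numa_node_size64(node, &free_mem);
            printf("node %d: %lld MB total, %lld MB free\n",
                   node, size >> 20, free_mem >> 20);
        }
        return 0;
    }

On an FSB machine like the DL320 G5 this will report a single node.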
NUMA support in applications
Some applications (SQL Server is an example) can realise significant performance benefits by optimising memory, I/O utilisation and scheduling so as to minimise penalties for non-local access. Implementing NUMA support in an application requires supporting facilities to be available from the operating system, such as:
Scheduler affinity: A thread can be placed in a pool that has preference to schedule on one processor or a group of processors. Note that a NUMA node can have more than one processor on a single memory bus - in the case of a multi-core Opteron or Xeon the cores on a single die all share the same bus. This allows the thread to request local memory or use pools of memory local to the CPU pool. Also, keeping a thread on a local CPU minimises cache thrashing as the thread is scheduled - the cache only has to hold the working sets of the threads using that particular core.
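On Linux the basic primitive for this is sched_setaffinity(). A minimal sketch pinning the calling thread to a set of cores (the assumption that cores 0-3 share node 0's memory bus is made up for illustration; real code would query the topology):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);

        /* Assumed topology: cores 0-3 hang off node 0's memory bus.
         * On a real system, read this from libnuma or sysfs instead
         * of hard-coding it. */
        for (int cpu = 0; cpu < 4; cpu++)
            CPU_SET(cpu, &set);

        /* pid 0 means the calling thread */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        printf("thread pinned to cores 0-3\n");
        return 0;
    }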
Memory affinity: A thread can request memory and specify that it must or should be available from memory local to a socket. Keeping memory and CPU usage on the same bus minimises the overhead of non-local memory access. The overhead is not so great on modern NUMA systems but non-local access was much slower on older systems such as early Sequent gear.
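libnuma exposes both the "must" and "should" flavours of this. A sketch (same assumptions as above: Linux, compile with -lnuma):

    #include <numa.h>   /* libnuma; link with -lnuma */

    int main(void)
    {
        if (numa_available() < 0)
            return 1;

        size_t len = 16UL * 1024 * 1024;

        /* Ask for pages placed on node 0. Whether this is a hard
         * requirement or a preference depends on the bind policy
         * (see numa_set_bind_policy()). */
        void *on_node0 = numa_alloc_onnode(len, 0);

        /* Prefer the node the calling thread is currently running on. */
        void *local = numa_alloc_local(len);

        if (on_node0) numa_free(on_node0, len);
        if (local)    numa_free(local, len);
        return 0;
    }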
I/O affinity: Peripheral buses can be tied to a local CPU, so I/O handling can be scheduled on processors that are close to the I/O. Most NUMA systems have multiple I/O buses, so scheduling interrupt handlers and DMA to local memory gives some advantage in I/O performance.
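On Linux the usual knob for this is the per-interrupt affinity mask in /proc. A sketch steering a hypothetical IRQ 42 to CPU 0 (the IRQ number is made up; find the real one in /proc/interrupts, and writing the mask needs root):

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical IRQ for a device near node 0. */
        const char *path = "/proc/irq/42/smp_affinity";

        FILE *f = fopen(path, "w");
        if (f == NULL) {
            perror(path);   /* usually requires root */
            return 1;
        }

        /* Bitmask of allowed CPUs: 0x1 == CPU 0 only, so the
         * interrupt handler runs close to that CPU's memory. */
        fprintf(f, "1\n");
        fclose(f);
        return 0;
    }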
Solution 2:
The first NUMA Xeons were the 55xx (Nehalem) series, which your G5 can't take, so it's not.