The vendor whitepaper says 5 Mpps, no problem. I'm already hitting a wall at 120 kpps. Where's the bottleneck?

Solution 1:

RSS is also enabled in the NIC settings, with 8 queues.

Unfortunately, that did not mean RSS was actually in use, as

netsh int tcp show global

showed:

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : disabled
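If you want to catch this on more than one box, the check is easy to script. This is a minimal Python sketch, assuming the output uses the exact English label shown above (localized Windows builds may word it differently); `rss_state` and `query_rss_state` are hypothetical helper names, and only the second one actually calls `netsh`, so it only works on Windows.

```python
import re
import subprocess
from typing import Optional

# Matches the line "Receive-Side Scaling State : disabled" from the output above.
_RSS_LINE = re.compile(r"\s*Receive-Side Scaling State\s*:\s*(\S+)")


def rss_state(netsh_output: str) -> Optional[str]:
    """Return the RSS state ('enabled'/'disabled') or None if the line is absent."""
    for line in netsh_output.splitlines():
        match = _RSS_LINE.match(line)
        if match:
            return match.group(1).lower()
    return None


def query_rss_state() -> Optional[str]:
    """Windows only: run the same netsh command and parse its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return rss_state(result.stdout)


sample = """TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : disabled
"""
print(rss_state(sample))
```

The parser is kept separate from the `subprocess` call so it can be tested against captured output on any OS.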

After running the following (without rebooting):

netsh int tcp set global rss=enabled

RSS started working, and the load that previously piled up on that one poor core is now spread evenly across many cores on one of the two NUMA nodes.
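Why the load spreads: with RSS, the NIC hashes each packet's addresses and ports and uses the hash to pick a receive queue via an indirection table, and each queue is serviced by a different CPU. Below is a rough Python sketch of that mapping, assuming the Toeplitz hash, the widely published 40-byte sample key (your NIC/driver may use a different, often randomized, key), hash input in the order source address, destination address, source port, destination port, and a hypothetical 128-entry indirection table spread round-robin over the 8 queues. The names `toeplitz_hash`, `flow_input` and `queue_for` are illustrative only.

```python
import ipaddress
import struct
from collections import Counter

# Widely published sample RSS key. Assumption: the real NIC key may differ.
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

NUM_QUEUES = 8  # the 8 RSS queues configured on the NIC
# Hypothetical indirection table: 128 entries assigned round-robin to queues.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """32-bit Toeplitz hash: for each set input bit (MSB first), XOR in the
    32-bit window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key is too short for this input")
    result = 0
    for i in range(len(data) * 8):
        byte_index, bit_index = divmod(i, 8)
        if data[byte_index] & (0x80 >> bit_index):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """IPv4/TCP hash input in network byte order (assumed field order)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def queue_for(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Pick a receive queue: low bits of the hash index the indirection table."""
    h = toeplitz_hash(flow_input(src_ip, dst_ip, src_port, dst_port))
    return INDIRECTION_TABLE[h & (len(INDIRECTION_TABLE) - 1)]


if __name__ == "__main__":
    # 10,000 flows that differ only in source port, all to one server port.
    counts = Counter(queue_for("10.0.0.1", "10.0.0.2", p, 80)
                     for p in range(10_000, 20_000))
    for queue, n in sorted(counts.items()):
        print(f"queue {queue}: {n} flows")
```

One consequence worth keeping in mind when benchmarking: every packet of a given flow hashes to the same queue, so a test that generates only a handful of flows can still pin most of the load on a single core even with RSS enabled.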

I haven't verified whether this would let me handle the advertised Mpps loads, but it raised the ceiling enough to run the benchmarks I needed.
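For a sense of how far apart the two numbers are, here is the per-packet time budget they imply. Assuming the single core was the bottleneck at 120 kpps, each packet was costing it roughly 8.3 µs; 5 Mpps leaves 200 ns per packet in total. Even with the load spread perfectly over 8 cores (an idealization that RSS only approximates), each core would have about 1.6 µs per packet. The function names below are hypothetical.

```python
def ns_per_packet(pps: float) -> float:
    """Time available per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


def per_core_budget_ns(total_pps: float, cores: int) -> float:
    """Per-packet budget on each core, assuming a perfectly even spread."""
    return cores * 1e9 / total_pps


for label, pps in [("observed single-core ceiling", 120_000),
                   ("vendor claim", 5_000_000)]:
    print(f"{label}: {ns_per_packet(pps):.0f} ns per packet")
print(f"5 Mpps over 8 cores: {per_core_budget_ns(5_000_000, 8):.0f} ns per packet per core")
```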