Tune Linux & Nginx to Handle 10k Connections @ 10Gbps Server
Solution 1:
Already answered by Brandon: turn on irqbalance, and run numad and tuned. Stop trying to hand-tune unless you have a specific workload that requires it. Where are your wrk test results from testing 2,000-10,000 connections before you deployed? This problem should never have been seen in production; it would have been caught by testing. Real-world use will often uncover uncommon bugs, but most configuration and application bugs can be identified and corrected during testing. There are plenty of docs on IRQ affinity, but I doubt your use case can do better than the tuning tools built into the distribution. More than likely, your hand tuning will perform worse.
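As a rough sketch of what that looks like in practice (assuming a systemd-based distro where irqbalance, tuned, and numad are packaged, and that wrk is installed; the profile name, thread count, and URL below are placeholders you would adapt to your own hardware and endpoint):

```bash
# Enable the built-in tuning daemons instead of hand-tuning IRQ affinity.
sudo systemctl enable --now irqbalance
sudo systemctl enable --now tuned
sudo tuned-adm profile throughput-performance   # one example of a throughput-oriented profile
sudo systemctl enable --now numad

# Load-test before deploying: ramp wrk from 2,000 to 10,000 concurrent
# connections against a representative endpoint (URL is a placeholder).
for c in 2000 4000 6000 8000 10000; do
    wrk -t 16 -c "$c" -d 60s --latency http://10.0.0.1/
done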
Solution 2:
The output from top says your kernel is being inundated with soft interrupts from all of the incoming connections. The connections are coming in so fast that the hardware interrupts triggered by the network card are queueing soft interrupts faster than the kernel can deal with them. This is why your CPU, RAM, and I/O usage is so low: the system keeps getting interrupted by incoming connections. What you need here is a load balancer.
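If you want to confirm that soft interrupts are the bottleneck before adding a load balancer, a minimal way to watch them is with standard procfs and sysstat tooling (the one-second intervals below are arbitrary, and mpstat assumes the sysstat package is installed):

```bash
# Per-CPU softirq counters; the NET_RX row climbing rapidly on only a few CPUs
# suggests the NIC's receive interrupts are overwhelming those cores.
watch -n 1 'cat /proc/softirqs'

# Per-CPU utilization including %soft (time spent in softirq context).
mpstat -P ALL 1
```

If %soft dominates on the cores handling the NIC queues while user and I/O time stay low, that matches the symptom described above and points toward spreading the traffic across more machines rather than further single-host tuning.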