Is this server overloaded? (htop screenshots)

Solution 1:

Your server has only two CPUs and a load average (LA) in the range 10-15. That means the running processes demand more CPU time than two CPUs can provide. You can read much more about LA in this article by Brendan Gregg.
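The arithmetic behind that conclusion is the load average divided by the number of CPUs. Anything above 1.0 per CPU means there is, on average, more runnable work than CPUs to run it, and 10-15 spread over two CPUs gives 5.0-7.5. Below is a minimal Python sketch. It assumes a Unix-like host where os.getloadavg() exists, and the helper name load_per_cpu is made up for illustration.

```python
import os


def load_per_cpu(load_avg: float, n_cpus: int) -> float:
    """Normalize a load average by CPU count; above 1.0 means more work than CPUs."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load_avg / n_cpus


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    # os.cpu_count() reports logical CPUs on the host, which can differ from
    # what a container is actually allowed to use.
    cpus = os.cpu_count() or 1
    for label, value in zip(("1m", "5m", "15m"), os.getloadavg()):
        print(f"{label}: {value:.2f} total, {load_per_cpu(value, cpus):.2f} per CPU")
```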

Please note that LA is only a single metric. Even though your system isn't getting all the CPU time it wants, it may still get enough to serve end-user requests reasonably well. Check your other metrics before making any decisions about this server, but if your users are already complaining, the solution is clear: get an instance with more CPUs.

Solution 2:

Define ‘overloaded’.

If you’re just going by load average, then yes, it’s overloaded, by a factor of about 5-7.5 (a load of 10-15 on two CPUs). However, load average is only a reasonable metric if your workload is massively parallel and primarily CPU-bound. Load average is essentially a moving average, over the past 1, 5 and 15 minutes, of the number of processes that are running or ready to run. On Linux it also counts processes in uninterruptible sleep, which are usually waiting on disk I/O.
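The “average” in load average is not a plain mean. On Linux, the kernel refreshes the three figures about every 5 seconds as exponentially damped moving averages of the number of active tasks. The decay factors are roughly exp(-5/60), exp(-5/300) and exp(-5/900). The floating-point sketch below models that update. The kernel itself uses fixed-point arithmetic, and the function name damped_load is hypothetical.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # Linux recomputes the load average roughly every 5 seconds


def damped_load(samples, window_minutes):
    """Exponentially damped moving average of active-task counts.

    samples: number of active tasks, one entry per SAMPLE_INTERVAL_S seconds.
    window_minutes: 1, 5 or 15, the time constant of the decay.
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / (60.0 * window_minutes))
    load = 0.0  # start from an idle system
    for active in samples:
        # Keep most of the old value and blend in a little of the new sample.
        load = load * decay + active * (1.0 - decay)
    return load


if __name__ == "__main__":
    # 10 minutes (120 samples) of a constant 12 active tasks after an idle period.
    samples = [12] * 120
    for minutes in (1, 5, 15):
        print(f"{minutes:>2}-minute load: {damped_load(samples, minutes):.2f}")
```

In this run the 1-minute figure has essentially reached 12, while the 15-minute figure is still just under half of that. This lag is why the three numbers can disagree so much on a bursty server.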

Looking at two of your screenshots, though, your instantaneous CPU utilization is not constantly at 100% of what the system can deliver. Combined with a high load average, this suggests that many processes need to run, but each one runs briefly and then finishes. That’s reasonably normal for a system providing network services, because most network services are IO-bound rather than CPU-bound. For such a workload, load average is not a good measure of resource utilization.
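To check the “not constantly 100%” observation with numbers rather than screenshots, you can sample the aggregate cpu line of /proc/stat twice and compare the two readings. Below is a minimal sketch. It assumes Linux and the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal, and so on. The helper names are hypothetical, and counting iowait as idle time is a choice, not a rule.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return the first 8 counters of the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                # user nice system idle iowait irq softirq steal; guest time is
                # already included in user/nice, so later fields are skipped.
                return [int(x) for x in line.split()[1:9]]
    raise RuntimeError("no aggregate cpu line found")


def busy_fraction(before, after):
    """Fraction of elapsed CPU time spent doing work (iowait counted as idle)."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + deltas[4]  # idle + iowait
    return (total - idle) / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_times()
    time.sleep(1.0)
    second = read_cpu_times()
    print(f"CPU busy over the last second: {busy_fraction(first, second):.0%}")
```

A busy fraction well below 1.0, while the load average sits far above the CPU count, matches the pattern described above. A large iowait share in the deltas would additionally point to an IO-bound workload.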

What you really should be looking at here (and what you should look at first for any network service) are the performance metrics of the service itself. In most cases, the relevant ones are latency measurements for the various request types the service handles. More specifically, you usually want the average latency plus one of the 95th percentile, the 99th percentile or the peak latency. htop simply cannot track this for you. You need another tool, such as Netdata (disclaimer: I work for Netdata) or Prometheus.
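Once you have per-request latencies for a request type, from access logs or a monitoring tool, you can summarize them as an average plus a high percentile. Below is a minimal sketch using the nearest-rank percentile definition. The sample numbers are made up for illustration, and monitoring tools may compute percentiles differently, for example by interpolating.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Average, p95, p99 and peak latency for one request type."""
    return {
        "avg": mean(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Illustrative data only: mostly fast requests with a slow tail.
    samples = [20] * 95 + [250] * 4 + [900]
    print(latency_summary(samples))
```

With these numbers the average is 38 ms. That hides the 250 ms and 900 ms requests, which only the p99 and peak values reveal.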

Better than even that, though: are users reporting issues? If there are no reported problems, then it probably doesn’t matter whether the server is ‘overloaded’ or not, because everything is working well enough.