Disadvantages of running an AWS worker server at 100% CPU

On a machine (AWS m5.large) that only runs nice'd background processing jobs (i.e. no web/DB/etc servers present), are there disadvantages to consistently running the CPU at 100%?

I understand that running the system such that it consumes 100% of the available memory is not a good idea. Without swap, the system will simply kill processes when it runs out of memory. Even with swap, the system will start swapping out pages which slows the entire system down dramatically.

However, my understanding is that a system with nice'd processes running at 100% CPU usage will function without dramatic slowdowns. Is this correct?

Or, would it be better to try to configure the background processes such that the system stays within the range of 60% - 90% CPU usage?


So long as the system is doing what you need and remains responsive to logins and changes, running at 100% CPU is no problem; that's what the CPU is for. Nice only changes the relative priority of processes.
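To see that nice is purely relative, here is a minimal sketch using Python's `os.nice()` (a thin wrapper over the `setpriority` mechanism). A nice'd job still uses 100% of an otherwise-idle CPU; niceness only decides who yields when processes compete:

```python
import os

# os.nice(increment) raises this process's niceness (lowers its
# scheduling priority) and returns the new value. An increment of 0
# just reads the current niceness. Note: only root can lower it back.
before = os.nice(0)   # typically 0 for a normal shell-launched process
after = os.nice(5)    # be "nicer": yield the CPU to higher-priority work

# Niceness went up (priority down), but if nothing else is runnable,
# this process still gets the whole CPU.
assert after >= before
print(f"niceness raised from {before} to {after}")
```

The same effect is what `nice -n 19 ./batch_job` gives you at the shell: the batch job saturates idle cores yet steps aside the moment something more important needs them.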

In AWS, avoid the T series instances if you're using 100% of a CPU, as they only provide a fractional CPU baseline. Once you're consistently over that allocation, it's cheaper to get an M (general purpose) / C (compute optimized) / other series VM with committed CPU than to pay for "T2 / T3 unlimited".
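A back-of-the-envelope sketch of why that is. The prices below are illustrative assumptions (check current AWS pricing for your region); the shape of the comparison is the point. A t3.large has 2 vCPUs with roughly a 30% per-vCPU baseline, and in unlimited mode sustained usage above baseline bills surplus credits at around $0.05 per vCPU-hour:

```python
# Illustrative, assumed prices -- verify against current AWS pricing.
T3_LARGE_HOURLY = 0.0832    # assumed on-demand price, t3.large
M5_LARGE_HOURLY = 0.096     # assumed on-demand price, m5.large
SURPLUS_PER_VCPU_HOUR = 0.05  # assumed T-series surplus-credit charge
VCPUS, BASELINE = 2, 0.30

# Pegging both vCPUs at 100%: you pay surplus for the 70% above
# baseline on each vCPU, every hour.
surplus = VCPUS * (1.0 - BASELINE) * SURPLUS_PER_VCPU_HOUR
t3_at_full_load = T3_LARGE_HOURLY + surplus

print(f"t3.large at 100% CPU: ${t3_at_full_load:.4f}/h "
      f"vs m5.large: ${M5_LARGE_HOURLY:.4f}/h")

# At sustained 100% CPU, the committed-CPU instance comes out cheaper.
assert t3_at_full_load > M5_LARGE_HOURLY
```

With these numbers the t3.large works out to about $0.15/hour under sustained full load, versus $0.096/hour for the m5.large, which is why T instances only pay off for bursty workloads.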

To address a comment: AWS (and, I assume, other leading cloud providers) does not have a "fair use" policy for CPU; that tends to come from low-end providers or shared hosts. If you pay for a core, you can use the core 100%. If your instances are underutilized, the AWS Trusted Advisor service recommends smaller instances to help you save money.

On-premise you can obviously do whatever you like. This answer is the common case and applies to cloud, AWS specifically.


Whether you nice or not, running at 100% CPU means you are not processing your jobs as quickly as you could if you had more CPU available. The entire system does indeed slow down. The only thing nice does for you is let you indicate which processes have higher or lower priority, and so should get more or less of your already limited CPU.

If your jobs are slower than you expect, the only thing that will make any significant difference is to give them more CPU. If you take it from other jobs, then those jobs will slow down. If you upgrade your CPU, then everything will run faster. Of course, since it's EC2, you could also just add more instances.


There is no problem in running a CPU at 100%.

Even in the unlikely case that your specific hardware had a cooling problem leading to overheating under sustained load, as this is an AWS server, that would be Amazon's issue, not yours (rest assured, they took that into account in their pricing model).

If it weren't doing that job, it would be sitting idle, so if you need $job done, better to have it doing it. You don't want to artificially restrict it.

The main disadvantage is that running the CPU continuously at 100% draws more power. But you wanted that task done, right?¹

(¹ Do note that in some cases like bitcoin mining, the cost of electricity is higher than the value of the mined bitcoins)

Second, if the CPU is fully used at 100% doing some not-too-important task (like crunching SETI packets), something more important may arrive (such as an interactive request by the owner), and the computer won't respond to it promptly because it is busy processing those packets. This is solved by nicing the less-important task: the system then knows how to prioritise the two, and you avoid this problem.

In some places you may read that it is bad to have a server working at 100%. A CPU pegged at 100% does indicate a bottleneck in the process: you could produce more with more CPUs, or quicker ones. But as long as you are happy enough with the throughput, it's OK. Think of it as a shop where all the clerks are always busy. That is probably bad, since more customers can't shop there: they can't be served.
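The clerk analogy can be put in numbers with a textbook single-server queue (an M/M/1 sketch, chosen here as an illustration; real workloads vary): jobs arrive at rate lam, the server finishes them at rate mu, and the mean time a job spends in the system is 1/(mu - lam), which explodes as utilization approaches 100%:

```python
# M/M/1 queue: mean time in system = 1 / (mu - lam).
mu = 10.0  # jobs the server can finish per second (assumed)

for utilization in (0.6, 0.9, 0.99):
    lam = utilization * mu             # arrival rate at this utilization
    time_in_system = 1.0 / (mu - lam)  # waiting + service time
    print(f"{utilization:.0%} busy -> mean time in system "
          f"{time_in_system:.2f}s")
```

At 60% utilization a job takes about 0.25 s end to end, at 90% about 1 s, and at 99% about 10 s. That is why latency-sensitive shops want headroom in the 60%-90% range, while a deadline-free warehouse only cares about throughput and is perfectly happy at 100%.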

However, if we have a warehouse with items to sort, with no special deadline, and enough work for the next 5 years, you would want to have everyone working full time on it, not keep anyone idle.

If the warehouse is near the shop, you can combine the two: the clerks serve customers, and when there are no customers left, they get ahead on sorting the warehouse until the next client arrives.

Traditionally, you have certain dedicated hardware and it's up to you to use it more or less. In a model like AWS, though, you have more options. (Note: I am assuming your task is made up of many small, easily parallelizable chunks.)

  • Use a single instance of size X for as long as needed
  • Use a faster instance of size X+n
  • Use a slower but cheaper instance, taking more time
  • Use multiple instances

In some cases you could run several smaller instances for the cost of a big one and get more done (while for other workloads it wouldn't help).
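For perfectly parallel chunks, the comparison is simple arithmetic. The prices below are invented for illustration (per-vCPU pricing is often flat within an instance family, but spot capacity or older generations can make small instances cheaper per vCPU):

```python
# Hypothetical prices, for illustration only.
BIG = {"vcpus": 8, "hourly": 0.384}    # assumed on-demand big instance
SMALL = {"vcpus": 2, "hourly": 0.070}  # assumed spot-priced small instance

budget_per_hour = BIG["hourly"]        # spend the same money either way

# How many small instances fit in the same hourly budget?
n_small = int(budget_per_hour // SMALL["hourly"])
small_vcpus = n_small * SMALL["vcpus"]

print(f"${budget_per_hour}/h buys {BIG['vcpus']} big-instance vCPUs "
      f"or {small_vcpus} small-instance vCPUs ({n_small} instances)")

# With these assumed prices, the fleet of small instances wins.
assert small_vcpus > BIG["vcpus"]
```

Whether this pays off depends on the workload: embarrassingly parallel chunks love the fleet, while a job that needs lots of shared memory or fast inter-core communication may still want the single big box.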

Plus, the costs aren't fixed. You can probably benefit by launching extra instances off-hours, when they are cheaper, and shrinking the fleet when it would be more expensive. Suppose you were able to borrow clerks from nearby stores (at a certain variable rate). The open-24-hours shop could happily let the employee on the night shift sort some of your warehouse items quite cheaply, since only a handful of customers will pass by. However, an extra pair of hands on Black Friday would be much more expensive (in fact, better not to leave anyone sorting the warehouse that day).
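The night-shift idea is just a scheduling problem: given a price for each hour and a pile of deadline-free work, run the work in the cheapest hours. The price curve below is invented for illustration (real spot prices come from the market):

```python
# Assumed hourly price curve: cheap overnight (00-06), an expensive
# evening peak (18-20), a middling rate otherwise. Invented numbers.
prices = {h: (0.03 if 0 <= h < 7 else
              0.12 if h in (18, 19, 20) else
              0.06)
          for h in range(24)}

hours_of_work_needed = 10  # deadline-free batch work to place in the day

# Greedy: pick the cheapest hours of the day for the batch fleet.
cheapest_hours = sorted(prices, key=prices.get)[:hours_of_work_needed]
smart_cost = sum(prices[h] for h in cheapest_hours)

# Naive alternative: just run during the workday, 08:00-18:00.
naive_cost = sum(prices[h] for h in range(8, 18))

print(f"cheapest-hours schedule: ${smart_cost:.2f}, "
      f"workday schedule: ${naive_cost:.2f}")
assert smart_cost < naive_cost
```

With these assumed prices, shifting the same 10 hours of work into the overnight trough cuts the bill by roughly a third, which is the warehouse-sorting-on-the-night-shift trick in code.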

AWS lets you shift load around quite dynamically, and when you don't need the results within a fixed time, you can optimize your costs noticeably. However, AWS has "too many options", and they are complex to understand. You also need to understand your workload pretty well in order to make the right decisions.