Why don't servers always run at max?
Solution 1:
Latency is one reason. Between the request "disk, give me this data I need before I can do anything else" and the moment the data comes back, the CPU sits idle.
Resources probably do run at 100%, but only for very brief periods. An operating system booting will follow the general pattern of "process or decide something, fetch something from disk, do something in memory, do something with a device", repeating many times per second. A utilization figure is an average over the sampling window, so when you see a disk at 25% over a 2-second period, that probably means it was running at 100% for 0.5 seconds and idle the rest of the time.
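To make the averaging concrete, here is a minimal sketch of the arithmetic. The function name and parameters are made up for illustration, and it assumes the monitoring tool reports a plain average over its sampling window:

```python
def average_utilization(busy_seconds, window_seconds):
    """Average utilization (0-100) reported for one sampling window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 100.0 * busy_seconds / window_seconds


# A disk that is saturated for 0.5 s of a 2 s window shows up as 25%.
print(average_utilization(0.5, 2.0))  # 25.0
```

A shorter sampling window would show the same workload as spikes to 100% separated by idle gaps.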
As EEAA pointed out, multicore systems make this a bit more complex. A single-threaded piece of software on a CPU that can execute four threads can only reach 25% overall utilization, even when it is running at full speed. Even multithreaded software can rarely hit 100%, because data (usually) has to flow from hard drive, to RAM, to cache, to CPU. Keeping that pipeline full is difficult, and tends to happen mostly with predictable workloads like video encoding. With such workloads the operating system can observe read patterns and retrieve data before it's required, putting it into appropriate caches, such as the disk cache in RAM.
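The 25% ceiling falls out of simple division. A rough sketch, assuming the monitor averages utilization across all hardware threads (some tools report per-core figures instead) and using a hypothetical helper name:

```python
def max_reported_cpu_percent(runnable_threads, hardware_threads):
    """Upper bound on overall CPU utilization for a given thread count."""
    if hardware_threads <= 0:
        raise ValueError("hardware_threads must be positive")
    # Each software thread can keep at most one hardware thread busy.
    busy = min(runnable_threads, hardware_threads)
    return 100.0 * busy / hardware_threads


print(max_reported_cpu_percent(1, 4))  # 25.0
print(max_reported_cpu_percent(8, 4))  # 100.0
```

This is only a ceiling: stalls waiting on disk, RAM, or cache push the real figure lower.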
Solution 2:
You're thinking about this in a very simplistic way, which is leading you to some incorrect assumptions that I'll try to clear up.
First, and potentially most simply, on a multicore system, in order to understand CPU usage you have to take into account whether or not the process load is multithreaded, and designed to take advantage of multiple cores. If this is not the case, depending on the mix of processes running, you may not ever see 100% usage. Ever.
Second, you need to consider IO device performance. How does your system know, for instance, how many IOPS your devices are capable of? It doesn't. A more meaningful metric for you to watch is your iowait value during boot (which may be difficult to obtain during the boot process) or the disk queues/latency during boot (which should be easier to obtain from your hypervisor). If you see queues or latency spike, it's likely that your IO devices are a contributing factor to your performance issues.
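If the guest runs Linux, one way to get an iowait figure is to compare two samples of the aggregate "cpu" line from /proc/stat. The sketch below is an illustration rather than a finished tool: the function names are invented, the sample lines are made-up numbers, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) found on modern kernels.

```python
def parse_cpu_line(line):
    """Return the first eight counters from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples (0-100)."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


# Made-up samples; real ones would be read from /proc/stat a few seconds apart.
sample_1 = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_2 = "cpu  150 0 70 900 130 0 0 0 0 0"
print(iowait_percent(sample_1, sample_2))  # 32.0
```

A high value here means the CPUs were idle with IO requests outstanding, which points at storage rather than at the processor.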