Conceptually, what is the definition of an idle server?

What resource metrics would you look at to make an assumption on whether or not a server was idle?

Would you look at:

  • CPU utilization
  • Disk Usage
  • Memory Usage

If so, what thresholds would these have to be at to decide if something is idle?

Would reboots and patching skew your results if you looked purely at these statistics?


Solution 1:

A machine is idle when it's not performing the task it's supposed to perform for lack of requests. For example, if you had an email server, you could determine whether it was fielding any requests from the email application on it. If it isn't, and that's the only thing on there, then it's idle. Things of course get more complicated when several services are co-located on a single node.

In general, if you can pull the power and nobody cares then it's idle.

Solution 2:

"Idle" is not a black-and-white concept: even a busy server can have free CPU cycles, IOPS and memory to run some other application.

As a rule of thumb, target CPU load should be around 80% and not above about 90%, as maxing out the CPU will greatly increase system latency. CPU load under about 60% generally means your server is underutilized.
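
To make the rule of thumb concrete, here is a minimal sketch that buckets an average CPU utilization figure using the rough thresholds above. The function name and the three labels are my own, and the cut-offs are only the approximate values from this answer, not hard limits; how you collect the average is up to your monitoring tool.

```python
def classify_cpu_load(avg_cpu_percent: float) -> str:
    """Bucket an average CPU utilization percentage.

    Thresholds are the rough rule-of-thumb values from this answer
    (under ~60% underutilized, above ~90% overloaded); tune them for
    your own workload.
    """
    if not 0.0 <= avg_cpu_percent <= 100.0:
        raise ValueError("expected a percentage between 0 and 100")
    if avg_cpu_percent < 60.0:
        return "underutilized"
    if avg_cpu_percent <= 90.0:
        return "in target range"
    return "overloaded"


if __name__ == "__main__":
    for sample in (12.5, 78.0, 97.0):
        print(sample, classify_cpu_load(sample))
```

Feed it an average taken over a long enough window (days, not minutes), otherwise a reboot or a patch run can easily push a single sample into the wrong bucket.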

Please also consider that a very I/O-dependent workload will tax the disks while the CPU mostly idles (this shows up as iowait time in Linux terms), so you may be able to run a CPU-heavy computation on an I/O-loaded server without too much performance degradation.
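
On Linux you can see this split yourself by comparing two samples of the aggregate `cpu` line from `/proc/stat`. The sketch below is only an illustration: the helper name `cpu_time_shares` is made up, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and ignores the trailing guest fields.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def _parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def cpu_time_shares(before: str, after: str) -> dict[str, float]:
    """Return the percentage of CPU time spent in each state between two samples."""
    deltas = [b - a for a, b in zip(_parse_cpu_line(before), _parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}


if __name__ == "__main__":
    import time

    def read_cpu_line() -> str:
        with open("/proc/stat") as stat_file:
            return stat_file.readline()

    first = read_cpu_line()
    time.sleep(5)  # sampling interval; an arbitrary choice for this sketch
    shares = cpu_time_shares(first, read_cpu_line())
    print(f"iowait {shares['iowait']:.1f}%  idle {shares['idle']:.1f}%")
```

A high iowait share next to low user and system shares is the "disks busy, CPU waiting" pattern described above.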

Solution 3:

What operating system is it and what is/was the server's purpose?

If I had no other information and I needed to determine if a server was idle, I would check to see which services were installed/running and then log some network activity specifically targeting those services. Linux and Windows both have logon auditing. Windows has the performance monitor for monitoring general network activity, and many built-in counters of active sessions for various services. For a file server you can check for recently modified/accessed files.
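
For the file server case, a rough sketch of that check could look like the following. The function name and the 30-day window in the usage example are placeholders of mine. Note that access times are only as reliable as the filesystem's mount options (for example `noatime` on Linux disables them), so treat the result as a hint rather than proof.

```python
import os
import time


def recently_touched_files(root: str, max_age_days: float) -> list[str]:
    """Return files under root that were modified or accessed within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Use whichever timestamp is newer: last modification or last access.
            if max(info.st_mtime, info.st_atime) >= cutoff:
                recent.append(path)
    return sorted(recent)


if __name__ == "__main__":
    import sys

    share_root = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = recently_touched_files(share_root, max_age_days=30)
    print(f"{len(hits)} file(s) touched in the last 30 days under {share_root}")
```

If nothing under the share has been touched in months, that is a much stronger sign of an idle file server than low CPU or disk numbers.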

I don't think any of the metrics you've listed are really good indicators of activity or inactivity since there are so many ways to get 'false positives', depending on what you're looking for. E.g. an antivirus program could use CPU cycles or the disk could be full of unused files.

If you want to tell me which OS you're using and what the purpose of the server is/was (if you know) I can edit my answer and provide you with more information.