Explain load averages on Solaris 10

I understand load averages on Linux, but am a bit mystified by the load averages on a legacy Solaris 10 machine my app runs on. The load averages seem impossibly high. Here's the output.

[netcool1 (root)/]$ uptime
 11:49am  up 580 day(s), 10:51,  3 users,  load average: 35.50, 38.54, 39.03
[netcool1 (root)/]$ uname -a
SunOS netcool1 5.10 Generic_139555-08 sun4u sparc SUNW,Sun-Fire-V245
[netcool1 (root)/]$ psrinfo -v
Status of virtual processor 0 as of: 01/11/2012 11:52:52
  on-line since 06/10/2010 01:58:29.
  The sparcv9 processor operates at 1504 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 01/11/2012 11:52:52
  on-line since 06/10/2010 01:58:27.
  The sparcv9 processor operates at 1504 MHz,
        and has a sparcv9 floating point processor.
[netcool1 (root)/]$ 

I don't see how you can have a load average of 35 on a two-processor system. That seems incredibly high to me. When I view the processes with top, the system is about 60-70% idle. Could someone help explain this?

vmstat 10 6

kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr rm s0 s2 --   in   sy   cs us sy id
3 0 0 8747008 5562704 865 1866 188 63 63 0 0 -0 9 40 0 762 8588 1495 26  8 66
0 0 0 7715256 5068016 73 23 5 17 17  0  0  0 110 66 0 1135 3888 9855 59 12 30
0 0 0 7717936 5069128 0  5  0  6  6  0  0  0 100 4  0 1071 3273 4191 62  6 32
0 0 0 7717952 5027912 0 11649 0 5 5  0  0  0 115 21 0 1017 26370 3260 32 15 53
102 1 0 7717952 4979088 0 1 0  0  0  0  0  0 112 4  0  900 3464 7683 15  9 76
0 0 0 7717952 4978936 0  1  0  0  0  0  0  0 105 4  0  886 3379 8698 19  9 72

Solution 1:

The "load" is normally an average of the first column of vmstat (column r, the run queue). The first load is averaged over 1 minute, second over 5 minutes, and the last over 15 minutes. As you can see, in your system vmstat at one point reported no less than 102 threads woken up to use the processor (probably some massively multi-threaded application).

But no worries: that burst of workload has clearly been handled, and the run queue went back to zero on the next probe and stayed there. The V245 has two processors, each single-core and single-threaded, so it can run two threads at the same time (i.e. r=2 means no thread needs to wait for processor time).

Statistically this could translate to an average of 35, but as you can see the value says very little about actual system usage. As the adage goes, "there are three kinds of lies: lies, damned lies, and statistics", and I think that serves well as a conclusion.
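To make the windowing idea concrete, here is a toy sketch in Python with an invented run-queue trace. A real kernel keeps a smoothed, decaying average rather than a plain mean of samples, so this only illustrates the intuition of how one burst can skew the three figures differently, not the exact algorithm:

# Toy illustration: average made-up run-queue samples (vmstat column "r",
# one sample every 10 seconds) over 1-, 5- and 15-minute windows, the way
# this answer describes the load figures. The trace is invented: a mostly
# idle system with one short burst of 102 runnable threads.

def window_average(samples, window_seconds, interval=10):
    """Plain mean over the most recent window_seconds worth of samples."""
    n = window_seconds // interval
    recent = samples[-n:]
    return sum(recent) / len(recent)

samples = [0] * 89 + [102]            # 90 samples = 15 minutes of history

print(window_average(samples, 60))    # 1-minute figure:  17.0
print(window_average(samples, 300))   # 5-minute figure:   3.4
print(window_average(samples, 900))   # 15-minute figure:  ~1.13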

Solution 2:

On older Solaris, the load average is the average number of runnable and running threads. In other words, it is the number of threads running on the CPUs, plus the number of threads in the run queue waiting for a CPU, averaged over time.

So... a CPU that processed 10 threads in the last second, with 5 more waiting to be processed, would show 15.
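A minimal sketch of that definition (Python, with invented numbers chosen to match the example above) just to spell out the arithmetic:

# Each sample is (threads running on a CPU) + (threads waiting in the run
# queue); the load is the mean of those samples over the window.
# The numbers are invented to mirror the example above.

samples = [(10, 5), (10, 5), (10, 5)]       # (running, waiting) per sample

load = sum(run + wait for run, wait in samples) / len(samples)
print(load)                                 # 15.0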

In contrast...

Linux load averages are calculated as the "overload" of a CPU, i.e. over the last period of time, how many threads were waiting for CPU time relative to how many were completed (as a ratio).

So... a CPU that processed 10 threads in the last second, with 5 more waiting to be processed, would show 0.5.

In Solaris 10 they changed the formula a bit, and I'm not 100% sure what that entails, but it should be pretty close.

Solution 3:

Quite a late reply, but the accepted answer still contains incorrect statements, misses part of the point, and suggests statistics lie, while there is no reason here not to trust the ones reported by the OS.

Here is an in-depth explanation of the statistics observed.

The load average reported by uptime and other commands is a floating average, over 1, 5, and 15 minutes respectively, of the number of threads waiting for a CPU (the run queue) plus the number of threads actually running on a CPU.

The idea is to smooth the display of the run queue size and the count of running processes, both of which are often very irregular.
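For what it's worth, the usual way such a "floating" average is maintained is an exponentially decaying average updated at a fixed interval. Here is a minimal sketch under that assumption; the 5-second update interval and the constants are illustrative, not Solaris's actual values:

import math

# Minimal sketch of an exponentially decaying ("floating") load average,
# the usual mechanism behind 1/5/15-minute figures. Only the general idea
# (smoothing the running + runnable thread count) comes from the answer
# above; the interval and decay constant below are assumptions.

INTERVAL = 5                              # seconds between updates (assumed)
decay = math.exp(-INTERVAL / 60.0)        # decay factor for the 1-minute average

load_1min = 0.0
samples = [400] + [0] * 12                # one burst, then about a minute of quiet

for runnable_plus_running in samples:
    load_1min = load_1min * decay + runnable_plus_running * (1 - decay)

print(round(load_1min, 2))                # the burst is still visible a minute later (~11.8)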

The run queue size is the first column of vmstat output (r). Any non-zero value here means that your system would have run faster had it had more CPUs available.

vmstat's first data line shows the average since the last boot. An average of 3 threads were waiting on your machine before you launched vmstat. This value is generally meaningless, being biased by long inactivity periods like weekends and other non-working hours:

r b w   swap  free  re  mf pi po fr de sr rm s0 s2 --   in   sy   cs us sy id
3 0 0 8747008 5562704 865 1866 188 63 63 0 0 -0 9 40 0 762 8588 1495 26  8 66

All other samples show an empty run queue except the second-to-last one, which shows a whopping average of 102 threads:

102 1 0 7717952 4979088 0 1 0  0  0  0  0  0 112 4  0  900 3464 7683 15  9 76
                                                                          

The CPUs are nevertheless 76% idle during this 10-second sample, which is what puzzles you.

To resolve the apparent discrepancy, you need to understand that 102 is the average value for this sample. One way to get it would be a run queue holding 1020 threads for one second, then empty for the remaining 9 seconds. Any other combination leading to that figure of 102 is also conceivable, like 204 threads during 5 seconds and none during the other 5, and so on.

However, from vmstat's last column, we know your system was 76% idle during this period. A plausible scenario accommodating both the average run queue and the idle CPUs would be around 408 threads competing during roughly 2.5 seconds (100% busy CPUs) and no threads active during the remaining 7.5 seconds (0% busy CPUs).
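Spelled out as a quick sketch (Python), the arithmetic behind those scenarios is just "average times interval = thread-seconds of waiting", and the idle figure is what narrows the possibilities down:

# The 10-second vmstat sample reported an average run queue of 102,
# i.e. roughly 102 * 10 = 1020 "thread-seconds" of waiting. Any split of
# that total over the interval is consistent with the reported average;
# the roughly one-quarter busy fraction (the CPUs were 76% idle) points
# to something like the last scenario.

sample_seconds = 10
avg_run_queue = 102
thread_seconds = avg_run_queue * sample_seconds      # 1020

scenarios = [
    (1020, 1.0),    # 1020 threads queued for 1 second, idle the rest
    (204, 5.0),     # 204 threads for 5 seconds
    (408, 2.5),     # ~408 threads for ~2.5 seconds (matches ~75% idle)
]

for threads, busy_secs in scenarios:
    avg = threads * busy_secs / sample_seconds
    print(threads, busy_secs, avg)                   # each averages to ~102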

Now we know there was definitely CPU contention. Had you had 408 or more CPUs available instead of 2, and assuming all threads had been able to run at full speed in parallel, these 2.5 seconds would have shrunk to around 6 ms. That would have a dramatic effect on an interactive application, but not that much on a batch job, as the remaining time wouldn't have benefited from the extra CPUs anyway.

Bottom line:

If your application is interactive, your system is seriously overloaded; if not, it is somewhere between slightly overloaded and just "regular".

There is a tradeoff to consider: 6 ms is likely "too good" a response time, and 408 CPUs too expensive. Assuming 60 ms is a more reasonable goal, around 40 CPUs might do the job, and of course if 2.5 s is fine, your system is behaving correctly.
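For reference, here is the back-of-the-envelope scaling these figures appear to rely on, with the contention window shrinking roughly in proportion to the CPU count. Treat it as the answer's rough model, not a proper queueing analysis:

# Rough version of the trade-off above: scale the ~2.5-second contention
# window down in proportion to the number of CPUs available. This merely
# reproduces the answer's approximate figures.

contention_window = 2.5                     # seconds of contention observed

for cpus in (40, 408):
    wait_ms = contention_window / cpus * 1000
    print(cpus, "CPUs ->", round(wait_ms, 1), "ms")
# 40  CPUs ->  62.5 ms   (close to the 60 ms goal above)
# 408 CPUs ->   6.1 ms   (close to the ~6 ms figure above)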

Generally, a good rule of thumb is to assume there is contention when the overall average run queue size exceeds the number of CPUs, here ~37 vs 2. Whether that is actually a problem cannot be determined without analyzing which applications and threads are affected and how the contention impacts the platform's operation.
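A quick sketch applying that rule of thumb to the figures in the question (load averages taken from the uptime output, CPU count from psrinfo):

# Compare the reported load averages against the number of virtual
# processors (2 on this V245, per psrinfo). A ratio well above 1 runnable
# thread per CPU is the contention signal described above.

cpus = 2
load_averages = {"1min": 35.50, "5min": 38.54, "15min": 39.03}

for window, load in load_averages.items():
    ratio = load / cpus
    print(window, load, "->", round(ratio, 1), "runnable threads per CPU")
# All three windows sit far above 1 thread per CPU, so by this rule the
# machine is CPU-contended even though it is often idle between bursts.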