How to understand memory usage and load average on a Linux server

I am using a Linux server with 128 GB of memory and 24 cores. I use top to see how heavily it is being used; its output is pasted at the end of the post. I have two questions:

(1) I see that each of the running processes occupies a very small percentage of memory (%MEM no more than 0.2%, and mostly just 0.0%), so how can the total memory be almost fully used, as shown in the fourth line of the output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The %MEM values of all processes surely don't add up to almost 100%, do they?

(2) How should I interpret the load average on the first line ("load average: 14.04, 14.02, 14.00")?

Thanks and regards!

Edit:

Thanks!

I would also really like to hear some rough thresholds, based on the percentage of memory used, for deciding whether a server is heavily loaded, since I was once the one who overloaded the server without understanding its current load.

Should swap be regarded as roughly the same as memory? For example, when memory and swap are about the same size, and memory is almost used up but swap is still largely free, can I treat the combined memory + swap usage as still low and start new processes?

How would you weigh CPU and memory (or memory + swap) usage together? Do you start to worry when either one gets too high, or only when both do?

Output of top:

$ top

 
top - 12:45:33 up 19 days, 23:11, 18 users,  load average: 14.04, 14.02, 14.00
Tasks: 484 total,  12 running, 472 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.7%us, 19.7%sy,  0.0%ni, 43.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  130766620k total, 130161072k used,   605548k free,   919300k buffers
Swap: 63111312k total,   500556k used, 62610756k free, 124437752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6529 sanchez   18  -2 1075m 219m  13m S  100  0.2  13760:23 MATLAB
13210 timothy   18  -2 48336  37m 1216 R  100  0.0   3:56.75 absurdity
13888 timothy   18  -2 48336  37m 1204 R  100  0.0   2:04.89 absurdity
14542 timothy   18  -2 48336  37m 1196 R  100  0.0   1:08.34 absurdity
14544 timothy   18  -2  2888 2076  400 R  100  0.0   1:06.14 gatherData
 6183 sanchez   18  -2 1133m 195m  13m S  100  0.2  13676:04 MATLAB
 6795 sanchez   18  -2 1079m 210m  13m S  100  0.2  13734:26 MATLAB
10178 timothy   18  -2 48336  37m 1204 R  100  0.0  11:33.93 absurdity 
12438 timothy   18  -2 48336  37m 1216 R  100  0.0   5:38.17 absurdity
13661 timothy   18  -2 48336  37m 1216 R  100  0.0   2:44.13 absurdity
14098 timothy   18  -2 48336  37m 1204 R  100  0.0   1:58.31 absurdity
14335 timothy   18  -2 48336  37m 1196 R  100  0.0   1:08.93 absurdity
14765 timothy   18  -2 48336  37m 1196 R   99  0.0   0:32.57 absurdity
13445 timothy   18  -2 48336  37m 1216 R   99  0.0   3:01.37 absurdity
28990 root      20   0     0    0    0 S    2  0.0  65:50.21 pdflush
12141 tim       18  -2 19380 1660 1024 R    1  0.0   0:04.04 top
 1240 root      15  -5     0    0    0 S    0  0.0  16:07.11 kjournald
 9019 root      20   0  296m 4460 2616 S    0  0.0  82:19.51 kdm_greet
    1 root      20   0  4028  728  592 S    0  0.0   0:03.11 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:01.01 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:08.13 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT  -5     0    0    0 S    0  0.0  17:27.31 migration/1
    7 root      15  -5     0    0    0 S    0  0.0   0:01.21 ksoftirqd/1
    8 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root      RT  -5     0    0    0 S    0  0.0  10:02.56 migration/2
   10 root      15  -5     0    0    0 S    0  0.0   0:00.34 ksoftirqd/2
   11 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/2
   12 root      RT  -5     0    0    0 S    0  0.0   4:29.53 migration/3
   13 root      15  -5     0    0    0 S    0  0.0   0:00.34 ksoftirqd/3

(1) I see that each of the running processes occupies a very small percentage of memory (%MEM no more than 0.2%, and mostly just 0.0%), so how can the total memory be almost fully used, as shown in the fourth line of the output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The %MEM values of all processes surely don't add up to almost 100%, do they?

To see how much memory you are currently using, run free -m. It will provide output like:

             total       used       free     shared    buffers     cached
Mem:          2012       1923         88          0         91        515
-/+ buffers/cache:       1316        695
Swap:         3153        256       2896

The top row's 'used' value (1923) will almost always nearly match the top row's total (2012), since Linux likes to use any spare memory to cache disk blocks (515).

The key 'used' figure to look at is the buffers/cache row's used value (1316). This is how much memory your applications are actually using. For best performance, this number should be less than your total memory (2012). To prevent out-of-memory errors, it needs to be less than the total memory (2012) plus the swap space (3153).

If you want to see quickly how much memory is free, look at the buffers/cache row's free value (695). This is the total memory (2012) minus the actually used memory (1316). (2012 - 1316 = 696, not 695; this is just a rounding issue.)
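The same arithmetic can be applied to the top output in the question. Below is a minimal Python sketch, assuming the older procps top format shown above (all values in kB); the helper names parse_top_line and memory_summary are made up for illustration. Note that this version of top prints the page-cache size ("cached") at the end of the Swap line, even though it describes RAM (it is larger than the whole swap area). Newer versions of free (procps-ng) no longer print the -/+ buffers/cache row and show an "available" column instead, so on a current system you would read that column rather than redo this subtraction.

```python
import re

def parse_top_line(line):
    """Parse an old-style top summary line, e.g.
    'Mem:  130766620k total, 130161072k used, ...', into {'total': 130766620, ...} (kB)."""
    return {label: int(value) for value, label in re.findall(r"(\d+)k\s+(\w+)", line)}

def memory_summary(mem, swap):
    """Apply free's '-/+ buffers/cache' arithmetic to top's figures (all in kB).
    In this top format the page-cache size ('cached') sits on the Swap line,
    but it is RAM, so it is combined with the Mem line's buffers."""
    buffers_cache = mem["buffers"] + swap["cached"]
    apps_used = mem["used"] - buffers_cache   # buffers/cache row 'used'
    apps_free = mem["free"] + buffers_cache   # buffers/cache row 'free'
    return {
        "apps_used_kb": apps_used,
        "apps_free_kb": apps_free,
        "apps_used_pct": 100.0 * apps_used / mem["total"],
        # Out-of-memory condition from the answer: apps must fit in RAM + swap.
        "fits_in_ram_plus_swap": apps_used < mem["total"] + swap["total"],
    }

mem = parse_top_line("Mem:  130766620k total, 130161072k used,   605548k free,   919300k buffers")
swap = parse_top_line("Swap: 63111312k total,   500556k used, 62610756k free, 124437752k cached")
summary = memory_summary(mem, swap)
print(summary["apps_used_kb"], summary["apps_free_kb"], round(summary["apps_used_pct"], 1))
```

For the figures in the question this gives 4804020 kB (roughly 4.6 GiB, about 3.7% of RAM) actually used by applications, and about 120 GiB available to programs; the rest of the "used" memory is buffers and page cache.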

(2) How should I interpret the load average on the first line ("load average: 14.04, 14.02, 14.00")?

This article on load average uses a nice traffic analogy and is the best one I've found so far: "Understanding Linux CPU Load - when should you be worried?" In your case, as others have pointed out:

On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.

So, with a load average of 14.00 and 24 cores, your server is far from being overloaded.
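To put the "relative to the number of cores" rule into numbers, here is a small Python sketch that divides each load average by the core count (load_per_core is a hypothetical helper name). os.getloadavg() and os.cpu_count() are standard-library calls; note that os.cpu_count() counts logical CPUs, so hyperthreads are included.

```python
import os

def load_per_core(load_averages, cores):
    """Divide each load average by the core count; 1.0 per core is the '100% utilization' mark."""
    return tuple(round(load / cores, 3) for load in load_averages)

# The figures from the question: load average 14.04, 14.02, 14.00 on 24 cores.
print(load_per_core((14.04, 14.02, 14.00), 24))

# The same check on the local machine (os.getloadavg() exists on Unix-like systems only).
if hasattr(os, "getloadavg"):
    try:
        print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
    except OSError:
        pass  # load average not obtainable on this system
```

For the question's figures this prints (0.585, 0.584, 0.583), i.e. a load of a bit under 0.6 per core.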


Unix-like systems, including Linux, are designed to make the most efficient possible use of the available RAM. In very general terms, each MB of RAM can be in one of 3 states:

  1. Free
  2. Used by a Process
  3. Used for Buffers

The 3rd state is used only as scratch space and is meant to be reassigned whenever necessary, i.e. the total memory available to programs is really Free + Used for Buffers. As such, you won't see the space allocated to buffers show up as assigned to any specific process.

Your load average question is a little more interesting, as it is easily misinterpreted. For the full story, see this Linux Journal article. The best summary is a direct quote from the article:

The load-average calculation is best thought of as a moving average of processes in Linux's run queue marked running or uninterruptible

That is, you can think of your load average as (# of running or runnable processes) + (# of processes waiting on I/O). Keeping in mind that at any given time up to $CORE processes (one per core) can actually be executing, I would say that your load average of 14 is pretty low.
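The "moving average" in the quote is an exponentially damped one: the kernel (its calc_load() routine) samples the count of running plus uninterruptible tasks roughly every 5 seconds and blends each sample into the 1-, 5- and 15-minute averages. The following is a floating-point Python sketch of that idea, not the kernel's fixed-point code; decay and simulate_load are illustrative names.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # assumption: one sample roughly every 5 seconds, as in the kernel

def decay(minutes):
    """Per-sample decay factor for an m-minute average: exp(-5 s / (60 s * m))."""
    return math.exp(-SAMPLE_INTERVAL_S / (60.0 * minutes))

def simulate_load(active_counts, windows=(1, 5, 15)):
    """Fold one (# running + # uninterruptible) count per sample into exponentially
    damped averages, starting from 0.0; returns one average per window (in minutes)."""
    factors = [decay(m) for m in windows]
    loads = [0.0] * len(windows)
    for active in active_counts:
        loads = [load * f + active * (1.0 - f) for load, f in zip(loads, factors)]
    return tuple(loads)

# A steady 14 active tasks for a whole day (12 samples per minute):
print([round(x, 2) for x in simulate_load([14] * (24 * 60 * 12))])

# One busy task appearing on an idle system, observed after one minute:
print(round(simulate_load([1] * 12)[0], 3))
```

A steady 14 active tasks drives all three averages to 14.0, which is what nearly identical readings like 14.04, 14.02, 14.00 suggest. The second case shows the lag: one minute after a single task starts, the 1-minute average has only reached about 0.632 (1 - e^-1).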


From the sar man page:

       The load average is calculated as the average number of runnable or
       running tasks (R state), and the number of tasks in uninterruptible
       sleep (D state) over the specified interval.
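A single snapshot of that R + D count can be taken from the state column of top or ps. This is a small Python sketch; active_task_count is a hypothetical helper, and on a live system you might feed it the lines printed by ps -eo stat= (where extra flag letters such as s, l or + can follow the state letter).

```python
def active_task_count(states):
    """Count tasks in state R (running or runnable) or D (uninterruptible sleep).
    Only the first letter of each state string is examined, so ps-style states
    with extra flags ('Rl', 'Ss+', 'D<') are handled too."""
    return sum(1 for state in states if state.strip()[:1] in ("R", "D"))

# S column of the first 16 rows of the top listing in the question.
snapshot = ["S", "R", "R", "R", "R", "S", "S", "R",
            "R", "R", "R", "R", "R", "R", "S", "R"]
print(active_task_count(snapshot))
```

For those rows it returns 12 (the remaining visible rows are all S), which matches "12 running" on the Tasks line. Such a snapshot jumps around from moment to moment; the load average is the smoothed version of it.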

From the uptime man page:

       System load averages is the average number of processes that are either
       in a runnable or uninterruptable state. A process in a runnable state
       is either using the CPU or waiting to use the CPU. A process in
       uninterruptable state is waiting for some I/O access, eg waiting for
       disk. The averages are taken over the three time intervals. Load
       averages are not normalized for the number of CPUs in a system, so a
       load average of 1 means a single CPU system is loaded all the time
       while on a 4 CPU system it means it was idle 75% of the time.
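The man page's last sentence is simple arithmetic: if every counted task wants a CPU, the idle share is roughly 1 - load / CPUs. A minimal Python sketch (estimated_idle_fraction is a made-up name), assuming no tasks are stuck in D state, since those raise the load without using a CPU:

```python
def estimated_idle_fraction(load_average, cpus):
    """Rough idle share implied by a load average, assuming every counted task
    is CPU-bound (no D-state tasks). Clipped at 0.0 when the system is overloaded."""
    return max(0.0, 1.0 - load_average / cpus)

print(estimated_idle_fraction(1.0, 4))                # the man page's 4-CPU example
print(round(estimated_idle_fraction(14.00, 24), 3))   # the question's 15-minute figure
```

The first line prints 0.75, as in the man page. For 14.00 on 24 cores the estimate is about 0.417, roughly in line with the 43.6%id on top's Cpu(s) line (where %wa is 0.0, so the no-D-state assumption is reasonable there).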