Cacti Graphing incorrect load averages
I've just set up cacti to monitor CPU and memory usage on a server that I think needs to be upgraded, but to be able to make my case for funding I need hard facts.
I figured getting Cacti to monitor the Memory usage and Load Average would do the trick, but the graph being generated seems to bear no correlation to reality.
According to top my load average right now is hovering at around 5, but cacti is graphing it at 0.1!
How can I get cacti to monitor the real load averages on the server? The server to be monitored is running RHEL5 and using net-SNMP as the SNMP deamon.
Thanks,
Bart.
You might want to look at Munin, which is very easy to setup, especially if you're just running it locally. It will let you quickly start tracking CPU load and other resources without having to mess with SNMP and remotely grabbing resource data. There are packages for RedHat that should be quite simple to install.
cacti has a bad default graph which stacks the 3 values from load average. The total is meaningless, and that is what you are deceived into looking at. Change the default graph to use lines rather than stack and you'll see something better.
Bear in mind that load (e.g. /proc/loadavg) can be averaged on different intervals (tipically 1, 5 and 15 minutes). Add the fact that averaging these figures once again over a time series tends to lower the overall indicator, and you may have a hard time making your case for getting an upgrade.
I suggest you stop thinking on a technical solution and start building a business case around a different indicator, preferrably one that has a correlation to an economic or customer satisfaction indicator -- e.g. maximum response time. Most likely this will get your message through to the people that manages the money.