Linux server performance profiling - how to see what caused high load

If a server is experiencing high load, I use top and similar tools to troubleshoot why. However, this is only effective if I can analyze while the server is experiencing the problem.

What are some good tools for finding the root cause of high server load after the fact? For example, I was planning to set up a cron job to save 'top' output, apache server stats, the mysql process list, etc. every 5 minutes. But that doesn't seem very elegant, so I'm wondering whether someone has already come up with utilities to accomplish this.
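For reference, the cron-based snapshot described above could look roughly like the following Python sketch. It only records the load average and the busiest processes; the log path, the function names, and the choice of `ps` columns are illustrative assumptions, and the `--sort` option assumes the procps version of `ps` found on most Linux systems.

```python
import os
import subprocess
from datetime import datetime


def format_snapshot(timestamp, loadavg, process_table):
    """Render one log entry: a header with the load averages, then the process table."""
    one, five, fifteen = loadavg
    header = (
        f"=== {timestamp.isoformat(timespec='seconds')} "
        f"load: {one:.2f} {five:.2f} {fifteen:.2f} ==="
    )
    return f"{header}\n{process_table.rstrip()}\n"


def take_snapshot(log_path="/var/log/load-snapshots.log", top_n=15):
    """Append the current load and the top CPU consumers to log_path.

    log_path is a hypothetical location; pick one the cron user can write to.
    """
    ps = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    )
    # Keep the header line plus the top_n busiest processes.
    lines = ps.stdout.splitlines()[: top_n + 1]
    entry = format_snapshot(datetime.now(), os.getloadavg(), "\n".join(lines))
    with open(log_path, "a") as log:
        log.write(entry)


if __name__ == "__main__":
    take_snapshot()
```

Run from a crontab entry with a `*/5 * * * *` schedule, this would append one entry every 5 minutes. It does not capture apache or mysql state, which would need their own commands.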


For ongoing monitoring, consider installing munin. It gathers data every 5 minutes and generates graphs that let you see where the bottlenecks are. I also use sar, which can run in the background and record data to disk; this gives quite detailed information on what the bottleneck is. To see what processes were running in the past, you will need the process accounting package.
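As a rough sketch of reading back what sar recorded, the Python snippet below builds a `sar` command line for a past time window. It assumes the sysstat collector is already enabled and that the daily data files live under /var/log/sa (common on Red Hat style systems; Debian style systems typically use /var/log/sysstat). The helper names `sar_command` and `run_sar` and the default directory are assumptions made for illustration, not part of sysstat.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed location of sysstat's daily data files; adjust for your distribution.
SA_DIR = "/var/log/sa"


def sar_command(day, start, end, report="-q", sa_dir=SA_DIR):
    """Build a sar command for a day of the month and an HH:MM:SS window.

    report selects what to show: -q is run queue and load average,
    -u is CPU, -r is memory, -b is I/O.
    """
    data_file = PurePosixPath(sa_dir) / f"sa{day:02d}"
    return ["sar", report, "-f", str(data_file), "-s", start, "-e", end]


def run_sar(day, start, end, report="-q"):
    """Run sar on this host and return its text output."""
    result = subprocess.run(
        sar_command(day, start, end, report),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Example: load averages between 02:00 and 03:00 on the 14th.
    print(" ".join(sar_command(14, "02:00:00", "03:00:00")))
```

For the process side, the accounting package provides `lastcomm`, which lists previously executed commands once accounting has been switched on.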


I like collectd, but I've recently started toying with pcp (Performance Co-Pilot), which has some nice features for historical diagnosis: http://oss.sgi.com/projects/pcp/