What's difference between monitoring, tracing and profiling?
I have seen these three words showing up a lot, but don't understand the exact differences between them. For example, collecting CPU utilisation is often called profiling and can also fall into performance monitoring. What is the (subtle) difference between them?
Solution 1:
This is the way I use these words. Others may have additional or different usages. Depending on the job at hand, I will use the terms differently. Development teams and operations teams have different needs an usage.
Monitoring is monitoring. Usually it is ongoing, and preferably automated. Open source tools like Munin
, Nagios
, and MRTG
fall into this category. There a lot of commercial tools as well. I would also include sar
run continuously in this category, but its results are not normally monitored. Monitoring tools can be used to trigger alerts when a monitored resource falls above or below a trigger level. Many monitoring tools work well in heterogeneous environments.
Profiling is usually done on a particular program to see which code is using the most resources. Often this is CPU time, but can also include memory, I/O, and execution (wall) time. It is usually used to identify candidate code for optimization. Profiling tools tend to be language and/or platform dependent.
A different kind of profiling is done using logs and/or monitoring data. This is usage profiling and can be done for a variety of reasons. I haven't found many tools to do this.
I use tracing in a couple of different ways. Most frequently, I trace network routes. Depending on network and firewall settings a variety of tools may be used with more or less success. Most of these have traceroute in their name or description.
Program tracing is tracing the execution of a program. This is generally done in a test situation. This can be done in a number of ways (in my order of usage and experience):
- Call tracing using tools like
strace
to see what code is called. This can be useful in determining why a program is failing or not responding as expected. - Trace level logging, which depends on appropriate logging statements being included in the code. Most logging suites support this level of detail. Trace level logging tends to have poor code coverage. I generally add it as needed and leave it in the code for future use.
- Code coverage tracing records which parts of the code have executed in a test suite. This can be useful in determining missing test cases. 100% coverage of code is difficult to get. 100% coverage of normal flows should be achievable.
- Desk checking: tracing through the code by reading it. Not very useful on larger programs, but a good way to identify edge cases for unit tests, ans/or to identify possible issues when the likely source has been bee narrowed down. Som=e IDEs and editors make it relatively easy to follow a call to the implementing code.
- Live debugging; tracing code execution while it is running using a debugger. It is possible to trace execution instruction by instruction, but if the problem is a timing issue, it may be obscured. Debuggers which can link code to the current instruction help a lot but may require a debug version of the program to be built.