How to use trace and dbg in Erlang to debug and trace my program?
I am trying to start using erlang:trace/3
and the dbg
module to trace the behaviour of a live production system without taking the server down.
The documentation is opaque (to put it mildly) and there don't appear to be any useful tutorials online.
What I spent all day trying to do was capture what was happening in a particular function by trying to apply a trace to Module:Function
using dbg:c
and dbg:p
but with no success at all.
Does anyone have a succinct explanation of how to use trace in a live Erlang system?
Solution 1:
The basic steps of tracing for function calls are on a non-live node:
> dbg:start(). % start dbg
> dbg:tracer(). % start a simple tracer process
> dbg:tp(Module, Function, Arity, []). % specify MFA you are interested in
> dbg:p(all, c). % trace calls (c) of that MFA for all processes.
... trace here
> dbg:stop_clear(). % stop tracer and clear effect of tp and p calls.
You can trace for multiple functions at the same time. Add functions by calling tp
for each function. If you want to trace for non-exported functions, you need to call tpl
. To remove functions, call ctp
or ctpl
in a similar manner. Some general tp calls are:
> dbg:tpl(Module, '_', []). % all calls in Module
> dbg:tpl(Module, Function, '_', []). % all calls to Module:Function with any arity.
> dbg:tpl(Module, Function, Arity, []). % all calls to Module:Function/Arity.
> dbg:tpl(M, F, A, [{'_', [], [{return_trace}]}]). % same as before, but also show return value.
The last argument is a match specification. You can play around with that by using dbg:fun2ms
.
You can select the processes to trace on with the call to p(). The items are described under erlang:trace. Some calls are:
> dbg:p(all, c). % trace calls to selected functions by all functions
> dbg:p(new, c). % trace calls by processes spawned from now on
> dbg:p(Pid, c). % trace calls by given process
> dbg:p(Pid, [c, m]). % trace calls and messages of a given process
I guess you will never need to directly call erlang:trace
, as dbg
does pretty much everything for you.
A golden rule for a live node is to generate only an amount of trace output to the shell, which lets you to type in dbg:stop_clear().
. :)
I often use a tracer that will auto-stop itself after a number of events. For example:
dbg:tracer(process, {fun (_,100) -> dbg:stop_clear();
(Msg, N) -> io:format("~p~n", [Msg]), N+1 end, 0
}).
If you are looking for debugging on remote nodes (or multiple nodes), search for pan
, eper
, inviso
or onviso
.
Solution 2:
On live systems we rarely trace to shell. If the system is well configured then it is already collecting your Erlang logs that were printed to the shell. I need not emphasize why this is crucial in any live node...
Let me elaborate on tracing to files:
It is possible to trace to file, which will produce a binary output that can be converted and parsed later. (for further analysis or automated controlling system, etc.)
An example could be:
-
Trace to multiple files wrapped (12x50 Mbytes).Please always check the available disk space before using such a big trace!
dbg:tracer(port,dbg:trace_port(file,{"/log/trace",wrap,atom_to_list(node()),50000000,12})).
dbg:p(all,[call,timestamp,return_to]).
- Always test on a test node before entering anything to a live node's shell!
- It is most advised to have a test node or replica node to try the scripts first.
That said let's have a look at a basic tracing command sequence:
<1> dbg:stop_clear().
- Always start by flushing trace ports and ensuring that no previous tracing interferes with the current trace.
<2> dbg:tracer().
- Start the tracer process.
<3> dbg:p(all,[call, timestamp]).
- In this case we are tracing for all processes and for function calls.
<4> dbg:tp( ... ).
- As seen in Zed's answer.
<5> dbg:tpl( ... ).
- As seen in Zed's answer.
<42> dbg:stop_clear().
- Again it is to ensure that all traces were written to the output and to evade any later inconvenience.
You can:
add triggers by defining some fun()-s in the shell to stop the trace at a given time or event. Recursive fun()-s are the best to achieve this, but be very careful when applying those.
apply a vast variety of pattern matching to ensure that you only trace for the specific process with the specific function call with the specific type of arguments...
I had an issue a while back, when we had to check the content of an ETS table and on appearance of a certain entry we had to stop the trace within 2-3 minutes.
I also suggest the book Erlang Programming written by Francesco Cesarini. (Erlang Programming @ Amazon)
Solution 3:
The 'dbg' module is quite low-level stuff. There are two hacks that I use very frequently for the tasks that I commonly need.
Use the Erlang CLI/shell expansion code at http://www.snookles.com/erlang/user_default.erl. It was originally written (as far as I know) by Serge Aleynikov and has been a useful "so that's how I add custom functions to the shell" example. Compile the module and edit your ~/.erlang file to point to its path (see comment at the top of the file).
Use the "redbug" utility that's bundled with in the EPER collection of utilities. It's very easy to use 'dbg' to create millions of trace events in a few seconds. Doing so in a production environment can be disastrous. For development or production use, redbug makes it nearly impossible to kill a running system with a trace-induced overload.
Solution 4:
If you would prefer a graphical tracer then try erlyberly. It allows you to select the functions you would like to trace (on all processes at the moment) and deals with the dbg API.
However it does not protect against overload so is not suitable for production systems.