How does reverse debugging work?
Solution 1:
I'm a gdb maintainer and one of the authors of the new reverse debugging. I'd be happy to talk about how it works. As several people have speculated, you need to save enough machine state that you can restore later. There are a number of schemes, one of which is to simply save the registers or memory locations that are modified by each machine instruction. Then, to "undo" that instruction, you just revert the data in those registers or memory locations.
Yes, it is expensive, but modern cpus are so fast that when you are interactive anyway (doing stepping or breakpoints), you don't really notice it that much.
Solution 2:
Note that you must not forget the use of simulators, virtual machines, and hardware recorders to implement reverse execution.
Another solution to implement it is to trace execution on physical hardware, such as is done by GreenHills and Lauterbach in their hardware-based debuggers. Based on this fixed trace of the action of each instruction, you can then move to any point in the trace by removing the effects of each instruction in turn. Note that this assumes that you can trace all things that affect the state visible in the debugger.
Another way is to use a checkpoint + re-execution method, which is used by VmWare Workstation 6.5 and Virtutech Simics 3.0 (and later), and which seems to be coming with Visual Studio 2010. Here, you use a virtual machine or a simulator to get a level of indirection on the execution of a system. You regularly dump the entire state to disk or memory, and then rely on the simulator being able to deterministically re-execute the exact same program path.
Simplified, it works like this: say that you are at time T in the execution of a system. To go to time T-1, you pick up some checkpoint from point t < T, and then execute (T-t-1) cycles to end up one cycle before where you were. This can be made to work very well, and apply even for workloads that do disk IO, consist of kernel-level code, and performs device driver work. The key is to have a simulator that contains the entire target system, with all its processors, devices, memories, and IOs. See the gdb mailinglist and the discussion following that on the gdb mailing list for more details. I use this approach myself quite regularly to debug tricky code, especially in device drivers and early OS boots.
Another source of information is a Virtutech white paper on checkpointing (which I wrote, in full disclosure).
Solution 3:
During an EclipseCon session we also asked how they do this with the Chronon Debugger for Java. That one does not allow you to actually step back, but can play back a recorded program execution in such a way that it feels like reverse debugging. (The main difference is that you cannot change the running program in the Chronon debugger, while you can do that in most other Java debuggers.)
If I understood it correctly, it manipulates the byte code of the running program, such that every change of an internal state of the program is recorded. External states don't need to be recorded additionally. If they influence your program in some way, then you must have an internal variable matching that external state (and therefore that internal variable is enough).
During playback time they can then basically recreate every state of the running program from the recorded state changes.
Interestingly the state changes are much smaller than one would expect on first look. So if you have a conditional "if" statement, you would think that you need at least one bit to record whether the program took the then- or the else-statement. In many cases you can avoid even that, like in the case that those different branches contain a return value. Then it is enough to record only the return value (which would be needed anyway) and to recalculate the decision about the executed branch from the return value itself.
Solution 4:
Although this question is old, most of the answers are too, and as reverse-debugging remains an interesting topic, I'm posting a 2015 answer. Chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, covers some of the historical approaches to reverse debugging (especially focused on the snapshot-(or checkpoint)-and-replay approach), and explains the difference between it and omniscient debugging:
The computer, having forward-executed the program up to some point, should really be able to provide us with information about it. Such an improvement is possible, and is found in what are called omniscient debuggers. They are usually classified as reverse debuggers, although they might more accurately be described as "history logging" debuggers, as they merely record information during execution to view or query later, rather than allow the programmer to actually step backwards in time in an executing program. "Omniscient" comes from the fact that the entire state history of the program, having been recorded, is available to the debugger after execution. There is then no need to rerun the program, and no need for manual code instrumentation.
Software-based omniscient debugging started with the 1969 EXDAMS system where it was called "debug-time history-playback". The GNU debugger, GDB, has supported omniscient debugging since 2009, with its 'process record and replay' feature. TotalView, UndoDB and Chronon appear to be the best omniscient debuggers currently available, but are commercial systems. TOD, for Java, appears to be the best open-source alternative, which makes use of partial deterministic replay, as well as partial trace capturing and a distributed database to enable the recording of the large volumes of information involved.
Debuggers that do not merely allow navigation of a recording, but are actually able to step backwards in execution time, also exist. They can more accurately be described as back-in-time, time-travel, bidirectional or reverse debuggers.
The first such system was the 1981 COPE prototype ...