Making your .NET language step correctly in the debugger

Firstly, I apologize for the length of this question.

I am the author of IronScheme. Recently I have been working hard on emitting decent debug info, so that I can use the 'native' .NET debugger.

While this has been partly successful, I am running into some teething problems.

The first problem is related to stepping.

Because Scheme is an expression language, everything tends to be wrapped in parentheses, unlike the major .NET languages, which seem to be statement (or line) based.

The original code (Scheme) looks like:

(define (baz x)
  (cond
    [(null? x) 
      x]
    [(pair? x) 
      (car x)]
    [else
      (assertion-violation #f "nooo" x)]))

I have deliberately laid out each expression on its own line.

The emitted code, transformed to C# (via ILSpy), looks like:

public static object ::baz(object x)
{
  if (x == null)
  {
    return x;
  }
  if (x is Cons)
  {
    return Builtins.Car(x);
  }
  return #.ironscheme.exceptions::assertion-violation+(
     RuntimeHelpers.False, "nooo", Builtins.List(x));
}

As you can see, pretty simple.

Note: If the code were transformed into a conditional expression (?:) in C#, the whole thing would be just one debug step. Keep that in mind.

Here is IL output with source and line numbers:

  .method public static object  '::baz'(object x) cil managed
  {
    // Code size       56 (0x38)
    .maxstack  6
    .line 15,15 : 1,2 ''
//000014: 
//000015: (define (baz x)
    IL_0000:  nop
    .line 17,17 : 6,15 ''
//000016:   (cond
//000017:     [(null? x) 
    IL_0001:  ldarg.0
    IL_0002:  brtrue     IL_0009

    .line 18,18 : 7,8 ''
//000018:       x]
    IL_0007:  ldarg.0
    IL_0008:  ret

    .line 19,19 : 6,15 ''
//000019:     [(pair? x) 
    .line 19,19 : 6,15 ''
    IL_0009:  ldarg.0
    IL_000a:  isinst [IronScheme]IronScheme.Runtime.Cons
    IL_000f:  ldnull
    IL_0010:  cgt.un
    IL_0012:  brfalse    IL_0020

    IL_0017:  ldarg.0
    .line 20,20 : 7,14 ''
//000020:       (car x)]
    IL_0018:  tail.
    IL_001a:  call object [IronScheme]IronScheme.Runtime.Builtins::Car(object)
    IL_001f:  ret

    IL_0020:  ldsfld object 
         [Microsoft.Scripting]Microsoft.Scripting.RuntimeHelpers::False
    IL_0025:  ldstr      "nooo"
    IL_002a:  ldarg.0
    IL_002b:  call object [IronScheme]IronScheme.Runtime.Builtins::List(object)
    .line 22,22 : 7,40 ''
//000021:     [else
//000022:       (assertion-violation #f "nooo" x)]))
    IL_0030:  tail.
    IL_0032:  call object [ironscheme.boot]#::
       'ironscheme.exceptions::assertion-violation+'(object,object,object)
    IL_0037:  ret
  } // end of method 'eval-core(033)'::'::baz'

Note: To prevent the debugger from simply highlighting the entire method, I make the method entry point just 1 column wide.

As you can see, each expression maps correctly to a line.

Now the problem with stepping (tested on VS2010, but same/similar issue on VS2008):

These are with IgnoreSymbolStoreSequencePoints not applied.

  1. Call baz with null arg, it works correctly. (null? x) followed by x.
  2. Call baz with Cons arg, it works correctly. (null? x) then (pair? x) then (car x).
  3. Call baz with other arg, it fails. (null? x) then (pair? x) then (car x) then (assertion-violation ...).

When applying IgnoreSymbolStoreSequencePoints (as recommended):

  1. Call baz with null arg, it works correctly. (null? x) followed by x.
  2. Call baz with Cons arg, it fails. (null? x) then (pair? x).
  3. Call baz with other arg, it fails. (null? x) then (pair? x) then (car x) then (assertion-violation ...).

I also find that in this mode some lines (not shown here) are incorrectly highlighted; they are off by 1.

Here are some ideas about what the causes could be:

  • Tail calls confuse the debugger
  • Overlapping locations (not shown here) confuse the debugger (it does so very well when setting a breakpoint)
  • ????

The second, but also serious, issue is the debugger failing to break/hit breakpoints in some cases.

The only place where I can get the debugger to break correctly (and consistently) is at the method entry point.

The situation gets a bit better when IgnoreSymbolStoreSequencePoints is not applied.

Conclusion

It might be that the VS debugger is just plain buggy :(

References:

  1. Making a CLR/.NET Language Debuggable

Update 1:

Mdbg does not work for 64-bit assemblies. So that is out. I have no more 32-bit machines to test it on. Update: I am sure this is no big problem, does anyone have a fix? Edit: Yes, silly me, just start mdbg under the x64 command prompt :)

Update 2:

I have created a C# app, and tried to dissect the line info.

My findings:

  • After any brXXX instruction you need to have a sequence point (if not valid aka '#line hidden', emit a nop).
  • Before any brXXX instruction, emit a '#line hidden' and a nop.

Applying this does not, however, fix the issues (on its own?).

But adding the following gives the desired result :)

  • After ret, emit a '#line hidden' and a nop.

This is using the mode where IgnoreSymbolStoreSequencePoints is not applied. When applied, some steps are still skipped :(
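The padding rules above can be sketched as a small transformation over an instruction list. This is only an illustrative model in Python, not IronScheme's emitter: the (opcode, line) tuple layout and the names pad_for_stepping and BAZ are hypothetical. It also assumes the hidden nop is inserted unconditionally around every brXXX, which is what the updated IL listing shows.

```python
# Illustrative model only; the (opcode, line) layout is a hypothetical choice.
HIDDEN = 0xFEEFEE  # 16707566, the line number used for '#line hidden'


def is_branch(opcode):
    # Matches the brXXX family (br, brtrue, brfalse, ...) but not 'break'.
    return opcode.startswith("br") and opcode != "break"


def pad_for_stepping(instructions):
    """Insert hidden nops before and after branches, and after ret."""
    padded = []
    for opcode, line in instructions:
        if is_branch(opcode):
            padded.append(("nop", HIDDEN))  # before any brXXX
        padded.append((opcode, line))
        if is_branch(opcode) or opcode == "ret":
            padded.append(("nop", HIDDEN))  # after any brXXX, after ret
    return padded


# The original baz method. line is None where no .line directive precedes
# the instruction; 'tail.' prefixes are folded into their call.
BAZ = [
    ("nop", 15),
    ("ldarg.0", 17),
    ("brtrue", None),
    ("ldarg.0", 18),
    ("ret", None),
    ("ldarg.0", 19),
    ("isinst", None),
    ("ldnull", None),
    ("cgt.un", None),
    ("brfalse", None),
    ("ldarg.0", None),
    ("call Car", 20),
    ("ret", None),
    ("ldsfld", None),
    ("ldstr", None),
    ("ldarg.0", None),
    ("call List", None),
    ("call assertion-violation", 22),
    ("ret", None),
]

PADDED = pad_for_stepping(BAZ)
print(sum(1 for op, _ in PADDED if op == "nop"))  # 8
```

The count of eight nops (one original plus seven hidden ones) agrees with the growth in code size from 56 to 63 bytes between the two IL listings.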

Here is the IL output when the above has been applied:

  .method public static object  '::baz'(object x) cil managed
  {
    // Code size       63 (0x3f)
    .maxstack  6
    .line 15,15 : 1,2 ''
    IL_0000:  nop
    .line 17,17 : 6,15 ''
    IL_0001:  ldarg.0
    .line 16707566,16707566 : 0,0 ''
    IL_0002:  nop
    IL_0003:  brtrue     IL_000c

    .line 16707566,16707566 : 0,0 ''
    IL_0008:  nop
    .line 18,18 : 7,8 ''
    IL_0009:  ldarg.0
    IL_000a:  ret

    .line 16707566,16707566 : 0,0 ''
    IL_000b:  nop
    .line 19,19 : 6,15 ''
    .line 19,19 : 6,15 ''
    IL_000c:  ldarg.0
    IL_000d:  isinst     [IronScheme]IronScheme.Runtime.Cons
    IL_0012:  ldnull
    IL_0013:  cgt.un
    .line 16707566,16707566 : 0,0 ''
    IL_0015:  nop
    IL_0016:  brfalse    IL_0026

    .line 16707566,16707566 : 0,0 ''
    IL_001b:  nop
    IL_001c:  ldarg.0
    .line 20,20 : 7,14 ''
    IL_001d:  tail.
    IL_001f:  call object [IronScheme]IronScheme.Runtime.Builtins::Car(object)
    IL_0024:  ret

    .line 16707566,16707566 : 0,0 ''
    IL_0025:  nop
    IL_0026:  ldsfld object 
      [Microsoft.Scripting]Microsoft.Scripting.RuntimeHelpers::False
    IL_002b:  ldstr      "nooo"
    IL_0030:  ldarg.0
    IL_0031:  call object [IronScheme]IronScheme.Runtime.Builtins::List(object)
    .line 22,22 : 7,40 ''
    IL_0036:  tail.
    IL_0038:  call object [ironscheme.boot]#::
      'ironscheme.exceptions::assertion-violation+'(object,object,object)
    IL_003d:  ret

    .line 16707566,16707566 : 0,0 ''
    IL_003e:  nop
  } // end of method 'eval-core(033)'::'::baz'

Update 3:

Problem with the above 'semi-fix': Peverify reports errors on all methods due to the nop after ret. I don't really understand the problem. How can a nop after a ret break verification? It is like dead code (except that it is NOT even code) ... Oh well, experimentation continues.

Update 4:

Back at home now, removed the 'unverifiable' code, running on VS2008 and things are a lot worse. Perhaps running unverifiable code for the sake of proper debugging might be the answer. In 'release' mode, all output would still be verifiable.

Update 5:

I have now decided that my idea above is the only viable option for now. Although the generated code is unverifiable, I have yet to encounter any VerificationExceptions. I don't know what the impact of this will be on the end user.

As a bonus, my second issue has also been solved. :)

Here is a little screencast of what I ended up with. It hits breakpoints, does proper stepping (in/out/over), etc. All in all, the desired effect.

I am, however, still not accepting this as the way to do it. It feels overly hacky to me. Having confirmation of the real issue would be nice.

Update 6:

Just had the chance to test the code on VS2010, and there seem to be some problems:

  1. The first call now does not step correctly. (assertion-violation ...) is hit. Other cases work fine. Some old code emitted unnecessary positions. Removed the code, works as expected. :)
  2. More seriously, breakpoints fail on the second invocation of the program (using in-memory compilation, dumping assembly to file seems to make breakpoints happy again).

Both of these cases work correctly under VS2008. The main difference is that under VS2010 the entire application is compiled for .NET 4, while under VS2008 it is compiled for .NET 2. Both are running 64-bit.

Update 7:

As mentioned, I got mdbg running under 64-bit. Unfortunately, it also has the breakpoint issue where it fails to break if I rerun the program (this implies it gets recompiled, so not using the same assembly, but still using the same source).

Update 8:

I have filed a bug at the MS Connect site regarding the breakpoint issue.

Update: Fixed

Update 9:

After some long thinking, the only way to make the debugger happy seems to be doing SSA, so every step can be isolated and sequential. I am yet to prove this notion though. But it seems logical. Obviously, cleaning up temps from SSA will break debugging, but that is easy to toggle, and leaving them does not have much overhead.


I am an engineer on the Visual Studio Debugger team.

Correct me if I am wrong, but it sounds like the only issue left is that when switching from PDBs to the .NET 4 dynamic compile symbol format some breakpoints are being missed.

We would probably need a repro to exactly diagnose the issue, however here are some notes that might help.

  1. VS (2008+) can run as a non-admin
  2. Do any symbols load at all the second time around? You might test by breaking in (through an exception or a call to System.Diagnostics.Debugger.Break())
  3. Assuming that symbols load, is there a repro that you could send us?
  4. The likely difference is that the symbol format for dynamic-compiled code is 100% different between .NET 2 (PDB stream) and .NET 4 (IL DB I think they called it?)
  5. The 'nop's sound about right. See rules for generating implicit sequence points below.
  6. You don't actually need to emit things on different lines. By default, VS will step 'symbol-statements' where, as the compiler writer, you get to define what 'symbol-statement' means. So if you want each expression to be a separate thing in the symbol file, that will work just fine.

The JIT creates an implicit sequence point based on the following rules:

  1. IL nop instructions
  2. IL stack empty points
  3. The IL instruction immediately following a call instruction
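To see how these three rules play out on the original baz IL from the question, here is a rough Python model. It is a sketch under assumptions, not the JIT's actual algorithm: the stack effects are entered by hand, the walk is purely linear (adequate here because every branch target is reached with an empty stack), and the names BAZ and implicit_sequence_points are made up for the illustration.

```python
# (IL offset, opcode, values popped, values pushed) for the original baz IL.
# 'tail.' prefixes are folded into their call.
BAZ = [
    (0x00, "nop", 0, 0),
    (0x01, "ldarg.0", 0, 1),
    (0x02, "brtrue", 1, 0),
    (0x07, "ldarg.0", 0, 1),
    (0x08, "ret", 1, 0),
    (0x09, "ldarg.0", 0, 1),
    (0x0A, "isinst", 1, 1),
    (0x0F, "ldnull", 0, 1),
    (0x10, "cgt.un", 2, 1),
    (0x12, "brfalse", 1, 0),
    (0x17, "ldarg.0", 0, 1),
    (0x18, "call", 1, 1),   # Builtins::Car
    (0x1F, "ret", 1, 0),
    (0x20, "ldsfld", 0, 1),
    (0x25, "ldstr", 0, 1),
    (0x2A, "ldarg.0", 0, 1),
    (0x2B, "call", 1, 1),   # Builtins::List
    (0x30, "call", 3, 1),   # assertion-violation
    (0x37, "ret", 1, 0),
]


def implicit_sequence_points(instructions):
    """Offsets that satisfy at least one of the three rules."""
    points = set()
    depth = 0      # evaluation stack depth before the current instruction
    previous = None
    for offset, opcode, pops, pushes in instructions:
        if opcode == "nop" or depth == 0 or previous == "call":
            points.add(offset)
        depth += pushes - pops
        previous = opcode
    return points


# Offsets that carry an explicit .line directive in the original listing.
EXPLICIT = [0x00, 0x01, 0x07, 0x09, 0x18, 0x30]
IMPLICIT = implicit_sequence_points(BAZ)
UNCOVERED = [hex(offset) for offset in EXPLICIT if offset not in IMPLICIT]
print(UNCOVERED)  # ['0x18']
```

Under this model the only explicit sequence point that does not coincide with an implicit one is the .line 20 entry for (car x) at IL_0018, where the argument is already on the stack. If the debugger relies only on implicit sequence points when IgnoreSymbolStoreSequencePoints is applied, that would be consistent with (car x) being skipped in that mode.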

If it turns out we do need a repro to solve your issue, you can file a connect bug and upload files securely through that medium.

Update:

We are encouraging other users experiencing this issue to try the Developer Preview of Dev11 from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=27543 and comment with any feedback. (Must target 4.5)

Update 2:

Leppie has verified the fix to work for him on the Beta version of Dev11 available at http://www.microsoft.com/visualstudio/11/en-us/downloads as noted in the connect bug https://connect.microsoft.com/VisualStudio/feedback/details/684089/.

Thanks,

Luke


I am an engineer on the SharpDevelop Debugger team :-)

Did you solve the problem?

Did you try to debug it in SharpDevelop? If there is a bug in .NET, I wonder if we need to implement some workaround. I am not aware of this issue.

Did you try to debug it in ILSpy? Especially without debug symbols. It would debug C# code, but it would tell us if the IL instructions are nicely debuggable. (Mind that the ILSpy debugger is beta, though.)

Quick notes on the original IL code:

  • .line 19,19 : 6,15 '' occurs twice?
  • .line 20,20 : 7,14 '' does not start on an implicit sequence point (stack is not empty). I am worried.
  • .line 20,20 : 7,14 '' includes the code for "car x" (good) as well as "#f nooo x" (bad?)
  • Regarding the nop after ret: what about stloc, ldloc, ret? I think C# uses this trick to make ret a distinct sequence point.
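The point about .line 20 covering too much can be checked with a small lookup. The Python sketch below assumes the usual convention that a sequence point covers every instruction up to the next sequence point; the offsets and lines are copied from the original IL listing, and line_for_offset is a hypothetical helper name.

```python
import bisect

# (IL offset, source line) for each .line directive in the original listing.
SEQUENCE_POINTS = [
    (0x00, 15),
    (0x01, 17),
    (0x07, 18),
    (0x09, 19),
    (0x18, 20),
    (0x30, 22),
]
OFFSETS = [offset for offset, _ in SEQUENCE_POINTS]


def line_for_offset(offset):
    """Source line of the last sequence point at or before the offset."""
    index = bisect.bisect_right(OFFSETS, offset) - 1
    return SEQUENCE_POINTS[index][1]


print(line_for_offset(0x17))  # 19: the ldarg.0 that feeds (car x)
print(line_for_offset(0x20))  # 20: the ldsfld at the brfalse target
print(line_for_offset(0x30))  # 22: the assertion-violation call
```

IL_0020, the target of the brfalse, is attributed to line 20 because the next sequence point only starts at IL_0030. That may be why stepping with a non-pair argument highlights (car x) before (assertion-violation ...).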

David