Understanding Cortex-M assembly LDR with pc offset

I'm looking at the disassembly code for this piece of C code:

#define GPIO_PORTF_DATA_R       (*((volatile unsigned long *)0x400253FC))
int main(void){    
    // Initialization code
    while(1) {
        SW1 = GPIO_PORTF_DATA_R&0x10;  // Read PF4 into SW1
        // Other code
        SW2 = GPIO_PORTF_DATA_R&0x01;
    }
}

The assembly for that SW1= line is (sorry can't copy code):

https://imgur.com/dnPHZrd

Here are my questions:

  • At the first line, PC = 0x00000A56, and PC + 92 = 0x00000AB2, which is not equal to 0x00000AB4, the number shown. Why?

I did a bit of research on SO and found out that PC actually points to the Next Next instruction to be executed.

When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.

However 0x00000AB4 - 0x00000A56 = 0x5E = 94, neither does it match 92+8 or 92+4. Where did I get wrong?

Reference:

Strange behaviour of ldr [pc, #value]

Why does the ARM PC register point to the instruction after the next one to be executed?

LDR Rd,-Label vs LDR Rd,[PC+Offset]


Solution 1:

From ARM documentation:

Operation 
  address = (PC[31:2] << 2) + (immed_8 * 4) 
  Rd = Memory[address, 4]

The pc is 0xA56+4 because of two instructions ahead and this is thumb so 4 bytes.

(0xA5A>>2)<<2 + (0x17*4)
or
(0x00000A5A&0xFFFFFFFC) + (0x17<<2)
0xA58+92=0xA64

This is an LDR so it is a word-based address ideally. Because the thumb instruction can be on a non-word aligned address, you start off by adding two instructions of course (thumb2 complicates this but add four for thumb). Then zero the lower two bits (LDR) the offset is in words so need to convert that to bytes, times four. This makes the encoding make more sense if you think about each part of it. In arm mode the PC is already word aligned so that step is not required (and in arm mode you have more bits for the immediate so it is byte-based not word-based), making the offset encoding between arm and thumb possibly confusing.

The various documents will show the math in different ways but it is the same math nevertheless. The PC is the only confusing part, especially for thumb. For ARM you add 8, two ahead, for thumb it is basically 4 because the execution cannot tell if there is a thumb2 coming, and it would break a great many things if they had attempted that. So add 4 for the two ahead, for thumb. Since thumb is compressed they do not use a byte offset but instead a word offset giving 4 times the range. Likewise this and/or other instructions can only look forward not back so unsigned offset. This is why you will get alignment errors when assembling things in thumb that in arm would just be unaligned (and you get what you get there depending on architecture and settings). Thumb cannot encode any address for an instruction like this.

For understanding instruction encoding, in particular pc based addressing, it is best to go back to the early ARM ARM (before the armv5 one but if not then just get the armv5 one) as well as the armv6-m and armv7-m and full sized armv7-ar. And look at the pseudo-code for each. The older one generally has the best pseudo-code, but sometimes they leave out the masking of lower bits of the address. No document is perfect, they have bugs just like everything else. Naturally the architecture tied to the core you are using is the official document for the IP the chip vendor used (even down to the specific version of the TRM as these can vary in incompatible ways from one to the next). But if that document is not perfectly clear you can sometimes get an idea from others that, upon inspection, have compatible instructions, architectural features.

Solution 2:

You missed a key part of the rules for Thumb mode, quoted in one of the question you linked (Why does the ARM PC register point to the instruction after the next one to be executed?):

For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.

  • (0xA56 + 4) & -4 = 0xA58 is the location that PC-relative things are relative to during execution of that ldr r0, [PC, #92]

  • ((0xA56 + 4) & -4) + 92 = 0xab4, the location the disassembler calculated.

  • It's equivalent to do 0xA56 & -4 = 0xa54 then +4 + 92, because +4 doesn't modify bit #1; you can think of clearing it before or after adding that +4. But you can't clear the bit after adding the PC-relative offset; that can be unaligned for other instructions like ldrb. (Thumb-mode ldr encodes an offset in words to make better use of the limited number of bits, so the scaled offset and thus the final load address always have bits[1:0] clear.)

(Thanks to Raymond Chen for spotting this; I had also missed it initially!)

Also note that your debugger shows you a PC value when stopped at a breakpoint, but that's the address of the instruction you're stopped at. (Because that's how ARM exceptions work, I assume, saving the actual instruction to return to, not some offset.) During execution of the instruction, PC-relative stuff follows different rules. And the debugger doesn't "cook" this value to show what PC will be during its execution.

The rule is not "relative to the end of this / start of next instruction". Answers and comments stating that rule happen to get the right answer in this case, but would get the wrong answer in other Thumb cases like in LDR Rd,-Label vs LDR Rd,[PC+Offset] where the PC-relative load instruction happens to start at a 4-byte aligned address so bit #1 of PC is already cleared.

Your LDR is at address 0xA56 where bit #1 is set, so the rounding down has an effect. And your ldr instruction used a 2-byte encoding, not a Thumb2 32-bit instruction like you might need for a larger offset. Both of these things means round-down + 4 happens to be the address of the next instruction, rather than 2 instruction later or the middle of this instruction.