How does the ARM Thumb instruction set's BLX instruction support a 4 MB range?

In the original Thumb instruction set, the BL instruction comprised two 16 bit instructions, each encoded as follows:

1111 HOOO OOOO OOOO   BL <label>
     |            |
     |            \.. long branch and link offset high/low
     \............... low/high offset
                        0 -- offset high
                        1 -- offset low

The first of the two instructions must have the H bit set to 0. The 11 bit offset is sign-extended, shifted to the left by 12, added to PC (which reads as the address of the instruction plus 4) and placed into the LR register.

LR = PC + (sign_extend(offset) << 12)

The second of the two instructions must have the H bit set to 1. The 11 bit offset is shifted to the left by 1, added to the contents of the LR register, and used as a branch target. The LR register is set to the return address.

temp = next instruction address
PC = LR + (offset << 1)
LR = temp | 1
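
The two steps above can be sketched in Python as follows. This is only an illustrative model, not taken from any ARM document: the function names `sign_extend`, `bl_first_half` and `bl_second_half` are made up here, and `pc` is assumed to be the address of the first half plus 4, as Thumb code reads it.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def bl_first_half(insn, pc):
    """H = 0 half: return the intermediate LR value (hypothetical helper)."""
    assert insn >> 11 == 0b11110, "not a BL first half"
    offset = insn & 0x7FF
    return (pc + (sign_extend(offset, 11) << 12)) & 0xFFFFFFFF

def bl_second_half(insn, lr, next_addr):
    """H = 1 half: return (new PC, new LR) (hypothetical helper)."""
    assert insn >> 11 == 0b11111, "not a BL second half"
    offset = insn & 0x7FF
    new_pc = (lr + (offset << 1)) & 0xFFFFFFFF
    return new_pc, next_addr | 1

# A BL that branches to itself, placed at 0x8000, is encoded as F7FF FFFE.
lr = bl_first_half(0xF7FF, pc=0x8000 + 4)
target, new_lr = bl_second_half(0xFFFE, lr, next_addr=0x8004)
print(hex(lr), hex(target), hex(new_lr))  # 0x7004 0x8000 0x8005
```

Note how the first half alone moves LR back by 4 KiB and the second half adds almost all of it back, so the pair lands on the address of the BL itself.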

With ARMv5T, a Thumb encoding of the BLX instruction was added, allowing Thumb code to call into ARM code. This was done by defining a new thumb bit in the second half* of the BL instruction.

111T 1OOO OOOO OOOO   BL/BLX <label> (second half)
   |              |
   |              \.. long branch link exchange offset low
   \................. thumb bit
                        0 -- BLX is encoded
                        1 -- BL  is encoded

The operation of BLX is similar to the second half of the BL instruction, but the offset must be even. The function is called in ARM state instead of Thumb state.

temp = next instruction address
PC = (LR + (offset << 1)) & 0xfffffffc
LR = temp | 1
CPSR T bit = 0
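
As a rough Python model of the BLX half described above (the function name `blx_second_half` and the returned tuple layout are assumptions made for this sketch), the only differences from BL are the word alignment of the target and the switch to ARM state:

```python
def blx_second_half(insn, lr, next_addr):
    """Model the BLX half (bits 15..11 = 11101).

    Returns (new PC, new LR, thumb_state); hypothetical helper for illustration.
    """
    assert insn >> 11 == 0b11101, "not a BLX second half"
    offset = insn & 0x7FF
    assert offset & 1 == 0, "offset must be even for BLX"
    new_pc = (lr + (offset << 1)) & 0xFFFFFFFC  # force a word-aligned ARM target
    return new_pc, next_addr | 1, False         # T bit cleared: continue in ARM state

# First half at 0x8002 with offset -1 leaves LR = 0x8006 - 0x1000 = 0x7006.
# The BLX half then sits at 0x8004, so the next instruction is at 0x8006.
new_pc, new_lr, thumb = blx_second_half(0xEFFE, lr=0x7006, next_addr=0x8006)
print(hex(new_pc), hex(new_lr), thumb)  # 0x8000 0x8007 False
```

Without the masking step the sum would be 0x8002, which is not a valid ARM instruction address; the mask rounds it down to 0x8000.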

Note that the two halves together supply 22 immediate bits forming a signed offset in halfwords, that is, a 23 bit signed byte offset. This gives the observed branch range of ±4 MiB.
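
The range arithmetic can be checked with a few lines of Python. The variable names are just for this sketch:

```python
IMM_BITS = 22          # 11 bits from each half
MIB = 1 << 20

# Signed 22 bit value counted in halfwords, so multiply by 2 to get bytes.
min_offset = -(1 << (IMM_BITS - 1)) * 2
max_offset = ((1 << (IMM_BITS - 1)) - 1) * 2

print(min_offset, max_offset)  # -4194304 4194302
print(min_offset / MIB)        # -4.0
```

The forward limit is two bytes short of 4 MiB because the largest positive 22 bit value is 2^21 - 1.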

Putting the two halves together, we can also see BL and BLX as 32 bit instructions with an encoding like this:

1111 0OOO OOOO OOOO  111T 1OOO OOOO OOOO   BL/BLX <label>
                  |     |              |
                  |     |              \.. 22 bit offset (low half)
                  |     \................. thumb bit
                  \....................... 22 bit offset (high half)

In Thumb2, this scheme was extended. BL and BLX became proper 32 bit instructions and their halves must be given consecutively†. Some bits of the second instruction word were defined to extend the branch offset to ±16 MiB.

1111 0SOO OOOO OOOO  11AT BOOO OOOO OOOO   BL/BLX <label>
      |           |    || |            |
      |           |    || |            \.. 21 bit offset (low half)
      |           |    || \............... additional bit J2
      |           |    |\................. thumb bit
      |           |    \.................. additional bit J1
      |           \....................... 21 bit offset (high half)
      \................................... sign bit

If the thumb bit is set, the BL instruction is encoded. If it is clear, the BLX instruction is encoded. In the latter case, the 21 bit offset must be even. The branch offset is then computed as follows:

I1 = !(J1 ^ S)
I2 = !(J2 ^ S)
imm32 = (S ? 0xff << 24 : 0) | (I1 << 23) | (I2 << 22) | (imm21 << 1)
temp = next instruction address
if thumb bit set
    PC = PC + imm32
else
    PC = (PC & 0xfffffffc) + imm32
    CPSR T bit = 0
LR = temp | 1
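
The Thumb2 offset computation can be written out as a small Python decoder. This is a sketch under the field layout shown in the diagram above; the function name `decode_thumb2_bl` and its return convention are made up for illustration.

```python
def decode_thumb2_bl(hw1, hw2):
    """Decode a Thumb2 BL/BLX pair into (signed byte offset, is_blx).

    hw1 = 11110 S imm10, hw2 = 11 J1 T J2 imm11 (hypothetical helper).
    """
    assert hw1 >> 11 == 0b11110 and hw2 >> 14 == 0b11, "not a BL/BLX pair"
    s = (hw1 >> 10) & 1
    j1 = (hw2 >> 13) & 1
    t = (hw2 >> 12) & 1
    j2 = (hw2 >> 11) & 1
    i1 = 1 - (j1 ^ s)                       # I1 = !(J1 ^ S)
    i2 = 1 - (j2 ^ s)                       # I2 = !(J2 ^ S)
    imm21 = ((hw1 & 0x3FF) << 11) | (hw2 & 0x7FF)
    imm25 = (s << 24) | (i1 << 23) | (i2 << 22) | (imm21 << 1)
    offset = imm25 - (s << 25)              # sign-extend from bit 24
    return offset, t == 0

print(decode_thumb2_bl(0xF7FF, 0xFFFE))  # (-4, False)
print(decode_thumb2_bl(0xF3FF, 0xD7FF))  # (16777214, False)
print(decode_thumb2_bl(0xF000, 0xE800))  # (0, True)
```

The first pair is a BL to itself (the offset is relative to the instruction address plus 4), the second is the largest forward BL offset of 16 MiB minus 2, and the third is a BLX with offset 0.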

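While the scheme to encode the additional offset bits seems convoluted at first, it is just the simplest way to encode two additional bits into the branch offset while being compatible with the existing encoding of the BL and BLX instructions. In the old encoding, both of these bits are always 1, and with J1 = J2 = 1 we get I1 = I2 = S, which is exactly the sign extension of the old 22 bit offset.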
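
The compatibility can be checked mechanically. The following Python sketch decodes the same pair of halfwords once as the old 22 bit scheme and once with the Thumb2 formula, and compares the results for a few encodings whose J1 and J2 bits are both 1 (as in every pre-Thumb2 BL). The function names are made up for this illustration.

```python
def old_offset(hw1, hw2):
    """Pre-Thumb2 view: 22 bit signed halfword offset, returned in bytes."""
    imm22 = ((hw1 & 0x7FF) << 11) | (hw2 & 0x7FF)
    if imm22 & (1 << 21):
        imm22 -= 1 << 22
    return imm22 << 1

def thumb2_offset(hw1, hw2):
    """Thumb2 view: S, I1, I2 and a 21 bit offset, returned in bytes."""
    s = (hw1 >> 10) & 1
    i1 = 1 - (((hw2 >> 13) & 1) ^ s)
    i2 = 1 - (((hw2 >> 11) & 1) ^ s)
    imm21 = ((hw1 & 0x3FF) << 11) | (hw2 & 0x7FF)
    value = (s << 24) | (i1 << 23) | (i2 << 22) | (imm21 << 1)
    return value - (s << 25)

# 0xF000 | x is a first half; 0xF800 | y is a BL second half with J1 = J2 = 1.
mismatches = 0
for high in (0x000, 0x001, 0x3FF, 0x400, 0x7FF):
    for low in (0x000, 0x7FE, 0x7FF):
        hw1, hw2 = 0xF000 | high, 0xF800 | low
        if old_offset(hw1, hw2) != thumb2_offset(hw1, hw2):
            mismatches += 1
print(mismatches)  # 0
```

Any old BL encoding therefore decodes to the same target on a Thumb2 core; only encodings with J1 or J2 cleared reach the extended range.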


See the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, the ARM7TDMI Data Sheet, and the ARM Architecture Reference Manual for ARMv5 for further reading.

* The related encoding 1110 0OOO OOOO OOOO encodes the 16 bit unconditional branch instruction B <label>.

† Before Thumb2, the two parts of a BL or BLX instruction were independent instructions and could be given interspersed with other instructions or even individually, though it was strongly recommended to issue them in consecutive order. An interrupt could also occur between the two halves of a BL or BLX instruction, making the temporary contents of the LR register observable. On Thumb2 targets including ARMv6-M, this is no longer possible and BL and BLX each behave as a single 32 bit instruction.