What are SP (stack) and LR in ARM?

LR is link register used to hold the return address for a function call.

SP is stack pointer. The stack is generally used to hold "automatic" variables and context/parameters across function calls. Conceptually you can think of the "stack" as a place where you "pile" your data. You keep "stacking" one piece of data over the other and the stack pointer tells you how "high" your "stack" of data is. You can remove data from the "top" of the "stack" and make it shorter.

From the ARM architecture reference:

SP, the Stack Pointer

Register R13 is used as a pointer to the active stack.

In Thumb code, most instructions cannot access SP. The only instructions that can access SP are those designed to use SP as a stack pointer. The use of SP for any purpose other than as a stack pointer is deprecated. Note Using SP for any purpose other than as a stack pointer is likely to break the requirements of operating systems, debuggers, and other software systems, causing them to malfunction.

LR, the Link Register

Register R14 is used to store the return address from a subroutine. At other times, LR can be used for other purposes.

When a BL or BLX instruction performs a subroutine call, LR is set to the subroutine return address. To perform a subroutine return, copy LR back to the program counter. This is typically done in one of two ways, after entering the subroutine with a BL or BLX instruction:

• Return with a BX LR instruction.

• On subroutine entry, store LR to the stack with an instruction of the form: PUSH {,LR} and use a matching instruction to return: POP {,PC} ...

This link gives an example of a trivial subroutine.

Here is an example of how registers are saved on the stack prior to a call and then popped back to restore their content.


SP is the stack register a shortcut for typing r13. LR is the link register a shortcut for r14. And PC is the program counter a shortcut for typing r15.

When you perform a call, called a branch link instruction, bl, the return address is placed in r14, the link register. the program counter pc is changed to the address you are branching to.

There are a few stack pointers in the traditional ARM cores (the cortex-m series being an exception) when you hit an interrupt for example you are using a different stack than when running in the foreground, you dont have to change your code just use sp or r13 as normal the hardware has done the switch for you and uses the correct one when it decodes the instructions.

The traditional ARM instruction set (not thumb) gives you the freedom to use the stack in a grows up from lower addresses to higher addresses or grows down from high address to low addresses. the compilers and most folks set the stack pointer high and have it grow down from high addresses to lower addresses. For example maybe you have ram from 0x20000000 to 0x20008000 you set your linker script to build your program to run/use 0x20000000 and set your stack pointer to 0x20008000 in your startup code, at least the system/user stack pointer, you have to divide up the memory for other stacks if you need/use them.

Stack is just memory. Processors normally have special memory read/write instructions that are PC based and some that are stack based. The stack ones at a minimum are usually named push and pop but dont have to be (as with the traditional arm instructions).

If you go to http://github.com/lsasim I created a teaching processor and have an assembly language tutorial. Somewhere in there I go through a discussion about stacks. It is NOT an arm processor but the story is the same it should translate directly to what you are trying to understand on the arm or most other processors.

Say for example you have 20 variables you need in your program but only 16 registers minus at least three of them (sp, lr, pc) that are special purpose. You are going to have to keep some of your variables in ram. Lets say that r5 holds a variable that you use often enough that you dont want to keep it in ram, but there is one section of code where you really need another register to do something and r5 is not being used, you can save r5 on the stack with minimal effort while you reuse r5 for something else, then later, easily, restore it.

Traditional (well not all the way back to the beginning) arm syntax:

...
stmdb r13!,{r5}
...temporarily use r5 for something else...
ldmia r13!,{r5}
...

stm is store multiple you can save more than one register at a time, up to all of them in one instruction.

db means decrement before, this is a downward moving stack from high addresses to lower addresses.

You can use r13 or sp here to indicate the stack pointer. This particular instruction is not limited to stack operations, can be used for other things.

The ! means update the r13 register with the new address after it completes, here again stm can be used for non-stack operations so you might not want to change the base address register, leave the ! off in that case.

Then in the brackets { } list the registers you want to save, comma separated.

ldmia is the reverse, ldm means load multiple. ia means increment after and the rest is the same as stm

So if your stack pointer were at 0x20008000 when you hit the stmdb instruction seeing as there is one 32 bit register in the list it will decrement before it uses it the value in r13 so 0x20007FFC then it writes r5 to 0x20007FFC in memory and saves the value 0x20007FFC in r13. Later, assuming you have no bugs when you get to the ldmia instruction r13 has 0x20007FFC in it there is a single register in the list r5. So it reads memory at 0x20007FFC puts that value in r5, ia means increment after so 0x20007FFC increments one register size to 0x20008000 and the ! means write that number to r13 to complete the instruction.

Why would you use the stack instead of just a fixed memory location? Well the beauty of the above is that r13 can be anywhere it could be 0x20007654 when you run that code or 0x20002000 or whatever and the code still functions, even better if you use that code in a loop or with recursion it works and for each level of recursion you go you save a new copy of r5, you might have 30 saved copies depending on where you are in that loop. and as it unrolls it puts all the copies back as desired. with a single fixed memory location that doesnt work. This translates directly to C code as an example:

void myfun ( void )
{
   int somedata;
}

In a C program like that the variable somedata lives on the stack, if you called myfun recursively you would have multiple copies of the value for somedata depending on how deep in the recursion. Also since that variable is only used within the function and is not needed elsewhere then you perhaps dont want to burn an amount of system memory for that variable for the life of the program you only want those bytes when in that function and free that memory when not in that function. that is what a stack is used for.

A global variable would not be found on the stack.

Going back...

Say you wanted to implement and call that function you would have some code/function you are in when you call the myfun function. The myfun function wants to use r5 and r6 when it is operating on something but it doesnt want to trash whatever someone called it was using r5 and r6 for so for the duration of myfun() you would want to save those registers on the stack. Likewise if you look into the branch link instruction (bl) and the link register lr (r14) there is only one link register, if you call a function from a function you will need to save the link register on each call otherwise you cant return.

...
bl myfun
    <--- the return from my fun returns here
...


myfun:
stmdb sp!,{r5,r6,lr}
sub sp,#4 <--- make room for the somedata variable
...
some code here that uses r5 and r6
bl more_fun <-- this modifies lr, if we didnt save lr we wouldnt be able to return from myfun
   <---- more_fun() returns here
...
add sp,#4 <-- take back the stack memory we allocated for the somedata variable
ldmia sp!,{r5,r6,lr}
mov pc,lr <---- return to whomever called myfun.

So hopefully you can see both the stack usage and link register. Other processors do the same kinds of things in a different way. for example some will put the return value on the stack and when you execute the return function it knows where to return to by pulling a value off of the stack. Compilers C/C++, etc will normally have a "calling convention" or application interface (ABI and EABI are names for the ones ARM has defined). if every function follows the calling convention, puts parameters it is passing to functions being called in the right registers or on the stack per the convention. And each function follows the rules as to what registers it does not have to preserve the contents of and what registers it has to preserve the contents of then you can have functions call functions call functions and do recursion and all kinds of things, so long as the stack does not go so deep that it runs into the memory used for globals and the heap and such, you can call functions and return from them all day long. The above implementation of myfun is very similar to what you would see a compiler produce.

ARM has many cores now and a few instruction sets the cortex-m series works a little differently as far as not having a bunch of modes and different stack pointers. And when executing thumb instructions in thumb mode you use the push and pop instructions which do not give you the freedom to use any register like stm it only uses r13 (sp) and you cannot save all the registers only a specific subset of them. the popular arm assemblers allow you to use

push {r5,r6}
...
pop {r5,r6}

in arm code as well as thumb code. For the arm code it encodes the proper stmdb and ldmia. (in thumb mode you also dont have the choice as to when and where you use db, decrement before, and ia, increment after).

No you absolutly do not have to use the same registers and you dont have to pair up the same number of registers.

push {r5,r6,r7}
...
pop {r2,r3}
...
pop {r1}

assuming there is no other stack pointer modifications in between those instructions if you remember the sp is going to be decremented 12 bytes for the push lets say from 0x1000 to 0x0FF4, r5 will be written to 0xFF4, r6 to 0xFF8 and r7 to 0xFFC the stack pointer will change to 0x0FF4. the first pop will take the value at 0x0FF4 and put that in r2 then the value at 0x0FF8 and put that in r3 the stack pointer gets the value 0x0FFC. later the last pop, the sp is 0x0FFC that is read and the value placed in r1, the stack pointer then gets the value 0x1000, where it started.

The ARM ARM, ARM Architectural Reference Manual (infocenter.arm.com, reference manuals, find the one for ARMv5 and download it, this is the traditional ARM ARM with ARM and thumb instructions) contains pseudo code for the ldm and stm ARM istructions for the complete picture as to how these are used. Likewise well the whole book is about the arm and how to program it. Up front the programmers model chapter walks you through all of the registers in all of the modes, etc.

If you are programming an ARM processor you should start by determining (the chip vendor should tell you, ARM does not make chips it makes cores that chip vendors put in their chips) exactly which core you have. Then go to the arm website and find the ARM ARM for that family and find the TRM (technical reference manual) for the specific core including revision if the vendor has supplied that (r2p0 means revision 2.0 (two point zero, 2p0)), even if there is a newer rev, use the manual that goes with the one the vendor used in their design. Not every core supports every instruction or mode the TRM tells you the modes and instructions supported the ARM ARM throws a blanket over the features for the whole family of processors that that core lives in. Note that the ARM7TDMI is an ARMv4 NOT an ARMv7 likewise the ARM9 is not an ARMv9. ARMvNUMBER is the family name ARM7, ARM11 without a v is the core name. The newer cores have names like Cortex and mpcore instead of the ARMNUMBER thing, which reduces confusion. Of course they had to add the confusion back by making an ARMv7-m (cortex-MNUMBER) and the ARMv7-a (Cortex-ANUMBER) which are very different families, one is for heavy loads, desktops, laptops, etc the other is for microcontrollers, clocks and blinking lights on a coffee maker and things like that. google beagleboard (Cortex-A) and the stm32 value line discovery board (Cortex-M) to get a feel for the differences. Or even the open-rd.org board which uses multiple cores at more than a gigahertz or the newer tegra 2 from nvidia, same deal super scaler, muti core, multi gigahertz. A cortex-m barely brakes the 100MHz barrier and has memory measured in kbytes although it probably runs of a battery for months if you wanted it to where a cortex-a not so much.

sorry for the very long post, hope it is useful.