How to determine maximum stack usage?

Solution 1:

The most common way to determine the deepest stack usage is to initialize the stack memory with some known but unusual value, then periodically (or at the end of a big test run) see where that pattern stops.

This is exactly how the IAR IDE determines the amount of stack used.

Solution 2:

You tagged your question with static-analysis, but this is a problem that is difficult to solve through static-analysis. The stack usage depends on the program's runtime profile, especially, if you're using recursion or alloca. Given that this is an embedded platform I guess it's also difficult to run something like ps or top and see how much stack your application is using.

An interesting approach is to use the address of the current stack frame in order to determine how much stack is used. You can do this by taking the address of a function's argument or local variable. Do that for the main function and for functions you think are using the most stack. The difference will tell you the amount of stack your application requires. Here is an example (assuming customary high-to-low stack growth).

char *stack_top, stack_bottom;

int
main(int argc, char *argv[])
{
    stack_top = (char *)&argc;
    // ...
    printf("Stack usage: %d\n", stack_top - stack_bottom);
}

void
deeply_nested_function(void)
{
    int a;
    stack_bottom = (char *)&a;
    // ...
}

If your compiler allows you to specify a custom function prologue (many do it to allow graph-based program profiling), you can even arrange for all functions to call such measuring code. Then your measurement function mecomes something like

void
stack_measurement_function(void)
{
    int a;
    stack_bottom = min(stack_bottom, (char *)&a);
    // ...
}

I used an approach similar to what I've described to generate these charts.

Solution 3:

With a good source code static analysis tool, you can produce a call graph for your application. Given that, and estimates of the amount of locals/temporaries produced by your compiler, you can straightforwardly compute a conservative estimate of the stack demand.

What I mean by "good" analysis tool is one that can read all the compilation units involved, can determine direct function calls, can determine indirect pointers, in the compilaiton unit, can compute a conservative points-to analysis across the entire system, can construct a call graph taking into account the points-to analysis. This eliminates a lot of tools, which is why one sees ad hoc methods such as "fill the stack at runtime and see what happens". You also need estimates of the stack demands the compiler places on the stack; you can approximate a lot of this by simply knowing how big the storage demands of all the types are, which is generally fairly easy to determine for embedded systems C programs. Finally, you need to believe that you applicaiton doesn't have recursive calls, or that the tool has an idea of the deepest recursion (probably by you telling it).

The DMS Software Reengineering Toolkit meets all of these requirements for C programs. See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html You still have to configure it to compute the stack demand by crawling the call graph and using the various size estimates.

If you want a fast answer, use the stack-fill trick. If you want an answer that you can recompute after each source code change you'll need the static analysis approach.

Solution 4:

I am working on this problem right now - i.e. analytical computation of stacksize. It is clearly going to be a highly recursive piece of code, because a function call could have an indexed array as one or more of its arguments and one or more of the array indices could involve function calls!!

However, a couple of realizations allow some relief from the complexity:

(1) When using a high-level language compiler, the stackpointer at the end of execution of each statement/line-of-code should be at the same place as it was at the start. (At least this would be a good rule to observe otherwise you are going to have problems!)

(2)The stackpointer after returning from each function or subroutine call should be the same as it was pre-call. Therefore the maximum stack size is the maximum, over all statements in the program, of the peak stacksize reached in each statement. (At least this would be a good rule to observe otherwise you are going to have problems!)

Of course a statement can include the recursive issues I mentioned above, but at least the problem of finding the maximum stacksize requirement of a whole program then boils down to finding the maximum stacksize requirement of each statement, and then picking the maximum of those.

This cannot be completed until all functions called have also been compiled. So I generate a file for each module compiled that records a number of stacksizes for each statement (basically the peak value prior to each function call and the value immmediately prior to each function call (excluding any as yet unknown additions to stacksize caused by the function call), and the function names involved. Then I retrospectively process these files using a recursive routine, once all functions have been compiled, to determine peak stacksize.

The fortunate thing is that, recursive routines apart, the maximum possible stacksize requirement does not then depend on program flow, although in a typical flow (which is data dependent) this maximum possible stacksize may never be reached.

Example: Suppose function 1 calls function 2 and the program flow of both depends on a data value X. Suppose there is a range of X which causes function 1 to execute its worst statement, which includes a call to function 2, which does not execute its worst case statement for that same range of X. Since we computed the maximum possible stacksize by using the worst cases for both function 1 and function 2 simultaneously, we may have overestimated the stacksize. At least we erred on the safe side.

I like to give interrupt routines their own stackspace on an OS stack if they need any, so they do not add to program stack requirements, apart from the return-from-interrupt address