The spiral rule about declarations — when is it in error?

c declaration

Basically speaking, the rule simply doesn't work, or else it works by redefining what is meant by spiral (in which case, there's no point in it). Consider, for example:

int* a[10][15];

The spiral rule would give a is an array[10] of pointer to array[15] of int, which is wrong; a is actually an array[10] of array[15] of pointer to int. In the case you cite, it doesn't work either; in fact, in the case of signal, it's not even clear where you should start the spiral.
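The right-then-left reading of a (array[10] of array[15] of pointer to int) can be checked by asking the compiler about the sizes involved. This is only a sketch, assuming a C11 compiler for static_assert; read_through is a made-up helper, not something from the question.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

int* a[10][15];   /* array[10] of array[15] of pointer to int */

/* These hold for the right-then-left reading, not for the spiral one. */
static_assert(sizeof a / sizeof a[0] == 10, "outer dimension is 10");
static_assert(sizeof a[0] / sizeof a[0][0] == 15, "inner dimension is 15");
static_assert(sizeof a[0][0] == sizeof(int *), "elements are pointers to int");

/* Hypothetical helper: a[i][j] is an int*, so it is dereferenced exactly once. */
int read_through(size_t i, size_t j) { return *a[i][j]; }
```

If a really were an array of pointers to array[15] of int, the second assertion would fail, since a[0] would then be a single pointer.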

In general, it's easier to find examples of where the rule fails than examples where it works.

I'm often tempted to say that parsing a C++ declaration is simple, but nobody who has tried with complicated declarations would believe me. On the other hand, it's not as hard as it is sometimes made out to be. The secret is to think of the declaration exactly as you would an expression, but with far fewer operators, and a very simple precedence rule: all operators to the right have precedence over all operators to the left. In the absence of parentheses, this means process everything to the right first, then everything to the left, and process parentheses exactly as you would in any other expression. The actual difficulty is not the syntax per se, but that it results in some very complex and counterintuitive declarations, in particular where function return values and pointers to functions are involved: the first right, then left rule means that operators at a particular level are often widely separated, e.g.:

int (*f( /* lots of parameters */ ))[10];

The final term in the expansion here is int[10], but putting the [10] after the complete function specification is (at least to me) very unnatural, and I have to stop and work it out each time. (It's probably this tendency for logically adjacent parts to spread out that led to the spiral rule. The problem is, of course, that in the absence of parentheses, they don't always spread out: any time you see [i][j], the rule is go right, then go right again, rather than spiral.)
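To make the reading of f concrete, here is a small sketch in which the parameter list is simply (void), standing in for the "lots of parameters" placeholder. The array table and its contents are invented for the illustration.

```c
#include <assert.h>   /* static_assert (C11) */

static int table[10] = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};

/* Right first: f is a function (no parameters).
   Then left:   returning a pointer.
   Next level:  to an array of 10 int. */
int (*f(void))[10]
{
    return &table;
}

/* f() points to the whole array, so *f() has the size of 10 ints. */
static_assert(sizeof *f() == 10 * sizeof(int), "f returns a pointer to int[10]");
```

A caller has to write (*f())[3] to reach an element, which mirrors the shape of the declaration.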

And since we're now thinking of declarations in terms of expressions: what do you do when an expression becomes too complicated to read? You introduce intermediate variables in order to make it easier to read. In the case of declarations, the "intermediate variables" are typedefs. In particular, I would argue that any time part of the return type ends up after the function arguments (and a lot of other times as well), you should use a typedef to make the declaration simpler. (This is a "do as I say, not as I do" rule, however. I'm afraid that I'll occasionally use some very complex declarations.)
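As a rough illustration of the typedef advice, the sketch below declares the same function type twice, once with a typedef and once raw. The names Row, g and squares are hypothetical; the point is only that the compiler accepts the second declaration as a redeclaration of the first.

```c
#include <assert.h>   /* static_assert (C11) */

typedef int Row[10];   /* hypothetical name: array of 10 int */

static Row squares = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};

/* With the typedef, the whole return type stays on the left. */
Row *g(void);

/* The raw form; this compiles only because both declarations name the same type. */
int (*g(void))[10];

Row *g(void) { return &squares; }

static_assert(sizeof *g() == sizeof(Row), "g returns a pointer to a Row");
```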

The rule is correct. However, one should be very careful in applying it.

I suggest applying it in a more formal way for C99+ declarations.

The most important thing here is to recognize the following recursive structure of all declarations (const, volatile, static, extern, inline, struct, union, typedef are removed from the picture for simplicity but can be added back easily):

base-type [derived-part1: *'s] [object] [derived-part2: []'s or ()]

Yep, that's it, four parts.

where

  base-type is one of the following (I'm using a bit compressed notation):
    void
    [signed/unsigned] char
    [signed/unsigned] short [int]
    signed/unsigned [int]
    [signed/unsigned] long [long] [int]
    float
    [long] double
    etc

  object is
      an identifier
    OR
      ([derived-part1: *'s] [object] [derived-part2: []'s or ()])

  * is a literal *, denotes a pointer and can be repeated
  [] in derived-part2 denotes bracketed array dimensions and can be repeated
  () in derived-part2 denotes parenthesized function parameters delimited with ,'s
  [] elsewhere denotes an optional part
  () elsewhere denotes parentheses

Once you've got all 4 parts parsed,

[object] is [derived-part2 (containing/returning)] [derived-part1 (pointer to)] base-type ¹.

If there's recursion, you find your object (if there is one) at the bottom of the recursion stack; it will be the innermost one. You get the full declaration by going back up, collecting and combining the derived parts at each level of recursion.

While parsing you may move [object] to after [derived-part2] (if any). This will give you a linearized, easy to understand, declaration (see ¹ above).

Thus, in

char* (**(*foo[3][5])(void))[7][9];

you get:

base-type = char
level 1: derived-part1 = *, object = (**(*foo[3][5])(void)), derived-part2 = [7][9]
level 2: derived-part1 = **, object = (*foo[3][5]), derived-part2 = (void)
level 3: derived-part1 = *, object = foo, derived-part2 = [3][5]

From there:

level 3: * [3][5] foo
level 2: ** (void) * [3][5] foo
level 1: * [7][9] ** (void) * [3][5] foo
finally, char * [7][9] ** (void) * [3][5] foo

Now, reading right to left:

foo is an array of 3 arrays of 5 pointers to a function (taking no params) returning a pointer to a pointer to an array of 7 arrays of 9 pointers to a char.

You could reverse the array dimensions in every derived-part2 in the process as well.

That's your spiral rule.

And it's easy to see the spiral. You dive into the ever more deeply nested [object] from the left and then resurface on the right only to note that on the upper level there's another pair of left and right and so on.
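One way to double-check the reading of foo is to rebuild the type from the base type outward with typedefs and let the compiler compare the two. This is just a sketch, assuming C11; the typedef names and the helpers install_demo and demo_read are invented for the example.

```c
#include <assert.h>   /* static_assert (C11) */

/* The declaration from the walkthrough. */
extern char* (**(*foo[3][5])(void))[7][9];

/* The same type, built one level at a time (hypothetical names). */
typedef char *Cell;              /* pointer to char */
typedef Cell Grid[7][9];         /* array of 7 arrays of 9 Cell */
typedef Grid **Producer(void);   /* function (no params) returning pointer to pointer to Grid */
typedef Producer *ProducerPtr;   /* pointer to such a function */

/* Accepted only if this type is compatible with the extern declaration above. */
ProducerPtr foo[3][5];

static_assert(sizeof foo / sizeof foo[0] == 3, "array of 3 ...");
static_assert(sizeof foo[0] / sizeof foo[0][0] == 5, "... arrays of 5 pointers");
static_assert(sizeof **foo[0][0]() == 7 * 9 * sizeof(char *), "7 x 9 pointers to char");

static char letter = 'x';
static Grid grid;
static Grid *grid_ptr = &grid;
static Grid **make_grid(void) { return &grid_ptr; }

/* Fill in one element at each level so the whole chain can be followed. */
void install_demo(void)
{
    grid[6][8] = &letter;
    foo[2][4] = make_grid;
}

/* Call the function, dereference twice, index [7][9], then dereference the char*. */
char demo_read(void) { return *(**foo[2][4]())[6][8]; }
```

The expression in demo_read follows the declaration step by step, which is the usual "declaration mirrors use" property of C.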

The spiral rule is actually an over-complicated way of looking at it. The actual rule is much simpler:

postfix is higher precedence than prefix.

That's it. That's all you need to remember. The 'complex' cases are when you have parentheses to override that postfix-higher-than-prefix precedence, but you really just need to find the matching parentheses, then look at the things inside the parens first, and, if that is not complete, pull in the next level outside the parentheses, postfix first.
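A minimal sketch of the postfix-before-prefix rule, with and without overriding parentheses. All the names here (p_arr, arr_p, fn_ret, fn_ptr, seven) are made up for the illustration, and static_assert assumes C11.

```c
#include <assert.h>   /* static_assert (C11) */

int *p_arr[3];     /* [] binds first: array of 3 pointers to int */
int (*arr_p)[3];   /* parentheses force * first: pointer to an array of 3 int */

static_assert(sizeof p_arr == 3 * sizeof(int *), "p_arr holds three pointers");
static_assert(sizeof *arr_p == 3 * sizeof(int), "arr_p points to three ints");

int *fn_ret(void);     /* () binds first: function returning pointer to int */
int (*fn_ptr)(void);   /* parentheses force * first: pointer to function returning int */

/* Hypothetical function, only here to give fn_ptr something to point to. */
int seven(void) { return 7; }
```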

So looking at your complex example

void (*signal(int, void (*fp)(int)))(int);

we can start at any name and figure out what that name is. If you start at int, you're done -- int is a type and you can understand it by itself.

If you start at fp, fp is not a type, it's a name being declared as something. So look at the first set of parens enclosing:

                        (*fp)

There's no suffix (deal with postfix first), then the prefix * means pointer. Pointer to what? It's not complete yet, so look out another level:

                   void (*fp)(int)

The suffix first is "function taking an int param", then the prefix is "returning void". So we have fp is "pointer to function taking int param, returning void".

If we start at signal, the first level has a suffix (function) and a prefix (returning pointer). We need the next level out to see what that points to (function returning void). So we end up with "function with two params (int and pointer to function), returning pointer to function with one (int) param, returning void".
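The reading of signal can be checked the same way: declare it once in the raw form and once through a typedef, and see that the compiler treats them as the same function. In this sketch the function is renamed my_signal so it cannot clash with <signal.h>, and handler_t, current, on_a and on_b are assumptions made for the example. The body is a toy that only stores a pointer; it is not how the real signal works.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_t)(int);   /* pointer to function taking int, returning void */

/* Raw form, as in the example above (renamed). */
void (*my_signal(int, void (*fp)(int)))(int);

/* Typedef form; accepted as a redeclaration because the types are the same. */
handler_t my_signal(int signo, handler_t fp);

static handler_t current = NULL;   /* illustration only: one stored handler */

/* Toy definition: install fp and return the previously installed handler. */
handler_t my_signal(int signo, handler_t fp)
{
    handler_t previous = current;
    (void)signo;                   /* the toy version ignores the signal number */
    current = fp;
    return previous;
}

void on_a(int n) { (void)n; }
void on_b(int n) { (void)n; }
```

Calling my_signal(1, on_a) and then my_signal(1, on_b) returns NULL the first time and on_a the second, which matches "function taking an int and a pointer to function, returning a pointer to function".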
