The spiral rule about declarations — when is it in error?
Basically speaking, the rule simply doesn't work, or else it works by redefining what is meant by spiral (in which case, there's no point in it. Consider, for example:
int* a[10][15];
The spiral rule would give a is an array[10] of pointer to
array[15] of int, which is wrong. It the case you cite, it
doesn't work either; in fact, in the case of signal
, it's not
even clear where you should start the spiral.
In general, it's easier to find examples of where the rule fails than examples where it works.
I'm often tempted to say that parsing a C++ declaration is simple, but nobody who has tried with complicated declarations would believe me. On the other hand, it's not as hard as it is sometimes made out to be. The secret is to think of the declaration exactly as you would an expression, but with a lot less operators, and a very simple precedence rule: all operators to the right have precedence over all operators to the left. In the absence of parentheses, this means process everything to the right first, then everything to the left, and process parentheses exactly as you would in any other expression. The actual difficulty is not the syntax per se, but that it results is some very complex and counterintuitive declarations, in particular where function return values and pointers to functions are involved: the first right, then left rule means that operators at a particular level are often widely separated, e.g.:
int (*f( /* lots of parameters */ ))[10];
The final term in the expansion here is int[10]
, but putting
the [10]
after the complete function specification is (at
least to me) very unnatural, and I have to stop and work it out
each time. (It's probably this tendency for logically adjacent
parts to spread out that lead to the spiral rule. The problem
is, of course, that in the absence of parentheses, they don't
always spread out—anytime you see [i][j]
, the rule is go
right, then go right again, rather than spiral.)
And since we're now thinking of declarations in terms of
expressions: what do you do when an expression becomes too
complicated to read? You introduce intermediate variables in order
to make it easier to read. In the case of declarations, the
"intermediate variables" are typedef
. In particular, I would
argue that any time part of the return type ends up after the
function arguments (and a lot of other times as well), you
should use a typedef
to make the declaration simpler. (This
is a "do as I say, not as I do" rule, however. I'm afraid that
I'll occasionally use some very complex declarations.)
The rule is correct. However, one should be very careful in applying it.
I suggest to apply it in a more formal way for C99+ declarations.
The most important thing here is to recognize the following recursive structure of all declarations (const
, volatile
, static
, extern
, inline
, struct
, union
, typedef
are removed from the picture for simplicity but can be added back easily):
base-type [derived-part1: *'s] [object] [derived-part2: []'s or ()]
Yep, that's it, four parts.
where
base-type is one of the following (I'm using a bit compressed notation):
void
[signed/unsigned] char
[signed/unsigned] short [int]
signed/unsigned [int]
[signed/unsigned] long [long] [int]
float
[long] double
etc
object is
an identifier
OR
([derived-part1: *'s] [object] [derived-part2: []'s or ()])
* is *, denotes a reference/pointer and can be repeated
[] in derived-part2 denotes bracketed array dimensions and can be repeated
() in derived-part2 denotes parenthesized function parameters delimited with ,'s
[] elsewhere denotes an optional part
() elsewhere denotes parentheses
Once you've got all 4 parts parsed,
[object
] is [derived-part2
(containing/returning)] [derived-part2
(pointer to)] base-type
1.
If there's recursion, you find your object
(if there's any) at the bottom of the recursion stack, it'll be the inner-most one and you'll get the full declaration by going back up and collecting and combining derived parts at each level of recursion.
While parsing you may move [object]
to after [derived-part2]
(if any). This will give you a linearized, easy to understand, declaration (see 1 above).
Thus, in
char* (**(*foo[3][5])(void))[7][9];
you get:
-
base-type
=char
- level 1:
derived-part1
=*
,object
=(**(*foo[3][5])(void))
,derived-part2
=[7][9]
- level 2:
derived-part1
=**
,object
=(*foo[3][5])
,derived-part2
=(void)
- level 3:
derived-part1
=*
,object
=foo
,derived-part2
=[3][5]
From there:
- level 3:
*
[3][5]
foo
- level 2:
**
(void)
*
[3][5]
foo
- level 1:
*
[7][9]
**
(void)
*
[3][5]
foo
- finally,
char
*
[7][9]
**
(void)
*
[3][5]
foo
Now, reading right to left:
foo
is an array of 3 arrays of 5 pointers to a function (taking no params) returning a pointer to a pointer to an array of 7 arrays of 9 pointers to a char.
You could reverse the array dimensions in every derived-part2
in the process as well.
That's your spiral rule.
And it's easy to see the spiral. You dive into the ever more deeply nested [object]
from the left and then resurface on the right only to note that on the upper level there's another pair of left and right and so on.
The spiral rule is actually an over-complicated way of looking at it. The actual rule is much simpler:
postfix is higher precedence than prefix.
That's it. That's all you need to remember. The 'complex' cases are when you have parenthesis to override that postfix-higher-than-prefix precedence, but you really just need to find the matching parenthesis, then look at the things inside the parens first, and, if that is not complete, pull in the next level outside the parenthses, postfix first.
So looking at your complex example
void (*signal(int, void (*fp)(int)))(int);
we can start at any name and figure out what that name is. If you start at int
, you're done -- int
is a type and you can understand it by itself.
If you start at fp
, fp is not a type, its a name being declared as something. So look at the first set of parens enclosing:
(*fp)
there's no suffix (deal with postfix first), then the prefix *
means pointer. Pointer to what? not complete yet so look out another level
void (*fp)(int)
The suffix first is "function taking an int param", then the prefix is "returning void". So we have fn
is "pointer to function taking int param, returning void"
If we start a signal
, the first level has a suffix (function) and a prefix (returning pointer). Need the next level out to see what that points to (function returning void). So we end up with "function with two params (int and pointer to function), returning pointer to function with one (int) param, returning void"