How does dereferencing of a function pointer happen?
It's not quite the right question. For C, at least, the right question is
What happens to a function value in an rvalue context?
(An rvalue context is anywhere a name or other reference appears where it should be used as a value, rather than a location — basically anywhere except on the left-hand side of an assignment. The name itself comes from the right-hand side of an assignment.)
OK, so what happens to a function value in an rvalue context? It is immediately and implicitly converted to a pointer to the original function value. If you dereference that pointer with *, you get the same function value back again, which is immediately and implicitly converted into a pointer. And you can do this as many times as you like.
Two similar experiments you can try:
- What happens if you dereference a function pointer in an lvalue context, that is, on the left-hand side of an assignment? (The answer will be about what you expect, if you keep in mind that functions are immutable.)
- An array value is also converted to a pointer in an rvalue context, but it is converted to a pointer to the element type, not to a pointer to the array. Dereferencing it will therefore give you an element, not an array, and the madness you show doesn't occur.
Hope this helps.
P.S. As to why a function value is implicitly converted to a pointer, the answer is that for those of us who use function pointers, it's a great convenience not to have to use &'s everywhere. There's a dual convenience as well: a function pointer in call position is automatically converted to a function value, so you don't have to write * to call through a function pointer.
P.P.S. Unlike C functions, C++ functions can be overloaded, and I'm not qualified to comment on how the semantics work in C++.
C++03 §4.3/1:
An lvalue of function type T can be converted to an rvalue of type “pointer to T.” The result is a pointer to the function.
If you attempt an invalid operation on a function reference, such as the unary * operator, the first thing the language tries is a standard conversion. It's just like converting an int when adding it to a float. Using * on a function reference causes the language to take its pointer instead, which, in your example, is square 1.
Another case where this applies is when assigning a function pointer.
void f() {
    void (*recurse)() = f; // "f" is a reference; implicitly convert to ptr.
    recurse(); // call operator is defined for pointers
}
Note that this doesn't work the other way.
void f() {
    void (&recurse)() = &f; // "&f" is a pointer; ERROR can't convert to ref.
    recurse(); // OK - call operator is *separately* defined for references
}
Function reference variables are nice because they (in theory, I've never tested) hint to the compiler that an indirect branch may be unnecessary, if initialized in an enclosing scope.
In C99, dereferencing a function pointer yields a function designator. §6.3.2.1/4:
A function designator is an expression that has function type. Except when it is the operand of the sizeof operator or the unary & operator, a function designator with type “function returning type” is converted to an expression that has type “pointer to function returning type”.
This is more like Norman's answer, but notably C99 has no concept of rvalues.
Put yourself in the shoes of the compiler writer. A function pointer has a well-defined meaning: it is a pointer to a blob of bytes that represent machine code.
What do you do when the programmer dereferences a function pointer? Do you take the first (or 8) bytes of the machine code and reinterpret that as a pointer? Odds are about 2 billion to one that this won't work. Do you declare UB? Plenty of that going around already. Or do you just ignore the attempt? You know the answer.
How exactly does dereferencing of a function pointer work?
Two steps. The first step is at compile time, the second at runtime.
In step one, the compiler sees it has a pointer and a context in which that pointer is dereferenced (such as (*pFoo)()), so it generates code for that situation, code that will be used in step 2.
In step 2, at runtime the code is executed. The pointer contains some bytes indicating which function should be executed next. These bytes are somehow loaded into the CPU. A common case is a CPU with an explicit CALL [register] instruction. On such systems, a function pointer can be simply the address of a function in memory, and the dereferencing code does nothing more than load that address into a register and follow it with a CALL [register] instruction.
It happens with a few implicit conversions. Indeed, per the C standard:
ISO/IEC 2011, section 6.3.2.1 Lvalues, arrays, and function designators, paragraph 4
A function designator is an expression that has function type. Except when it is the operand of the sizeof operator or the unary & operator, a function designator with type “function returning type” is converted to an expression that has type “pointer to function returning type”.
Consider the following code:
void func(void);
int main(void)
{
    void (*ptr)(void) = func;
    return 0;
}
Here, the function designator func has the type “function returning void” but is immediately converted to an expression that has type “pointer to function returning void”. However, if you write
void (*ptr)(void) = &func;
then the function designator func has the type “function returning void”, but the unary & operator explicitly takes the address of that function, yielding the type “pointer to function returning void”.
This is mentioned in the C standard:
ISO/IEC 2011, section 6.5.3.2 Address and indirection operators, paragraph 3
The unary & operator yields the address of its operand. If the operand has type “type”, the result has type “pointer to type”.
In particular, dereferencing a function pointer is redundant. Per the C standard:
ISO/IEC 2011, section 6.5.2.2 Function calls, paragraph 1
The expression that denotes the called function shall have type “pointer to function returning void” or returning a complete object type other than an array type. Most often, this is the result of converting an identifier that is a function designator.
ISO/IEC 2011, section 6.5.3.2 Address and indirection operators, paragraph 4
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator.
So when you write
ptr();
the function call is evaluated with no implicit conversion because ptr is already a pointer to function. If you explicitly dereference it with
(*ptr)();
then the dereferencing yields the type “function returning void”, which is immediately converted back to the type “pointer to function returning void”, and the function call occurs. When writing an expression composed of x unary * indirection operators such as
(****ptr)();
then you just repeat the implicit conversions x times.
It does make sense that calling functions involves function pointers. On 32-bit x86 with a stack-based calling convention such as cdecl, for example, a program pushes the function's arguments onto the stack in reverse order (last argument first) before executing the function. Then the program issues a call instruction indicating which function it wishes to start. The call instruction does two things:
- First, it pushes the address of the next instruction, which is the return address, onto the stack.
- Then, it modifies the instruction pointer %eip to point to the start of the function.
Since calling a function does involve modifying an instruction pointer, which is a memory address, it makes sense that the compiler implicitly converts a function designator to a pointer to function.
Even though these implicit conversions may seem unrigorous, they can be useful in C (which, unlike C++, has no namespaces) for taking advantage of the namespace defined by a struct identifier to encapsulate variables.
Consider the following code:
void create_person(void);
void update_person(void);
void delete_person(void);
struct Person {
    void (*create)(void);
    void (*update)(void);
    void (*delete)(void);
};
static struct Person person = {
    .create = &create_person,
    .update = &update_person,
    .delete = &delete_person,
};
int main(void)
{
    person.create();
    person.update();
    person.delete();
    return 0;
}
It is possible to hide the implementation of the library in other translation units and to choose to only expose the struct encapsulating the pointers to functions, to use them in place of the actual function designators.