Why is the dereference operator (*) also used to declare a pointer?
I'm not sure if this is a proper programming question, but it's something that has always bothered me, and I wonder if I'm the only one.
When initially learning C++, I understood the concept of references, but pointers had me confused. Why, you ask? Because of how you declare a pointer.
Consider the following:
void foo(int* bar)
{
}
int main()
{
int x = 5;
int* y = NULL;
y = &x;
*y = 15;
foo(y);
}
The function foo(int*)
takes an int
pointer as parameter. Since I've declared y
as int
pointer, I can pass y
to foo
, but when first learning C++ I associated the *
symbol with dereferencing, as such I figured a dereferenced int
needed to be passed. I would try to pass *y
into foo
, which obviously doesn't work.
Wouldn't it have been easier to have a separate operator for declaring a pointer? (or for dereferencing). For example:
void test(int@ x)
{
}
Solution 1:
In The Development of the C Language, Dennis Ritchie explains his reasoning thusly:
The second innovation that most clearly distinguishes C from its predecessors is this fuller type structure and especially its expression in the syntax of declarations... given an object of any type, it should be possible to describe a new object that gathers several into an array, yields it from a function, or is a pointer to it.... [This] led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation thati, *pi, and **ppi
all yield anint
type when used in an expression.Similarly,
int f(), *f(), (*f)();
declare a function returning an integer, a function returning a pointer to an integer, a pointer to a function returning an integer.int *api[10], (*pai)[10];
declare an array of pointers to integers, and a pointer to an array of integers.In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes
*fp() and (*pf)()
respectively. The style used in expressions carries through to declarations, so the names might be declared
int *fp(); int (*pf)();
In more ornate but still realistic cases, things become worse:
int *(*pfp)();
is a pointer to a function returning a pointer to an integer.There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C—Algol 68, for example—describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an `inside-out' style that many find difficult to grasp. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of prefix, but by then it was too late to change.
Solution 2:
The reason is clearer if you write it like this:
int x, *y;
That is, both x and *y are ints. Thus y is an int *.
Solution 3:
That is a language decision that predates C++, as C++ inherited it from C. I once heard that the motivation was that the declaration and the use would be equivalent, that is, given a declaration int *p;
the expression *p
is of type int
in the same way that with int i;
the expression i
is of type int
.
Solution 4:
Because the committee, and those that developed C++ in the decades before its standardisation, decided that *
should retain its original three meanings:
- A pointer type
- The dereference operator
- Multiplication
You're right to suggest that the multiple meanings of *
(and, similarly, &
) are confusing. I've been of the opinion for some years that it they are a significant barrier to understanding for language newcomers.
Why not choose another symbol for C++?
Backwards-compatibility is the root cause... best to re-use existing symbols in a new context than to break C programs by translating previously-not-operators into new meanings.
Why not choose another symbol for C?
It's impossible to know for sure, but there are several arguments that can be — and have been — made. Foremost is the idea that:
when [an] identifier appears in an expression of the same form as the declarator, it yields an object of the specified type. {K&R, p216}
This is also why C programmers tend to[citation needed] prefer aligning their asterisks to the right rather than to the left, i.e.:
int *ptr1; // roughly C-style
int* ptr2; // roughly C++-style
though both varieties are found in programs of both languages, varyingly.