How does this piece of code determine array size without using sizeof( )?
Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>

int main() {
    int a[] = {100, 200, 300, 400, 500};
    int size = 0;

    size = *(&a + 1) - a;
    printf("%d\n", size);

    return 0;
}
As expected, it prints 5.
edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
int [5]  int (*)[5]      int  int *

+---+                    +---+
|   | <- &a              |   | <- a
| - |                    +---+
|   |                    |   | <- a + 1
| - |                    +---+
|   |                    |   |
| - |                    +---+
|   |                    |   |
| - |                    +---+
|   |                    |   |
+---+                    +---+
|   | <- &a + 1          |   | <- *(&a + 1)
| - |                    +---+
|   |                    |   |
| - |                    +---+
|   |                    |   |
| - |                    +---+
|   |                    |   |
| - |                    +---+
|   |                    |   |
+---+                    +---+
These are two views of the same storage: on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/9
This line is of most importance:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a gives a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from the address is that number times the size of the type that it points to. For example, if you subtract 5 from xyz, the resulting address is what you would get from 160 - (sizeof(*xyz) * 5) if you did the arithmetic on the raw address yourself, i.e. 140 when sizeof(int) is 4.
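As a is an array of 5 int types, the resulting value will be 5. However, this will not work with a pointer, only with an array. If you try this with a pointer p, &p + 1 steps over a single pointer object rather than the whole array, so the result no longer says anything about the array's length.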
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
- &a gets a pointer to an object of type int[5]
- +1 gets the next such object, assuming there is an array of those
- * effectively converts that address into type pointer to int
- -a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (by this I mean language-lawyer legal, not whether it will work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]), when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.