In a structure, is it legal to use one array field to access another one?

Solution 1:

Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?

No. Because accessing an array out of bound invoked undefined behaviour in C and C++.

C11 J.2 Undefined behavior

  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

C++ standard draft section 5.7 Additive operators paragraph 5 says:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Solution 2:

Apart from the answer of @rsp (Undefined behavior for an array subscript that is out of range) I can add that it is not legal to access b via a because the C language does not specify how much padding space can be between the end of area allocated for a and the start of b, so even if you can run it on a particular implementation , it is not portable.

instance of struct:
+-----------+----------------+-----------+---------------+
|  array a  |  maybe padding |  array b  | maybe padding |
+-----------+----------------+-----------+---------------+

The second padding may miss as well as the alignment of struct object is the alignment of a which is the same as the alignment of b but the C language also does not impose the second padding not to be there.

Solution 3:

a and b are two different arrays, and a is defined as containing 4 elements. Hence, a[6] accesses the array out of bounds and is therefore undefined behaviour. Note that array subscript a[6] is defined as *(a+6), so the proof of UB is actually given by section "Additive operators" in conjunction with pointers". See the following section of the C11-standard (e.g. this online draft version) describing this aspect:

6.5.6 Additive operators

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i-n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

The same argument applies to C++ (though not quoted here).

Further, though it is clearly undefined behaviour due to the fact of exceeding array bounds of a, note that the compiler might introduce padding between members a and b, such that - even if such pointer arithmetics were allowed - a+6 would not necessarily yield the same address as b+2.

Solution 4:

Is it legal? No. As others mentioned, it invokes Undefined Behavior.

Will it work? That depends on your compiler. That's the thing about undefined behavior: it's undefined.

On many C and C++ compilers, the struct will be laid out such that b will immediately follow a in memory and there will be no bounds checking. So accessing a[6] will effectively be the same as b[2] and will not cause any sort of exception.

Given

struct S {
  int a[4];
  int b[4];
} s

and assuming no extra padding, the structure is really just a way of looking at a block of memory containing 8 integers. You could cast it to (int*) and ((int*)s)[6] would point to the same memory as s.b[2].

Should you rely on this sort of behavior? Absolutely not. Undefined means that the compiler doesn't have to support this. The compiler is free to pad the structure which could render the assumption that &(s.b[2]) == &(s.a[6]) incorrect. The compiler could also add bounds checking on the array access (although enabling compiler optimizations would probably disable such a check).

I've have experienced the effects of this in the past. It's quite common to have a struct like this

struct Bob {
    char name[16];
    char whatever[64];
} bob;
strcpy(bob.name, "some name longer than 16 characters");

Now bob.whatever will be " than 16 characters". (which is why you should always use strncpy, BTW)

Solution 5:

As @MartinJames mentioned in a comment, if you need to guarantee that a and b are in contiguous memory (or at least able to be treated as such, (edit) unless your architecture/compiler uses an unusual memory block size/offset and forced alignment that would require padding to be added), you need to use a union.

union overlap {
    char all[8]; /* all the bytes in sequence */
    struct { /* (anonymous struct so its members can be accessed directly) */
        char a[4]; /* padding may be added after this if the alignment is not a sub-factor of 4 */
        char b[4];
    };
};

You can't directly access b from a (e.g. a[6], like you asked), but you can access the elements of both a and b by using all (e.g. all[6] refers to the same memory location as b[2]).

(Edit: You could replace 8 and 4 in the code above with 2*sizeof(int) and sizeof(int), respectively, to be more likely to match the architecture's alignment, especially if the code needs to be more portable, but then you have to be careful to avoid making any assumptions about how many bytes are in a, b, or all. However, this will work on what are probably the most common (1-, 2-, and 4-byte) memory alignments.)

Here is a simple example:

#include <stdio.h>

union overlap {
    char all[2*sizeof(int)]; /* all the bytes in sequence */
    struct { /* anonymous struct so its members can be accessed directly */
        char a[sizeof(int)]; /* low word */
        char b[sizeof(int)]; /* high word */
    };
};

int main()
{
    union overlap testing;
    testing.a[0] = 'a';
    testing.a[1] = 'b';
    testing.a[2] = 'c';
    testing.a[3] = '\0'; /* null terminator */
    testing.b[0] = 'e';
    testing.b[1] = 'f';
    testing.b[2] = 'g';
    testing.b[3] = '\0'; /* null terminator */
    printf("a=%s\n",testing.a); /* output: a=abc */
    printf("b=%s\n",testing.b); /* output: b=efg */
    printf("all=%s\n",testing.all); /* output: all=abc */

    testing.a[3] = 'd'; /* makes printf keep reading past the end of a */
    printf("a=%s\n",testing.a); /* output: a=abcdefg */
    printf("b=%s\n",testing.b); /* output: b=efg */
    printf("all=%s\n",testing.all); /* output: all=abcdefg */

    return 0;
}