Strict aliasing rule and 'char *' pointers

The accepted answer to "What is the strict aliasing rule?" mentions that you can use char * to alias another type, but not the other way around.

It doesn't make sense to me — if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?


Solution 1:

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

It does, but that's not the point.

The point is that if you have one or more struct somethings then you may use a char* to read their constituent bytes, but if you have one or more chars then you may not use a struct something* to read them.
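To make that concrete, here is a minimal sketch. The layout of struct something and the helper name byte_of are made up for illustration, and I use unsigned char (the rule covers all character types) to keep the byte values unsigned.

```c
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct something {
    unsigned char tag;
    int value;
};

/* Allowed: any object may be inspected byte by byte through a
 * character-type lvalue. Returns byte i of the object representation. */
static unsigned char byte_of(const struct something *s, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)s;
    return bytes[i];   /* caller must keep i < sizeof *s */
}

/* Not allowed (shown as a comment only, since it is undefined behavior):
 *
 *     char buf[sizeof(struct something)];
 *     struct something *p = (struct something *)buf;
 *     int v = p->value;    // reads char objects through a struct lvalue
 */
```

Because tag is the first member, byte_of(&s, 0) gives back s.tag. The values of any padding bytes are unspecified, so don't rely on them.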

Solution 2:

The wording in the referenced answer is slightly erroneous, so let's get that ironed out first: one object never aliases another object, but two pointers can "alias" the same object (meaning the pointers point to the same memory location; as M.M. pointed out, this is still not 100% correct wording, but you get the idea). Also, the standard itself doesn't (to the best of my knowledge) actually use the term "strict aliasing" at all; it only gives rules about which kinds of expressions may be used to access an object. Compiler flags like '-fno-strict-aliasing' tell the compiler whether it can assume the programmer followed those rules (so it can perform optimizations based on that assumption) or not.

Now to your question: any object can be accessed through a pointer to char, but a char object (especially a char array) may not be accessed through most other pointer types. Based on that, the compiler can/must make the following assumptions:

  1. If the type of the actual object is not known, a char* and a T* could always point to the same object (alias each other) -> symmetric relationship.
  2. If T1 and T2 are not "related" and neither is char, then T1* and T2* may never point to the same object -> symmetric relationship.
  3. A char* may point to a char OR a T object.
  4. A T* may NOT point to a char object -> asymmetric relationship.
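A rough sketch of what assumptions 1 and 2 mean for generated code. The function names are made up, and whether a given compiler actually performs the optimization described in the comments depends on the compiler and its flags.

```c
/* int and float are unrelated types, so the compiler may assume that
 * writing *f cannot change *i. It is allowed to return the constant 1
 * without reloading *i. Calling this with both pointers aimed at the
 * same object would be undefined behavior. */
static int store_int_then_float(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* A character pointer may point into any object, including *i, so the
 * compiler has to reload *i after the store through c. */
static int store_int_then_byte(int *i, unsigned char *c)
{
    *i = 1;
    *c = 2;
    return *i;
}
```

Calling store_int_then_byte(&j, (unsigned char *)&j) is fine and returns whatever int results from overwriting the first byte of j (the exact value depends on endianness), whereas the int/float version is only valid for two distinct objects.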

I believe the main rationale behind the asymmetric rules about accessing objects through pointers is that a char array might not satisfy the alignment requirements of, e.g., an int.

So, even without compiler optimizations based on the strict aliasing rule, writing an int to the location of a 4-byte char array at addresses 0x1, 0x2, 0x3, and 0x4 will, in the best case, result in poor performance and, in the worst case, access a different memory location, because the CPU instructions might ignore the lowest two address bits when writing a 4-byte value (so here this might result in a write to 0x0, 0x1, 0x2, and 0x3).
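The address arithmetic from that example, as a small sketch. Both helpers are hypothetical and only model a CPU that drops the lowest two address bits on a 4-byte access; real hardware may instead trap or split the access.

```c
#include <stdint.h>

/* Address a CPU would actually use for a 4-byte access if it simply
 * ignored the lowest two address bits (an assumed hardware model). */
static uintptr_t effective_word_address(uintptr_t addr)
{
    return addr & ~(uintptr_t)0x3;
}

/* Nonzero if addr is a multiple of 4, i.e. suitably aligned for a
 * type with a 4-byte alignment requirement. */
static int is_word_aligned(uintptr_t addr)
{
    return (addr & (uintptr_t)0x3) == 0;
}
```

With these, effective_word_address(0x1) is 0x0, which is why the write meant for 0x1 through 0x4 would land on 0x0 through 0x3 on such a machine.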

Please also be aware that the meaning of "related" differs between C and C++, but that is not relevant for your question.

Solution 3:

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

Pointers don't alias each other; that's sloppy use of language. Aliasing is when an lvalue is used to access an object of a different type. (Dereferencing a pointer gives an lvalue.)

In your example, what's important is the type of the object being aliased. For a concrete example, let's say that the object is a double. Accessing the double by dereferencing a char * pointing at the double is fine because the strict aliasing rule permits this. However, accessing a double by dereferencing a struct something * is not permitted (unless, arguably, the struct starts with a double!).
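As an illustration of the permitted direction, here is a sketch that reads and writes double objects purely through character-type lvalues. copy_bytes is a hand-written stand-in for memcpy and duplicate_double is a made-up helper; both exist only for this example.

```c
#include <stddef.h>

/* Copy n bytes from src to dst using only character-type accesses. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];
}

/* Both the source and the destination are double objects that are
 * touched only through unsigned char lvalues, which the rule allows. */
static double duplicate_double(double value)
{
    double copy = 0.0;
    copy_bytes(&copy, &value, sizeof copy);
    return copy;
}
```

duplicate_double(1.5) returns 1.5. Doing the same through a cast to an unrelated struct pointer would not be covered by the rule.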

If the compiler is looking at a function that takes a char * and a struct something *, and it does not have information about the object being pointed to (this is actually unlikely, as aliasing passes are done at a whole-program optimization stage), then it would have to allow for the possibility that the object might actually be a struct something, so no optimization could be done inside this function.