Can an equality comparison of unrelated pointers evaluate to true?

Solution 1:

Can an equality comparison of unrelated pointers evaluate to true?

Yes, but ...

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

There are, by my interpretation of the C standard, three possibilities:

  • a immediately precedes b
  • b immediately precedes a
  • neither a nor b immediately precedes the other (there could be a gap, or another object, between them)

I played around with this some time ago and concluded that GCC was performing an invalid optimization on the == operator for pointers, making it yield false even when the addresses are the same, so I submitted a bug report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611

That bug was closed as a duplicate of another report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502

The GCC maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent, and that a comparison of their addresses might show them to be adjacent or not within the same run of the program. As you can see from my comments on the second Bugzilla ticket, I strongly disagree. In my opinion, without consistent behavior of the == operator, the standard's requirements for adjacent objects are meaningless, and I think we have to assume that those words are not merely decorative.

Here's a simple test program:

#include <stdio.h>
int main(void) {
    int x;
    int y;
    printf("&x = %p\n&y = %p\n", (void*)&x, (void*)&y);
    if (&y == &x + 1) {
        puts("y immediately follows x");
    }
    else if (&x == &y + 1) {
        puts("x immediately follows y");
    }
    else {
        puts("x and y are not adjacent");
    }
}

When I compile it with GCC 6.2.0, the printed addresses of x and y differ by exactly 4 bytes at all optimization levels, but I get y immediately follows x only at -O0; at -O1, -O2, and -O3 I get x and y are not adjacent. I believe this is incorrect behavior, but apparently, it's not going to be fixed.

clang 3.8.1, in my opinion, behaves correctly, showing x immediately follows y at all optimization levels. Clang previously had a problem with this; I reported it:

https://bugs.llvm.org/show_bug.cgi?id=21327

and it was corrected.

I suggest not relying on comparisons of addresses of possibly adjacent objects behaving consistently.

(Note that relational operators (<, <=, >, >=) on pointers to unrelated objects have undefined behavior, but equality operators (==, !=) are generally required to behave consistently.)
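
To make the distinction in that note concrete, here is a minimal sketch. The helper names same_address and precedes_in_array are hypothetical, not from any library. The first helper uses ==, which is defined for any two valid pointers; the second uses <, and stays defined only because both operands are derived from the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Equality is defined for any two valid int pointers, related or not. */
bool same_address(const int *p, const int *q) {
    return p == q;
}

/* Ordering is only defined within one array (or one past its end).
   Taking a base pointer plus two indices keeps both operands of <
   inside the same array object. */
bool precedes_in_array(const int *base, size_t len, size_t i, size_t j) {
    assert(i <= len && j <= len);   /* one past the end is allowed */
    return base + i < base + j;
}
```

Writing p < q for pointers into two separate variables would compile, but the standard gives that comparison no defined result.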

Solution 2:

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

is perfectly well-defined code, but probably more by luck than by judgement.

You are allowed to take the address of a scalar and form a pointer one past it, because for the purposes of pointer arithmetic a single object behaves like an array of one element. So &a + 1 is valid, but &a + 2 is not. You are also allowed to compare a pointer with any other valid pointer of the same type using == and !=, although pointer arithmetic is only valid within an array.
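
A minimal sketch of those rules follows. The helper name immediately_follows is hypothetical. It forms the one-past pointer, compares it with ==, and never dereferences it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reports whether `second` compares equal to the pointer one past `first`. */
bool immediately_follows(int *first, int *second) {
    assert(first != NULL && second != NULL);
    int *one_past = first + 1;   /* valid to form and to compare */
    /* Dereferencing one_past, or even computing first + 2 for a lone
       int, would be undefined behavior. */
    return one_past == second;
}
```

For two elements of the same array the result is fully determined. For two separate variables it depends on where the implementation happens to place them.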

Your assertion that the addresses of a and b tell you anything about how these are placed in memory is bunk. To be clear, you cannot "reach" b by pointer arithmetic on the address of a.

As for

struct s {
    int a;
    int b;
};

The standard guarantees that the address of the struct is the same as the address of a, but an arbitrary amount of padding is allowed to be inserted between a and b. Again, you can't reach the address of b by any pointer arithmetic on the address of a.
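
As a rough illustration, the layout can be inspected rather than assumed. The sketch below uses the standard offsetof macro from <stddef.h>; the two helper names are hypothetical. It only measures the layout and does not walk from a to b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct s {
    int a;
    int b;
};

/* Bytes of padding between the end of a and the start of b.
   The value is implementation-specific; it is often 0 but need not be. */
size_t padding_between_members(void) {
    return offsetof(struct s, b) - (offsetof(struct s, a) + sizeof(int));
}

/* The struct and its first member share an address (after conversion). */
bool struct_starts_at_first_member(const struct s *p) {
    assert(p != NULL);
    return (const void *)p == (const void *)&p->a;
}
```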

Solution 3:

Can an equality comparison of unrelated pointers evaluate to true?

Yes. C specifies when this is true.

Two pointers compare equal if and only if ... or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. C11dr §6.5.9 6

To be clear: adjacent variables in code do not need to be adjacent in memory, yet can be.


The code below demonstrates that it is possible. It uses a memory dump of an int* in addition to the conventional "%p" and (void*).

Yet OP's code and output do not reflect this. Given the "compare equal if and only if" part of the above spec, IMO, OP's compilation is non-compliant. For variables p and q of the same type that are adjacent in memory, either &p+1 == &q or &p == &q+1 must be true.

No opinion if the objects differ in type; OP does not ask about that in any case.


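#include <stdio.h>
#include <stdlib.h>

/* Print a pointer with %p, followed by the raw bytes of its representation. */
void print_int_ptr(const char *prefix, int *p) {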
  printf("%s %p", prefix, (void *) p);
  union {
    int *ip;
    unsigned char uc[sizeof (int*)];
  } u = {p};
  for (size_t i=0; i< sizeof u; i++) {
    printf(" %02X", u.uc[i]);
  }
  printf("\n");
}

int main(void) {
  int b = rand();
  int a = rand();
  printf("sizeof(int) = %zu\n", sizeof a);
  print_int_ptr("&a     =", &a);
  print_int_ptr("&a + 1 =", &a + 1);
  print_int_ptr("&b     =", &b);
  print_int_ptr("&b + 1 =", &b + 1);
  printf("&a + 1 == &b: %d\n", &a + 1 == &b);
  printf("&a == &b + 1: %d\n", &a == &b + 1);
  return a + b;
}

Output

sizeof(int) = 4
&a     = 0x28cc28 28 CC 28 00
&a + 1 = 0x28cc2c 2C CC 28 00  <-- same bit pattern
&b     = 0x28cc2c 2C CC 28 00  <-- same bit pattern
&b + 1 = 0x28cc30 30 CC 28 00
&a + 1 == &b: 1                <-- compare equal
&a == &b + 1: 0

Solution 4:

The authors of the Standard weren't trying to make it "language-lawyer-proof", and as a consequence, it is somewhat ambiguous. Such ambiguity will not generally be a problem when compiler writers make a bona fide effort to uphold the Principle of Least Astonishment, since there is a clear non-astonishing behavior, and any other behavior would have astonishing consequences. On the other hand, it does mean those compiler writers who are more interested in whether optimizations can be justified under any reading of the Standard than in whether they will be compatible with existing code can find interesting opportunities to justify incompatibility.

The Standard doesn't require that pointers' representations bear any relationship to the underlying physical architecture. It would be perfectly legitimate for a system to represent each pointer as a combination of a handle and an offset. A system which represented pointers in such fashion would be free to move the objects represented thereby around in physical storage as it saw fit. On such a system, the first byte of object #57 might follow immediately after the last byte of object #23 at one moment in time, but might be at some completely unrelated location at some other moment. I see nothing in the Standard that would prohibit such an implementation from reporting a "just past" pointer for object #23 as equal to a pointer to object #57 when the two objects happened to be adjacent, and as unequal when they happened not to be.
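
To make the handle-plus-offset idea concrete, here is a toy model. Every name in it (fat_ptr, placement, physical, quirky_equal) is invented for illustration and does not describe any real implementation. Equality is computed from wherever the objects currently sit, so the same pair of pointers can compare equal at one moment and unequal after an object is moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer representation: which object, and where in it. */
struct fat_ptr {
    unsigned handle;   /* index into the placement table */
    size_t offset;     /* byte offset within that object */
};

/* Hypothetical table entry: where an object currently lives. */
struct placement {
    size_t base;       /* current physical start address */
    size_t size;       /* object size in bytes */
};

/* Current physical address designated by p. */
size_t physical(const struct placement *table, struct fat_ptr p) {
    assert(p.offset <= table[p.handle].size);   /* inside or one past */
    return table[p.handle].base + p.offset;
}

/* Equality that depends on the objects' placement at this moment. */
bool quirky_equal(const struct placement *table,
                  struct fat_ptr p, struct fat_ptr q) {
    return physical(table, p) == physical(table, q);
}
```

With object 0 at base 100 and object 1 at base 104, both 4 bytes long, the pointer {0, 4} (just past object 0) equals {1, 0}. After object 1 is moved to base 200, the same comparison reports unequal.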

Further, under the as-if rule, an implementation that would be justified in moving objects around in such fashion, and in having a quirky equality operator as a result, would be allowed to have a quirky equality operator whether or not it physically moved objects around in storage.

If, however, an implementation specifies how pointers are stored in RAM, and that specification would be inconsistent with the behavior described above, the implementation would be compelled to implement the equality operator in a fashion consistent with that specification. Any compiler that wants to have a quirky equality operator must refrain from specifying a pointer-storage format that would be inconsistent with such behavior.

Further, the Standard would seem to imply that if code observes that two pointers with defined values have identical representations, they must compare equal. Reading an object using a character type and then writing that same sequence of character-type values into another object should yield an object equivalent to the original; such equivalence is a fundamental feature of the language. If p is a pointer "just past" one object, and q is a pointer to another object, and their representations are copied to p2 and q2, respectively, then p2 must compare equal to p and q2 to q. If the decomposed character-type representations of p and q are equal, that would imply that q2 was written with the same sequence of character-type values as p2, which would, in turn, imply that all four pointers must be equal.
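
The argument above can be written down as a small check. This is only a sketch of the reasoning, with hypothetical helper names; the asserts encode what this answer claims must hold, not something every compiler is confirmed to honor for pointers into separate objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy a pointer by way of its character-type representation. */
int *copy_via_bytes(int *p) {
    unsigned char bytes[sizeof p];
    int *copy;
    memcpy(bytes, &p, sizeof p);         /* read as unsigned char values */
    memcpy(&copy, bytes, sizeof copy);   /* write the same values back */
    return copy;
}

/* True when the two pointer objects hold identical bytes. */
bool same_representation(int *p, int *q) {
    return memcmp(&p, &q, sizeof p) == 0;
}

/* The claim: byte-identical pointers must compare equal. */
void check_equal_bytes_imply_equal(int *p, int *q) {
    int *p2 = copy_via_bytes(p);
    int *q2 = copy_via_bytes(q);
    assert(p2 == p && q2 == q);          /* copies are equivalent */
    if (same_representation(p, q)) {
        assert(p2 == q2);                /* built from the same bytes */
    }
}
```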

Consequently, while it would be allowable for a compiler to have quirky equality semantics for pointers which are never exposed to code that might observe their byte-level representation, such behavioral license would not extend to pointers which are thus exposed. If an implementation defines a directive or setting that invites compilers to have individual comparisons arbitrarily report equal or unequal when given pointers to the end of one object and the start of another whose placement would only be observable via such comparison, the implementation wouldn't have to worry about conformance in cases where pointer representations are observed. Otherwise, though, even if there are cases where conforming implementations would be allowed to have quirky comparison semantics, that doesn't mean any quality implementation should do so unless invited, or unless a pointer just past the end of one object would naturally have a different representation from a pointer to the start of the next.