C++ dangling reference strange behaviour

int*& f(int*& x, int* y){
 int** z = &y;
 *z = x;
 return *z;
}

Hello everyone, I've been given this code on an exam and I had some problems with it.

My understanding is this: the function takes a reference to a pointer (x) and a pointer passed by value, i.e. copy-constructed (y). In the body, a local pointer-to-pointer (z) is created and initialized with the address of y. z is then dereferenced once, so y is accessed, and the address stored in y becomes the address stored in x. Finally, *z is returned as a reference to a pointer, which I assume refers to y.

If that is correct, I can't explain why returning y, which is destroyed when the function exits (since it is a by-value parameter), does not cause any problems in the program. The exam answer was that the function returns a dangling reference, and I agree with that. But when I copy-pasted the code and "played" with the returned value, even running unrelated code after the call in order to "edit the stack" where the destroyed y should still be present and "ready to be overwritten" (if I'm still right), the program did not show any undefined behaviour.

My only explanation is that the return only copies the r-value contained in y, or maybe it returns both the l-value and the r-value (since it returns a reference to a pointer), but when the result is assigned to an external pointer at the call site, y doesn't get properly deallocated, or in some way the pointer that receives the value takes the place of the y pointer that gets deallocated.

Below you can find the code I used to test the function int*& f(int*&, int*).

My question is: is this a proper dangling reference, or is it a borderline case where such a thing could be used in a program?

#include <iostream>
using namespace std;
int a = 65;

int*& f(int*& x, int* y)
{
    cout<<"indirizzo di y: "<<&y<<endl;
    cout<<"indirizzo di x: "<<&x<<endl;

    int** z = &y;
    cout<<"indirizzo di *z prima: "<<*z<<endl;
    *z=x;
    cout<<"indirizzo di *z dopo: "<<*z<<endl;
    cout<<"y punta a: "<<y<<endl;
     cout<<"z dopo: "<<z<<endl;
    return *z;
}

int*& crashaFisso() //function that crashes every time with a "proper" dangling reference
{
    int a=10;
    int* x =&a;
    return x;
}

int main()
{
    system("CLS");
  
    int b = 20;
    int codicerandom=0;
    
    int* i = &a;
    cout<<"indirizzo di i: "<<i<<endl;
    int* u = &b;
    cout<<"indirizzo funzione: "<<&f(i,u)<<endl;
    int* aux = f(i,u);

    int* crash = crashaFisso();

    cout<<"crash: "<<*crash<<endl;
    cout<<"aux: "<<*aux<<endl;

    for(int i=0;i<100;i++)
        codicerandom +=i;
    
    for(int i=0;i<100;i++)
        codicerandom +=i;
    for(int ji=100;ji>0;ji--)
    {
        codicerandom +=ji;
        for(int x=100;x>0;x--)
        {
            codicerandom -= x*2;
        }
    }
    
    cout<<codicerandom<<endl;


    cout<<"crash: "<<*crash<<endl;
    cout<<"aux: "<<*aux<<endl;

    cout<<"crash: "<<*crash<<endl;
    cout<<"aux: "<<*aux<<endl;

    a=32;

    cout<<"aux: "<<*aux<<endl;
    return 0;
}

the exam answer was that the function was returning a dangling reference

Correct.

but (...) does not present any undefined behaviour of the program.

What makes you think so? Undefined behaviour doesn't mean "program doesn't work correctly" or "program crashes". Undefined behaviour means exactly what it says: the behaviour is not defined by the standard. In fact it may work "correctly" (whatever that means); the standard doesn't prohibit that. That's why it is so dangerous. Maybe it works correctly in your test because of the hardware, the OS, the specific compiler, or some other assumptions that happen to hold. But the problem is that it is not guaranteed to work correctly. If you change the machine, the OS, or the compiler (or even just the optimization settings), change the code slightly, or even compile it two days later, it may behave weirdly, in an (ahem) undefined way.

In general there is no way to know whether a program behaves correctly or not, if UB is present. You are trying to analyze the situation by thinking about l-values, r-values, allocations, etc. while the reality is that when UB is present the entire program is meaningless. You just waste time.

Do not write UB code. Regardless of whether it seems that it works or not.


does not present any undefined behaviour of the program.

Undefined behavior means that there is NO guarantee on the program behavior. The program always seemingly working is allowed under this condition. But it is also allowed that the next time you compile the program with a new compiler version (or even just run the same program again), it may suddenly not work anymore.


In your program both f and crashaFisso return a dangling reference to type int*. That by itself is not undefined behavior. You are allowed to return dangling references and pointers. However, such return values are useless, because they can not be used in any practical way.

In your code

&f(i,u)

is the first problem. Here you are taking the address of the dangling reference. This is most likely already undefined behavior by itself, though I am not completely sure at the moment. If it is not, passing the resulting invalid pointer value to a function that outputs it has at best implementation-defined behavior, which may be OK.

The line

 int* aux = f(i,u);

is definitely undefined behavior. You are trying to read the value of the object that the dangling reference referred to before it was destroyed, in order to initialize the new int* pointer. That is absolutely undefined behavior.
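
One way to avoid the problem in that line, sketched here under the assumption that the caller only needs the pointer value (as the initialization of aux suggests), is to return the pointer by value instead of by reference. The names f_by_value and demo_by_value are hypothetical and not part of the question's code.

```cpp
#include <cassert>

// Hypothetical variant of f: same body, but the pointer is returned by value.
// The copy is made while y is still alive, so nothing dangles.
int* f_by_value(int*& x, int* y)
{
    int** z = &y;  // z points to the local parameter y
    *z = x;        // y now holds the same address as x
    return *z;     // returns a copy of y's value, not a reference to y
}

// Hypothetical usage mirroring the relevant part of the question's main().
void demo_by_value()
{
    int a = 65;
    int b = 20;
    int* i = &a;
    int* u = &b;

    int* aux = f_by_value(i, u);
    assert(aux == &a);   // aux points to a
    assert(*aux == 65);
    assert(u == &b);     // the caller's u is untouched: y was only a copy

    a = 32;
    assert(*aux == 32);  // aux still refers to a
}
```

With this signature the expression &f(i,u) from the question no longer compiles, because the call yields a prvalue rather than an lvalue.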


Your test is not as sharp as it could be: in order to show that the reference is dangling, you should actually store the reference, not a copy of the value of the deceased object it refers to.

To understand why that would be more interesting let's dissect the function for a sec.

  • int** z = &y; makes z point to y; *z is now an alias for y.
  • *z=x; makes a copy of the address value the pointer referenced by x contains and assigns it to the entity known as y or *z. That address is entirely valid (f() is called with the address of main's a).
  • return *z; returns an lvalue reference (that is, a reference you could syntactically assign to) to the lvalue *z aka y. That lvalue is of type pointer to int and contains the valid address of main's a. The issue with the code is that what is referred to, namely y, is destroyed as soon as the function has returned so that reading the value through it in cout<<"indirizzo funzione: "<<&f(i,u) is undefined behavior, and the compiler warns about it.
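
The first two steps in the list above can be checked without ever touching a dangling reference. The following is only a sketch: inspect_alias, AliasReport and demo_alias are hypothetical names introduced for illustration, and the helper reports what it observed while y is still alive instead of returning a reference to it.

```cpp
#include <cassert>

// Hypothetical report type: what the helper saw while y was still alive.
struct AliasReport {
    bool z_aliases_y;      // does *z name the same object as y?
    bool y_holds_x_value;  // does y hold the address stored in x?
    bool y_is_a_copy;      // is y a different object than x?
};

// Hypothetical helper: same first two statements as f in the question.
AliasReport inspect_alias(int*& x, int* y)
{
    int** z = &y;  // z points to the by-value parameter y
    *z = x;        // writes the address held by x into y
    return AliasReport{ &*z == &y, y == x, &y != &x };
}

void demo_alias()
{
    int a = 65;
    int b = 20;
    int* i = &a;
    int* u = &b;

    AliasReport r = inspect_alias(i, u);
    assert(r.z_aliases_y);      // *z is an alias for y
    assert(r.y_holds_x_value);  // y received the address of a
    assert(r.y_is_a_copy);      // y is a separate object inside the function
    assert(u == &b);            // so the caller's u never changed
}
```

The last assertion is the point of the exercise: y is a copy that lives only inside the function, which is why a reference to it cannot outlive the call.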

The reason that the program doesn't crash is that immediately after f returns, the memory of its former local variables is still intact. Of course it's illegal to access it, but if you look at the memory it's all there. Consequently, int* aux = f(i,u); simply reads the (valid) address stored in the recently deceased y and stores it as a copy in aux. You can now write on the stack as much as you like: aux will contain a valid value.

That's why you were not successful in your attempts to write on the stack in order to overwrite it.

If instead you store the returned reference to *z aka y you'll refer to the deceased object itself which inevitably will be overwritten by future stack operations or used in other ways by the compiler.

Here is an anglicized, minimal example that stores the reference instead of a copy (note the definition of the variable dangling_ref). I compile and run it twice, once without optimization and once with -O3. Simply changing the compiler options changes the output (and, in what I'd assume is a bug, determines whether the warning is emitted!). Here is a sample session on msys2.

$ cat dangling-ref.cpp
#include <iostream>
using namespace std;

int*& dangling_ref_ret(int*& x, int* y)
{
    int** z = &y;
    *z = x;
    cout << "ret addr " << *z << " (should be == " << y << ")" << ", val = " << **z << endl;
    return *z;
}

int main()
{
    int b = 1;
    int* pb = &b;
    int c = 2;
    int*& dangling_ref = dangling_ref_ret(pb, &c);
    cout << "val of dangling_ref " << dangling_ref << " is " << *dangling_ref << endl;
}

$ gcc --version
gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -Wall -o dangling-ref dangling-ref.cpp && ./dangling-ref.exe
ret addr 0xffffcc34 (should be == 0xffffcc34), val = 1
val of dangling_ref 0xffffcc34 is 1
$ g++ -Wall -O3 -o dangling-ref dangling-ref.cpp && ./dangling-ref.exe
dangling-ref.cpp: In function ‘int*& dangling_ref_ret(int*&, int*)’:
dangling-ref.cpp:9:13: warning: function may return address of local variable [-Wreturn-local-addr]
    9 |     return *z;
      |             ^
dangling-ref.cpp:4:38: note: declared here
    4 | int*& dangling_ref_ret(int*& x, int* y)
      |                                 ~~~~~^
ret addr 0xffffcc10 (should be == 0xffffcc10), val = 1
val of dangling_ref 0xffffcc14 is 2

Visual Studio also behaves differently between Debug and Release mode.

You can try different compilers and options on godbolt.


Undefined behavior means anything [1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.

So the output that you're seeing is a result of undefined behavior. And as I said, don't rely on the output of a program that has UB.

So the first step to make the program correct would be to remove UB. Then and only then you can start reasoning about the output of the program.
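
As one possible illustration of removing the UB while still returning a reference, y can be taken by reference, so that the returned reference refers to an object owned by the caller. This is a sketch under the assumption that modifying the caller's pointer is acceptable; f_by_ref and demo_by_ref are hypothetical names, and this version has different semantics from the exam function.

```cpp
#include <cassert>

// Hypothetical variant of f: y is now a reference to the caller's pointer,
// so *z refers to an object that outlives the call.
int*& f_by_ref(int*& x, int*& y)
{
    int** z = &y;  // z points to the caller's pointer
    *z = x;        // overwrites the caller's pointer with the address held by x
    return *z;     // reference to the caller's pointer: not dangling
}

void demo_by_ref()
{
    int a = 65;
    int b = 20;
    int* i = &a;
    int* u = &b;

    int*& r = f_by_ref(i, u);
    assert(&r == &u);   // r refers to the caller's u
    assert(u == &a);    // u was redirected to a (a visible side effect)
    assert(*r == 65);

    a = 32;
    assert(*r == 32);   // reading through r is well defined
}
```

Only once the program is free of UB, as in this variant, do output values like these tell you anything about what the code does.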


[1] For a more technically accurate definition of undefined behavior see this, where it is mentioned that: there are no restrictions on the behavior of the program.