Can CUDA Kernels Modify Host Memory?

Is there a way to have a kernel modify a host integer by passing the kernel a pointer to that integer? It seems the pointer ends up pointing to an address in device memory, so the kernel's write does not affect the host variable.

Here's a simplified example with the behavior I've noticed.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <iostream>

__global__
void change_cuda(int* c);

void change_var(int* c);

int main() {
    using namespace std; 

    int c = 0;
    int* ptc = &c;

    change_var(ptc); // *ptc = 123

    cout << c << endl;

    cudaError_t errors;

    cudaMallocManaged((void**)&ptc, sizeof(int));

    change_cuda<<<1, 1>>> (ptc); // *ptc = 555

    errors = cudaDeviceSynchronize();

    cudaFree(ptc);

    cout << cudaGetErrorString(errors) << endl;
    cout << c << endl;

    return 0;
}

__global__
void change_cuda(int* c) {
    *c = 555;
}

void change_var(int* c) {
    *c = 123;
}

Ideally, c would be 555 at the end, but the output of this example is:

123
no error
123

Clearly I am misunderstanding how this works. What is the correct way to get the behavior that I expect?


Yes, you have a misunderstanding. cudaMallocManaged is an allocator, like malloc or new. It stores a pointer to a new allocation of the requested size in ptc, overwriting the address of c that ptc held before the call.

It is not a mechanism for making your host stack-based variable accessible from device code.

However, the allocation that cudaMallocManaged returns can be accessed from both device code and host code. (The pointer simply no longer points to your c variable.)
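To make the malloc analogy concrete, here is a minimal host-only sketch (plain C++, which also compiles in a .cu file). It assumes nothing about CUDA; the function name value_of_c_after_reassignment is made up for illustration. It follows the same sequence as the question, with malloc standing in for cudaMallocManaged:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: mirrors the question's sequence on the host only,
// with malloc standing in for cudaMallocManaged.
int value_of_c_after_reassignment() {
    int c = 0;
    int* ptc = &c;

    *ptc = 123;  // ptc still points at c, so this writes to c

    // The allocator hands back a brand-new block; ptc no longer refers to c.
    ptc = static_cast<int*>(std::malloc(sizeof(int)));
    if (ptc == nullptr) {
        return c;  // allocation failed; c is untouched either way
    }
    assert(ptc != &c);

    *ptc = 555;      // writes to the new block, not to c
    std::free(ptc);

    return c;        // c still holds 123
}
```

The write of 555 lands in the new allocation, which is the same reason the kernel's write in the question never shows up in c.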

You can minimally fix your code by making the following changes:

1. Comment out the call to cudaFree (or move it after the final print), because *ptc must not be read after the allocation has been freed.
2. Print the value of *ptc rather than c.

Perhaps a more sensible change might be like this:

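// Replaces main() from the question; the includes and the definitions of
// change_var and change_cuda stay as they are.
int main() {
    using namespace std;

    int* ptc = nullptr;

    // Allocate managed memory that both host and device code can access.
    cudaError_t errors = cudaMallocManaged((void**)&ptc, sizeof(int));
    if (errors != cudaSuccess) {
        cout << cudaGetErrorString(errors) << endl;
        return 1;
    }

    change_var(ptc); // host write: *ptc = 123

    cout << *ptc << endl;

    change_cuda<<<1, 1>>> (ptc); // device write: *ptc = 555

    // Wait for the kernel to finish before the host reads *ptc again.
    cudaDeviceSynchronize();

    // Report the last error recorded by the runtime, if any.
    errors = cudaGetLastError();

    cout << cudaGetErrorString(errors) << endl;
    cout << *ptc << endl;

    cudaFree(ptc); // free only after the last use of *ptc

    return 0;
}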
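
If the goal is specifically to end up with 555 in the original host variable c, one alternative to the managed-memory approach above is an explicit device allocation plus cudaMemcpy. This is only a minimal sketch under that assumption, not the only way to do it; the name d_c is made up for illustration. The kernel writes to the device copy, and the host then copies the result back into c:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void change_cuda(int* c) {
    *c = 555;
}

int main() {
    int c = 123;         // ordinary host stack variable
    int* d_c = nullptr;  // hypothetical name for the device-side copy of c

    cudaError_t err = cudaMalloc((void**)&d_c, sizeof(int));
    if (err != cudaSuccess) {
        std::printf("cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the current value of c to the device. This is optional here,
    // since the kernel overwrites the value unconditionally.
    err = cudaMemcpy(d_c, &c, sizeof(int), cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        change_cuda<<<1, 1>>>(d_c);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) {
        // Blocking copy back into c; it runs after the kernel has finished.
        err = cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_c);

    std::printf("%s\n", cudaGetErrorString(err));
    std::printf("%d\n", c);  // c itself was updated by the copy-back

    return err == cudaSuccess ? 0 : 1;
}
```

The kernel still never touches c directly. The device-to-host cudaMemcpy is what moves the result into the host variable.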