Can CUDA Kernels Modify Host Memory?
Is there any way to get a kernel to modify an integer via passing a pointer to that integer to the kernel? It seems the pointer is pointing to an address in device memory, so the kernel does not affect the host.
Here's a simplified example with the behavior I've noticed.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
__global__
void change_cuda(int* c);
void change_var(int* c);
int main() {
    using namespace std;
    int c = 0;
    int* ptc = &c;
    change_var(ptc); // *ptc = 123
    cout << c << endl;
    cudaError_t errors;
    cudaMallocManaged((void**)&ptc, sizeof(int));
    change_cuda<<<1, 1>>> (ptc); // *ptc = 555
    errors = cudaDeviceSynchronize();
    cudaFree(ptc);
    cout << cudaGetErrorString(errors) << endl;
    cout << c << endl;
    return 0;
}
__global__
void change_cuda(int* c) {
    *c = 555;
}
void change_var(int* c) {
    *c = 123;
}
Ideally, this would modify c to be 555 at the end, but the output of this example is:
123
no error
123
Clearly I am misunderstanding how this works. What is the correct way to get the behavior that I expect?
Yes, you have a misunderstanding. cudaMallocManaged is an allocator like, for example, malloc or new. It returns a pointer that points to a new allocation of the size requested.
It is not some method to allow your host stack based variable to be accessed from device code.
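However, the allocated area pointed to by the pointer returned by cudaMallocManaged can be accessed from either device code or host code. (It will not point to your c variable: in your code the call overwrites ptc, so after it ptc no longer holds the address of c, and the kernel writes 555 into the new allocation instead.)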
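If the goal is for the host variable c itself to end up holding 555, one common option is to copy explicitly between host and device memory. The following is only a minimal sketch of that idea, not part of the original answer; the helper name change_via_device_copy is hypothetical, and it assumes a working CUDA device is present.

```cuda
#include <cuda_runtime.h>

__global__ void change_cuda(int* c) {
    *c = 555;
}

// Hypothetical helper (not from the original post): runs change_cuda on a
// device copy of *host_value and copies the result back into *host_value.
cudaError_t change_via_device_copy(int* host_value) {
    int* d_value = nullptr;

    // Allocate device memory; this is a separate allocation, not host_value.
    cudaError_t err = cudaMalloc((void**)&d_value, sizeof(int));
    if (err != cudaSuccess) return err;

    // Copy the current host value to the device.
    err = cudaMemcpy(d_value, host_value, sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        change_cuda<<<1, 1>>>(d_value);
        err = cudaGetLastError();  // reports kernel launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy on the default stream waits for the kernel
        // to finish, then writes the device result into the host variable.
        err = cudaMemcpy(host_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_value);
    return err;
}
```

Called from main as `int c = 0; change_via_device_copy(&c);`, this should leave c equal to 555 when every CUDA call succeeds. The kernel still never touches c directly; the host variable changes only because of the final device-to-host copy.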
You can minimally fix your code by making the following changes:
1. Comment out the call to cudaFree.
2. Print out the value of *ptc rather than c.
Perhaps a more sensible change might be like this:
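int main() {
    using namespace std;
    int* ptc;
    // Managed allocation: usable from both host and device code.
    cudaMallocManaged((void**)&ptc, sizeof(int));
    change_var(ptc); // *ptc = 123
    cout << *ptc << endl;
    cudaError_t errors;
    change_cuda<<<1, 1>>> (ptc); // *ptc = 555
    // Wait for the kernel to finish before reading *ptc on the host.
    cudaDeviceSynchronize();
    errors = cudaGetLastError();
    cout << cudaGetErrorString(errors) << endl;
    cout << *ptc << endl;
    cudaFree(ptc); // free only after the last use of *ptc
    return 0;
}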
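As for the question in the title: a kernel can write to host memory directly when that memory is pinned and mapped into the device address space. The sketch below illustrates this with cudaHostAlloc and cudaHostGetDevicePointer. It is not part of the original answer; the helper name change_via_mapped_host_memory is hypothetical, and it assumes a device that supports mapped host memory.

```cuda
#include <cuda_runtime.h>

__global__ void change_cuda(int* c) {
    *c = 555;
}

// Hypothetical helper (not from the original post): lets the kernel write
// into pinned, mapped host memory and returns the value through *result.
cudaError_t change_via_mapped_host_memory(int* result) {
    int* h_value = nullptr;

    // Allocate page-locked host memory that is mapped for device access.
    cudaError_t err = cudaHostAlloc((void**)&h_value, sizeof(int),
                                    cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    *h_value = 123;

    // Obtain the device-side pointer that aliases the same host memory.
    int* d_alias = nullptr;
    err = cudaHostGetDevicePointer((void**)&d_alias, h_value, 0);
    if (err == cudaSuccess) {
        change_cuda<<<1, 1>>>(d_alias);
        err = cudaGetLastError();  // reports kernel launch errors
    }
    // Wait for the kernel so its write is visible to the host.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) *result = *h_value;

    cudaFreeHost(h_value);
    return err;
}
```

Even here the memory comes from a CUDA allocator rather than from an ordinary stack variable, which is consistent with the point above: the runtime has to know about the memory before device code can access it.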