New to CUDA, Question About Memory Allocation [duplicate]
The rules for kernel arguments are a logical consequence of C++ parameter passing rules and the fact that device and host memory are physically separate.
CUDA does not allow passing arguments by reference and you must be careful with pointers.
Specifically, you must pass parameters by value. Passing user-defined types requires that the default copy-constructor or your own copy-constructor (if present) does not contain any memory allocations (heap allocations with "new" or "malloc").
In summary, pass-by-value works well for integral, floating-point, or other primitive types, and for simple flat user-defined structs or class objects.
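To make the pass-by-value case concrete, here is a minimal sketch. The struct ScaleParams, the kernel show_params and the helper launch_by_value_demo are made-up names for illustration, not anything from the question. It assumes a device that supports printf inside kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical flat struct: plain members only, no pointers, no heap allocation.
struct ScaleParams {
    float factor;
    int   offset;
};

// All three arguments are copied by value into the kernel at launch.
__global__ void show_params(int n, float x, ScaleParams p)
{
    printf("thread %d: n=%d x=%f scaled=%f\n",
           threadIdx.x, n, x, x * p.factor + p.offset);
}

// Launches the kernel and returns the first CUDA error encountered, if any.
cudaError_t launch_by_value_demo()
{
    ScaleParams p{2.0f, 1};                 // ordinary host object
    show_params<<<1, 2>>>(4, 1.5f, p);      // no cudaMalloc or cudaMemcpy involved
    cudaError_t err = cudaGetLastError();   // catches launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for the kernel, flushes device printf
}

int main()
{
    cudaError_t err = launch_by_value_demo();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Nothing here is allocated on the device by hand: the scalars and the struct travel with the launch itself.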
You only need to use cudaMalloc() and cudaMemcpy() for blocks of data, not for single ints and the like. You can also pass structs as parameters, as long as they have no members pointing to a block of data in host memory.
So as a rule of thumb: if you are passing a pointer to a kernel, make sure it points into device memory.
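The rule of thumb can be sketched like this. The kernel scale and the helper scale_on_device are hypothetical names chosen for the example; the block of data goes through cudaMalloc() and cudaMemcpy(), while n and factor are simply passed by value.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// data must point into device memory; n and factor are passed by value.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales a host vector in place using the GPU. Returns false if a CUDA call fails.
bool scale_on_device(std::vector<float>& host, float factor)
{
    if (host.empty()) return true;          // nothing to do, avoid a zero-block launch
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);
    float* dev = nullptr;

    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, n, factor);   // dev is a device pointer
        ok = cudaGetLastError() == cudaSuccess;
    }
    // This blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}

int main()
{
    std::vector<float> v{1.0f, 2.0f, 3.0f, 4.0f};
    if (!scale_on_device(v, 2.0f)) {
        std::fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (float x : v) std::printf("%g ", x);
    std::printf("\n");
    return 0;
}
```

Passing host.data() straight to the kernel instead of dev would compile, but the kernel would then dereference a host address, which is exactly the mistake the rule of thumb guards against.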