allocating shared memory

CUDA supports dynamic shared memory allocation. If you define the kernel like this:

__global__ void Kernel(const int count)
{
    extern __shared__ int a[];
}

and then pass the number of bytes required as the third argument of the kernel launch

Kernel<<< gridDim, blockDim, a_size >>>(count);

then the array can be sized at run time. Be aware that the runtime only supports a single dynamically declared allocation per kernel. If you need more than one array, you will need to use pointers to offsets within that single allocation. Also be aware when computing those offsets that shared memory uses 32-bit words, and all allocations must be 32-bit word aligned, irrespective of the type of the shared memory allocation.


const doesn't mean "constant"; it means "read-only".

A constant expression is something whose value is known to the compiler at compile time.


option one: declare shared memory with a size that is a constant expression (not the same as const):

__global__ void Kernel(int count_a, int count_b)
{
    __shared__ int a[100];
    __shared__ int b[4];
}

option two: declare shared memory dynamically in the kernel launch configuration:

__global__ void Kernel(int count_a, int count_b)
{
    extern __shared__ int shared[];   // the single dynamic allocation
    int *a = &shared[0];              // a is manually set at the beginning of shared
    int *b = &shared[count_a];        // b is manually set at the end of a
}

size_t sharedMemory = count_a*sizeof(int) + count_b*sizeof(int);
Kernel <<<numBlocks, threadsPerBlock, sharedMemory>>> (count_a, count_b);

note: all pointers to dynamically allocated shared memory resolve to the same base address, so there is effectively one allocation to partition. I use two shared memory arrays to illustrate how to manually lay out two arrays within it.