How to determine number of GPU cores being utilized for a process?
On GPUs, 100% utilization means that all cores are busy executing instructions. The GPU runs at peak throughput when those instructions are fused multiply-add (FMA) operations, a = a + b * c, which most current GPUs can execute as a single instruction.
When you write a program that performs computation on the GPU (using CUDA or OpenCL), you distribute the work over a so-called grid of thread blocks (CUDA terminology). It is up to the GPU to schedule all these threads (in 'warps' of 32 threads) so as to keep all GPU cores busy. I don't know how familiar you are with this subject, but this introduction might be an interesting read.
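To make the grid/block/thread hierarchy concrete, here is a minimal CUDA sketch (the `scale` kernel, the array size, and the block size are all illustrative choices, not something from your application):

```cuda
#include <cstdio>

// Trivial kernel: each thread scales one array element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;  // hypothetical workload: 1M elements (left uninitialized; we only care about scheduling here)
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // grid size

    // Launch a grid of 'blocks' thread blocks, each with 256 threads.
    // The GPU distributes these blocks across its SMs and executes
    // each block in warps of 32 threads.
    scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```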
An example shows how this relates to utilization. Let's say the GPU is idle and you start an application that launches a kernel with only a single thread block, but with sufficient computation inside that block. The GPU will schedule the thread block onto one of the streaming multiprocessors (SMs, a group of 128 cores on this architecture) on the GPU. On, for example, the Nvidia GTX 1080, which has 20 SMs, that would result in a utilization of only 1/20 × 100% = 5%.
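You can reproduce that arithmetic for whatever GPU you actually have by querying the SM count at runtime. A small sketch (the single-block assumption is the scenario above, not a measured value):

```cuda
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    int activeBlocks = 1;  // the single-thread-block scenario from the example
    double utilization = 100.0 * activeBlocks / prop.multiProcessorCount;

    // On a GTX 1080 (20 SMs) this prints roughly 5%.
    printf("%s: %d SMs -> ~%.1f%% utilization with %d active block(s)\n",
           prop.name, prop.multiProcessorCount, utilization, activeBlocks);
    return 0;
}
```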
Given this basic knowledge of GPU computing, you can run the application through the Nvidia Visual Profiler (for CUDA applications) or through CodeXL (for OpenCL applications) to see the thread configuration of every kernel the application uses, and reason about GPU utilization from there. But that's not all: these tools are invaluable for seeing exactly what kind of operations the GPU is executing, and how efficiently.
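For CUDA code you control, the runtime can also tell you directly how many blocks of a given kernel fit on one SM, which lets you size your launch to occupy the whole GPU. A sketch using the occupancy API (reusing the hypothetical `scale` kernel from above; the block size of 256 is again just an example):

```cuda
#include <cstdio>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int threadsPerBlock = 256;
    int maxBlocksPerSM = 0;
    // Ask the runtime how many blocks of this kernel can be resident
    // on a single SM at once, given its register/shared-memory usage.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxBlocksPerSM, scale, threadsPerBlock, 0 /* dynamic shared mem */);

    printf("Up to %d blocks of %d threads per SM; %d SMs total -> "
           "launch at least %d blocks to occupy the whole GPU\n",
           maxBlocksPerSM, threadsPerBlock, prop.multiProcessorCount,
           maxBlocksPerSM * prop.multiProcessorCount);
    return 0;
}
```

This complements the profiler view: the occupancy API tells you what your launch configuration allows in theory, while the profiler shows what the GPU actually achieved.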