How do I know my GPU supports multiprocessing by default?
In that post you linked, in the UPDATE section of my answer, I indicated that the GPU scheduler has changed in Pascal and beyond (your Tesla P100 is a Pascal GPU).
MPS is supported on all current NVIDIA GPUs.
The results you got are expected (in the non-MPS case) because the GPU scheduler allows both kernels to run, in a time-sliced fashion. All currently supported CUDA GPUs support multiprocessing (in Default compute mode). However the older GPUs (e.g. Kepler) would run the kernel from one process, then the kernel from the other process. Pascal and newer GPUs will run the kernel from one process for a period of time, then the other process for a period of time, then the first process, etc in a round-robin time-sliced fashion.