How to set up SGE for CUDA devices?
Solution 1:
The strategy is actually fairly simple. Using qconf -mc you can create a complex resource called gpu (or whatever you wish to name it). The resource definition should look something like:
#name   shortcut   type   relop   requestable   consumable   default   urgency
#------------------------------------------------------------------------------
gpu     gpu        INT    <=      YES           YES           0         0
Then you should edit your exec host definitions with qconf -me to set the number of GPUs on exec hosts that have them:
hostname              node001
load_scaling          NONE
complex_values        gpu=2
user_lists            NONE
xuser_lists           NONE
projects              NONE
xprojects             NONE
usage_scaling         NONE
report_variables      NONE
Now that you've set up your exec hosts, you can request GPU resources when submitting jobs, e.g. qsub -l gpu=1, and Grid Engine will keep track of how many GPUs are available.
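For example (a small sketch; myjob.sh is a placeholder for your own job script), you can submit against the consumable and then watch the remaining count per host:
% qsub -l gpu=1 myjob.sh        # request one GPU for this job
% qhost -F gpu                  # show the current gpu consumable value on each host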
If you have more than one job running per node that uses a GPU, you may want to place your GPUs into exclusive mode. You can do this with the nvidia-smi utility.
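For example, a sketch of doing that on an exec host (requires root; device index 0 is just an example):
% nvidia-smi -c EXCLUSIVE_PROCESS          # all GPUs on this host: one process per device
% nvidia-smi -i 0 -c EXCLUSIVE_PROCESS     # or restrict the change to a single device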
Solution 2:
Open Grid Engine added GPU load sensor support in the 2011.11 release without the need for nvidia-smi. The output of the nvidia-smi application may (and does) change between driver releases, so any approach based on parsing its output is not recommended.
If you have the GE2011.11 source tree, look for: dist/gpu/gpu_sensor.c
To compile the load sensor (you need to have the CUDA toolkit on the system):
% cc gpu_sensor.c -lnvidia-ml
And if you just want to see the status reported by the load sensor interactively, compile with:
-DSTANDALONE
To use the load sensor in a Grid Engine cluster, you will just need to follow the standard load sensor setup procedure:
http://gridscheduler.sourceforge.net/howto/loadsensor.html
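In outline (a sketch, assuming the compiled binary is installed as /opt/sge/bin/gpu_sensor on the exec host; the path and host name are placeholders):
% cc gpu_sensor.c -lnvidia-ml -o gpu_sensor    # build the sensor
% cp gpu_sensor /opt/sge/bin/gpu_sensor        # install it on the exec host
% qconf -mconf node001                         # then set in the host configuration:
#   load_sensor   /opt/sge/bin/gpu_sensor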
Sources:
- http://marc.info/?l=npaci-rocks-discussion&m=132872224919575&w=2
Solution 3:
When you have multiple GPUs and you want your jobs to request a GPU, but the Grid Engine scheduler should handle and select the free GPUs, you can configure an RSMAP (resource map) complex (instead of an INT). This allows you to specify the amount as well as the names of the GPUs on a specific host in the host configuration. You can also set it up as a HOST consumable, so that independent of the slots you request, the amount of GPU devices requested with -l gpu=2 is 2 for each host (even if the parallel job got e.g. 8 slots on different hosts).
qconf -mc
#name   shortcut   type    relop   requestable   consumable   default   urgency
#------------------------------------------------------------------------------
gpu     gpu        RSMAP   <=      YES           HOST         0         0
In the execution host configuration you can initialize your resources with IDs/names (here simply GPU1 and GPU2).
qconf -me yourhost
hostname          yourhost
load_scaling      NONE
complex_values    gpu=2(GPU1 GPU2)
Then, when requesting -l gpu=1, the Univa Grid Engine scheduler will select GPU2 if GPU1 is already used by a different job. You can see the actual selection in the qstat -j output. The job gets the selected GPU by reading the $SGE_HGR_gpu environment variable, which in this case contains the chosen id/name "GPU2". This can be used to access the right GPU without collisions.
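As an illustration (not part of the original setup), a job script could translate the granted name into a CUDA device index; the GPU1/GPU2 naming and the resulting index mapping are assumptions that depend on how you named the resources in complex_values:
#!/bin/bash
#$ -l gpu=1
# $SGE_HGR_gpu holds the granted resource name, e.g. "GPU2".
# Strip the assumed "GPU" prefix and convert to a zero-based device index.
idx=$(( ${SGE_HGR_gpu#GPU} - 1 ))
export CUDA_VISIBLE_DEVICES=$idx
./my_cuda_app        # placeholder for your CUDA binary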
If you have a multi-socket host, you can even attach a GPU directly to some CPU cores near the GPU (near the PCIe bus) in order to speed up communication between GPU and CPUs. This is possible by attaching a topology mask in the execution host configuration.
qconf -me yourhost
hostname          yourhost
load_scaling      NONE
complex_values    gpu=2(GPU1:SCCCCScccc GPU2:SccccSCCCC)
Now when the UGE scheduler selects GPU2, it automatically binds the job to all 4 cores (C) of the second socket (S), so that the job is not allowed to run on the first socket. This does not even require the -binding qsub parameter.
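To verify this behaviour (a sketch, assuming a Linux exec host), you can compare the scheduler's view with the affinity the job actually received:
% qstat -j <jobid>       # shows the granted GPU and the core binding
% taskset -cp $$         # inside the job script: cores this shell may run on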
You can find more configuration examples on www.gridengine.eu.
Note that all these features are only available in Univa Grid Engine (8.1.0/8.1.3 and higher), and not in SGE 6.2u5 and other Grid Engine versions (like OGE, Son of Grid Engine, etc.). You can try it out by downloading the 48-core limited free version from univa.com.