Slurm jobs are pending, but resources are available
By default, SLURM does not allow resource sharing: once a job is running on a node, other jobs wait for it to finish before they are scheduled on that node, even if resources on it are still free.
SLURM needs to be configured for resource sharing; this is fairly simple and well documented.
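To confirm that this is what is happening, you can list the pending jobs together with the reason SLURM gives for holding them (the last column of the default squeue output); jobs waiting for busy nodes typically show a reason such as (Resources) or (Priority):
squeue -t PENDING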
An example of what to add to your slurm.conf file (normally located under /etc/slurm) would be:
SelectType=select/cons_res
SelectTypeParameters=
DefMemPerCPU=
This allows the resources of a node to be shared using the cons_res plugin.
The select/cons_res plugin accepts a variety of parameters (SelectTypeParameters). The most prominent ones are listed below (for the full list, refer to the slurm.conf manual page):
CR_CPU: CPUs are the consumable resource.
CR_CPU_Memory: adds memory as a consumable resource on top of CR_CPU.
CR_Core: Cores are the consumable resource.
CR_Core_Memory: adds memory as a consumable resource on top of CR_Core.
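As a concrete sketch, a filled-in version of the lines above could look like the following (the values are only illustrative; DefMemPerCPU is in MB and should match what your nodes can actually provide):
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=2048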
Once that is done and you have chosen which resource SLURM should treat as consumable, all that remains is to add the option Shared=YES to your default partition (queue) and run scontrol reconfigure on the node acting as the controller.
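For illustration, the partition line in slurm.conf might look like the following (the partition and node names are placeholders; note that newer SLURM releases call this option OverSubscribe=YES instead of Shared=YES), followed by the command that makes the controller re-read its configuration:
PartitionName=debug Nodes=node[01-04] Default=YES Shared=YES State=UP
scontrol reconfigure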