Sun Grid Engine: Automatically Terminating Idle Interactive Jobs

We're considering using Sun Grid Engine on a small compute cluster. Right now, the setup is pretty crude and just involves having people ssh to an open machine to run their jobs.

We'd like to allow interactive jobs, since that should ease the transition from manually starting jobs to starting them using qsub. But there is some concern that, if we do, people might accidentally leave their interactive sessions idle and block other jobs from running on those machines. The issue isn't just theoretical: we previously tried OpenPBS and had a problem with people opening an interactive job in a screen session and essentially camping on a machine.

Is there any way to configure SGE to automatically kill idle interactive jobs? It looks like this was requested as an enhancement (Issue #:2447) way back in 2007, but it doesn't seem like the request was ever implemented.


Solution 1:

You could give SGE a reasonable default wall-clock limit (h_rt), so that sessions are terminated once they exceed a predefined run time.
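
As a rough sketch of what that could look like (the queue name all.q and the 8-hour value are just assumptions for illustration, so adjust to your site): h_rt can be set as a hard limit on the queue, or requested per session. Note that h_rt limits total wall-clock time, not idle time, so a busy session is killed at the limit as well.

```shell
# Assumed queue name (all.q) and limit (8 hours); both are placeholders.

# Set a hard wall-clock limit on every job in the queue.
qconf -mattr queue h_rt 8:00:00 all.q

# Check the value that is now in effect.
qconf -sq all.q | grep h_rt

# Alternatively, a user can request a limit for a single interactive session.
qlogin -l h_rt=2:00:00
```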

Have you tried getting the user's shell to timeout on idle? More information and examples at http://www.cyberciti.biz/faq/linux-unix-login-bash-shell-force-time-outs/.
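
For bash, the usual way to do this is the TMOUT variable. A minimal sketch, assuming the nodes source files from /etc/profile.d for login shells (the file name and the 30-minute value are made up for the example):

```shell
# /etc/profile.d/idle-timeout.sh  (hypothetical file name)
# Log out an interactive bash shell after 30 minutes with no input
# at the prompt. The value is an assumption; pick what suits your site.
TMOUT=1800
# Make it read-only so users cannot unset or change it in their session.
readonly TMOUT
export TMOUT
```

This only fires while the shell is sitting at its prompt, so it will not kill a session that has a long-running foreground program. It should still catch a shell left idle inside screen.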

Solution 2:

nayrmil has some good suggestions. Another option would be to limit which machines can run interactive jobs. We designate some nodes as "interactive" and put a queue on them that oversubscribes the node resources, so many users can log in at once. The users can camp there as long as they want, but if they want access to real resources they need to submit a proper job.
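
Roughly, the setup looks like the following. This is a sketch rather than our exact config: the queue name interactive.q, the host group @interactive, the slot count, and the batch queue name all.q are all placeholders.

```shell
# All names and numbers below are hypothetical placeholders.

# Create the interactive queue; this opens an editor on the queue config.
qconf -aq interactive.q
# In the editor, set roughly:
#   hostlist   @interactive     (host group holding the designated nodes)
#   qtype      INTERACTIVE      (accept qlogin/qrsh, no batch jobs)
#   slots      16               (more slots than cores, to oversubscribe)

# Make the regular queue batch-only so interactive sessions cannot land there.
qconf -mattr queue qtype BATCH all.q
```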