[gridengine users] Limit number of jobs by job name
dowobeha at gmail.com
Mon Feb 6 21:25:41 UTC 2012

I have a large number of jobs that I need to run. Each of these jobs
kicks off a number of child jobs. The child jobs do most of the actual
work - the parent jobs mostly sit and wait until the child jobs have
finished.

Ideally, I would like to kick off all of my parent jobs, and let them
spawn off all of their respective child jobs, and wait until
everything finishes. But there's a problem with this. If I kick off
all of the parent jobs, then the parent jobs take up lots of slots in
my grid, and it takes far longer than it should for the grid to work
through all of the child jobs, because the parent jobs are taking up
so many compute slots.

To solve this problem, it occurred to me that it would be nice if I
could specify (perhaps by job name) a maximum number of parent jobs
that can simultaneously be executing.

The way I'm currently working around this problem is the following. I
launch one or two parent jobs, then wait until they have spawned their
child jobs. At this point all of the slots in my grid have been
filled. I then launch the rest of my parent jobs, which don't run,
because no slots are available. I then use qmon to lower the priority
of my waiting parent jobs. This works OK, but later on I still
sometimes end up with too many parent jobs running simultaneously.
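
The priority change can also be scripted with qalter instead of being done by hand in qmon. The sketch below is only an illustration: the job IDs and the priority value are made-up placeholders, and the function just prints the commands so they can be inspected first. qalter -p takes a value between -1023 and 1024, and ordinary users can only lower the priority of their own jobs.

```shell
#!/bin/sh
# Sketch: lower the priority of waiting parent jobs with qalter.
# The job IDs and the priority value below are hypothetical placeholders.

# Print one qalter command per job ID.
# Usage: lower_priority_cmds PRIORITY JOB_ID...
lower_priority_cmds() {
    prio=$1
    shift
    for id in "$@"; do
        echo "qalter -p $prio $id"
    done
}

# Dry run: show the commands for three imaginary pending jobs.
lower_priority_cmds -100 4711 4712 4713
```

Piping the output to sh (lower_priority_cmds -100 4711 4712 4713 | sh) would apply the changes on a host where qalter is available.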
I've looked through the documentation to try to find a better
solution. The closest thing I've found is the -tc flag to qsub, which
allows me to limit the number of tasks of an array job that execute
concurrently.
Unfortunately, the parent jobs are not themselves array jobs, and
while I suppose I could try to rewrite the parent launch scripts to
launch as an array job, this would be less than ideal.
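
For reference, the array-job variant might look roughly like the sketch below. It assumes a hypothetical wrapper script parent.sh that uses the SGE_TASK_ID environment variable to decide which parent workload to run; the job name, task count, and concurrency limit are placeholders as well. The script only prints the qsub command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: submit all parent work as one array job with a concurrency cap.
# parent.sh, the job name, and the numbers below are hypothetical.

N_PARENTS=200    # total number of parent tasks (assumed)
MAX_RUNNING=4    # at most this many tasks may run at once (assumed)

# Build the qsub command line.
# -t sets the task ID range; -tc caps the number of concurrently running tasks.
build_array_cmd() {
    echo "qsub -N parent -t 1-$1 -tc $2 parent.sh"
}

cmd=$(build_array_cmd "$N_PARENTS" "$MAX_RUNNING")

# Print the command by default; set DRY_RUN=0 to actually submit.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
else
    $cmd
fi
```

The drawback mentioned above remains: every parent launch script would have to be reworked so that a single script can pick its workload from the task ID.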
I was wondering if anyone has any other ideas on how to specify that
no more than n instances of jobs with a specified name should be able
to run simultaneously. I'd be open to other mechanisms, too.
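
One mechanism that might approximate such a limit without array jobs is to chain the parent jobs with -hold_jid, so that job k only becomes eligible once job k-n has finished. The sketch below is an assumption-laden illustration, not a tested recipe: the job names parent_<k> and the script parent.sh are hypothetical, and the function only prints the qsub commands.

```shell
#!/bin/sh
# Sketch: cap the number of running parent jobs by chaining them in "lanes".
# Job k holds on job k-LANES, so each lane runs one job at a time and
# at most LANES parent jobs execute simultaneously.
# The names parent_<k> and the script parent.sh are hypothetical.

# Print the qsub commands for TOTAL parent jobs spread over LANES lanes.
# Usage: lane_cmds TOTAL LANES
lane_cmds() {
    total=$1
    lanes=$2
    k=1
    while [ "$k" -le "$total" ]; do
        prev=$((k - lanes))
        if [ "$prev" -ge 1 ]; then
            # Wait for the previous job in the same lane to finish.
            echo "qsub -N parent_$k -hold_jid parent_$prev parent.sh $k"
        else
            # The first job in each lane has nothing to wait for.
            echo "qsub -N parent_$k parent.sh $k"
        fi
        k=$((k + 1))
    done
}

# Dry run: six parent jobs, at most two running at once.
lane_cmds 6 2
```

Because the lanes are fixed at submission time, one slow parent job delays everything queued behind it in its lane, so this is a static approximation of the limit rather than a true cap by job name.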