[gridengine users] Barrier job
Jesse Becker
beckerjes at mail.nih.gov
Wed Jul 11 00:53:03 UTC 2012
On Tue, Jul 10, 2012 at 06:51:03PM -0400, David Erickson wrote:
>Hi all-
>Following up with a slightly different question from yesterday, I have
>another GE installation that has many hosts grabbing jobs from a
>single queue. These jobs are logically grouped together, although
>they are not submitted together (but actually when a job finishes from
>another GE cluster). I know when all jobs in a group have been
>submitted and I'd like to submit a job at the end that waits for all
>prior jobs to finish, then does some clean up work. I'm wondering
>what the best way to do this is, I saw the -hold_jid flag to qsub,
>which I suppose if I ensured all jobs in a logical group had the same
>name I could just supply that name there. Is this the best way to
>accomplish something like this?
Two things:
1) there's a pair of programs called barrier and barrierd that do this.
They are part of the 'clusterit' package.
http://clusterit.sourceforge.net/man/barrier.html
2) I think -hold_jid can take wildcards on job names. Thus,
this works:
$ qsub -N alpha1 wrapper.sh
$ qsub -N alpha2 wrapper.sh
$ qstat -u beckerjes
2503234 0.06498 alpha1 beckerjes r 07/10/2012 20:47:34 low.q@node1 1
2503235 0.01581 alpha2 beckerjes r 07/10/2012 20:47:28 low.q@node2 1
$ qsub -hold_jid "alpha*" wrapper.sh
$ qstat -u beckerjes
2503234 0.06498 alpha1 beckerjes r 07/10/2012 20:47:34 low.q@node1 1
2503235 0.01581 alpha2 beckerjes r 07/10/2012 20:47:28 low.q@node2 1
2503236 0.00000 wrapper.sh beckerjes hqw 07/10/2012 20:47:59 1
$ qstat -j 2503236
<snip>
jid_predecessor_list (req): alpha*
jid_predecessor_list: 2503234,2503235
<snip>
scheduling info: job dropped because of job dependencies
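
For your case, the same idea could be wrapped up roughly as below. This is
only a sketch: the group prefix, the job naming scheme, and the script names
(worker.sh, cleanup.sh) are made up for illustration, and QSUB is just a
variable I added so the commands can be dry-run without a cluster. The only
Grid Engine features it relies on are -N and -hold_jid with a wildcard.

```shell
#!/bin/sh
# Sketch only: names and naming scheme below are assumptions, not GE conventions.
# QSUB may be overridden (for example with a stub) to try this without a cluster.
QSUB=${QSUB:-qsub}

# Submit one member of a logical group.
# $1 = group prefix (must not start with a digit), $2 = index, $3 = script
submit_group_job() {
    "$QSUB" -N "${1}_${2}" "$3"
}

# Submit the cleanup ("barrier") job once every job in the group is submitted.
# $1 = group prefix, $2 = cleanup script
# The pattern is quoted so the shell does not expand it against local files.
# The cleanup job's own name is chosen so it does not match the pattern.
submit_barrier_job() {
    "$QSUB" -N "cleanup_${1}" -hold_jid "${1}_*" "$2"
}
```

You would call submit_group_job grpA 1 worker.sh (and so on) as each job
comes in, then submit_barrier_job grpA cleanup.sh after the last one. As far
as I can tell from the qstat -j output above, the wildcard is resolved to job
IDs when the held job is submitted, so the barrier job should go in last.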
--
Jesse Becker
NHGRI Linux support (Digicon Contractor)