[gridengine users] Barrier job

Jesse Becker beckerjes at mail.nih.gov
Wed Jul 11 00:53:03 UTC 2012


On Tue, Jul 10, 2012 at 06:51:03PM -0400, David Erickson wrote:
>Hi all-
>Following up with a slightly different question from yesterday, I have
>another GE installation that has many hosts grabbing jobs from a
>single queue.  These jobs are logically grouped together, although
>they are not submitted together (but actually when a job finishes from
>another GE cluster).  I know when all jobs in a group have been
>submitted and I'd like to submit a job at the end that waits for all
>prior jobs to finish, then does some clean up work.  I'm wondering
>what the best way to do this is, I saw the -hold_jid flag to qsub,
>which I suppose if I ensured all jobs in a logical group had the same
>name I could just supply that name there.  Is this the best way to
>accomplish something like this?

Two things:

1) There's a pair of programs called barrier and barrierd that do this.
They are part of the 'clusterit' package.
	http://clusterit.sourceforge.net/man/barrier.html

2) I think -hold_jid can take wildcards on job names.  Thus,
this works:

	$ qsub -N alpha1 wrapper.sh 
	$ qsub -N alpha2 wrapper.sh
	$ qstat -u beckerjes
	2503234 0.06498 alpha1     beckerjes r   07/10/2012 20:47:34 low.q@node1 1
	2503235 0.01581 alpha2     beckerjes r   07/10/2012 20:47:28 low.q@node2 1

	$ qsub -hold_jid 'alpha*' wrapper.sh
	$ qstat -u beckerjes
	2503234 0.06498 alpha1     beckerjes r   07/10/2012 20:47:34 low.q@node1 1
	2503235 0.01581 alpha2     beckerjes r   07/10/2012 20:47:28 low.q@node2 1
	2503236 0.00000 wrapper.sh beckerjes hqw 07/10/2012 20:47:59 1

	$ qstat -j 2503236
	<snip>
	jid_predecessor_list (req):  alpha*
	jid_predecessor_list:       2503234,2503235
	<snip>
	scheduling info:            job dropped because of job dependencies
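
For your grouped-jobs case, the same idea could be wrapped up roughly as
follows.  This is only a sketch under some assumptions: the function names,
the "GROUP_INDEX" naming scheme, and the scripts work.sh and cleanup.sh are
all made up for illustration; only qsub's -N and -hold_jid options come from
the example above.  The cleanup job is named "cleanup_GROUP" so that it cannot
match its own hold pattern, and the pattern is quoted so the shell does not
expand it against files in the current directory.

```shell
#!/bin/sh
# Sketch: group jobs under a shared name prefix, then hold a cleanup job
# on that prefix. All names here are hypothetical.

# Command used to submit jobs; override it to dry-run the script.
QSUB="${QSUB:-qsub}"

# submit_member GROUP INDEX SCRIPT
# Submits one job of the group, named GROUP_INDEX.
submit_member() {
    "$QSUB" -N "${1}_${2}" "$3"
}

# submit_barrier GROUP SCRIPT
# Submits the cleanup job, held on every job whose name matches GROUP_*.
# The pattern is passed as a single quoted argument so that qsub, not the
# shell, interprets the wildcard.
submit_barrier() {
    "$QSUB" -N "cleanup_${1}" -hold_jid "${1}_*" "$2"
}

# Example use (commented out so that sourcing this file submits nothing):
#   submit_member  alpha 1 work.sh
#   submit_member  alpha 2 work.sh
#   submit_barrier alpha cleanup.sh
```

Judging by the jid_predecessor_list output above, the pattern appears to be
resolved to concrete job IDs when the held job is submitted, so the cleanup
job should go in only after every job in the group has been submitted.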



-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)

