[gridengine users] A couple of questions...

Jesse Becker beckerjes at mail.nih.gov
Wed Jun 29 12:20:14 UTC 2011


On Wed, Jun 29, 2011 at 06:32:43AM -0400, Vic wrote:
>
>>> Can you provide more information about the application's startup
>>> process?
>>
>> Err - I'll ask my customer if I'm allowed to leak any info :-)
>
>Right - I can't go into too many specifics, but here's the gist of it:
>
>We're using an Altera tool called Quartus DSE. This generates a set of
>jobs with similar - but subtly different - parameters to try to find the
>best way to lay out an FPGA. The jobs are submitted in parallel to a grid.
>
>DSE uses LSF as its grid backend, but we've got a script (written locally)
>to convert the bsub calls into qsub calls. By the time we get to this part
>of the operation, we're well into the iterator, so we haven't got any info
>about the other jobs in the set :-(
>
>The job iterator in DSE is actually written in tcl, so I *could* patch
>that if absolutely necessary - but as these are production tools, I don't
>want to change anything unless there is no alternative, and if I do, it'll
>have to be an LSF-compatible patch (that we'd then submit to Altera), and a
>mod to the bsub->qsub wrapper script.
>
>Unless, of course, anyone else has already seen this problem and knows my
>analysis to be wrong (::crosses fingers and hopes someone does::)

Can you alter your bsub -> qsub translator to include a '-P <project>'
string?  That way you can use an RQS to throttle the number of jobs
running at a time.
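
Something along these lines might work as a starting point.  It is only a
sketch: the project name "dse", the RQS name "dse_throttle", the limit of
8 slots, and the function names are all made up for illustration, and the
project would have to exist in SGE already (e.g. created with
"qconf -aprj").  Note that an RQS limits slots rather than jobs, so this
caps the job count only if each job takes a single slot.

```shell
#!/bin/sh
# Hypothetical project name; assumed to exist already in SGE.
DSE_PROJECT="${DSE_PROJECT:-dse}"

# Print the qsub command line the wrapper would run: the project tag
# goes in front of whatever arguments were translated from bsub.
dse_qsub_cmd() {
    printf 'qsub -P %s' "$DSE_PROJECT"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Print a resource quota set that caps the slots used by that project.
# The value 8 is an arbitrary placeholder.
dse_rqs() {
    cat <<EOF
{
   name         dse_throttle
   description  "Cap concurrent DSE jobs"
   enabled      TRUE
   limit        projects $DSE_PROJECT to slots=8
}
EOF
}
```

The quota text could then be written to a file and loaded with
"qconf -Arqs <file>", e.g. "dse_rqs > /tmp/dse_rqs; qconf -Arqs /tmp/dse_rqs".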

You also mentioned that this was "well into the iterator".  If you use
qsub, you essentially dump all of the jobs into the queue and let SGE
deal with them when it can.  You could also use qrsh or "qsub -sync y".
These will both block until the job is actually complete.  This doesn't
sound like an ideal solution, but it may open some other options.
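
As a rough sketch of the blocking approach, assuming "-sync y" is the
qsub option that waits for the job to finish and returns its exit
status: the QSUB variable and the function names below are invented
here, purely so the command can be swapped out for testing.

```shell
#!/bin/sh
# Hypothetical override hook; defaults to the real qsub.
QSUB="${QSUB:-qsub}"

# Submit one job and wait for it to finish.
submit_and_wait() {
    $QSUB -sync y "$@"
}

# Run several job scripts one after another, stopping at the first
# submission that returns a non-zero status.
run_serially() {
    for script in "$@"; do
        submit_and_wait "$script" || return 1
    done
}
```

Running these in the background from the wrapper ("submit_and_wait job.sh &")
and calling "wait" would be one way to learn when a whole set has finished.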


-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)


