[gridengine users] A couple of questions...

Maes, Richard rmaes at ciena.com
Wed Jun 29 15:51:35 UTC 2011


We hacked up the Quartus DSE script to use SGE just like you did.  Your
requirements are significant and we have run into the same issue here.
Users always want their jobs to start immediately, and if you hold some
horsepower in reserve to give an appearance of responsiveness, they want
to know why all the nodes are not running their jobs.

 

Your requirements seem to be pretty tight.  I'm not sure I caught all of
them.  So forgive me if I am suggesting something that is a non-starter.

 

Just to clarify: if your cluster is full (all slots spoken for), you are
not expecting running jobs to be resubmitted to make room for new jobs
entering the run state, correct?

 

If I understand correctly, then you are planning on only allowing a
maximum number of slots per invocation.  

 

The reason I ask is that we have wrapped DSE with yet another script
that does a couple of things for us, including running simulation
regressions.

We can flip switches to make it DSE only (with multiple levels of DSE),
Sim only, or some combination.  In that script we do things like select
the queue we are going to submit to.

 

Originally, when I didn't know better, I did the following, which
seemed to work.  It's just hard to manage, and there is the possibility
of a race condition which can cause some screw-ups.  I never saw the
race happen in the production environment, but I was able to induce it
if I tried hard enough.

 

Step 1. Create yourself a bunch of dse queues.  Call them something
easily scriptable like dse01, dse02, ... dse15

Step 2. You have to limit the total number of DSE jobs in play in some
way, which could be as down and dirty as the number of slots available
on the machines, or you could create a resource quota rule such as
'users * queues dse to slots = XX'.

 

Here is the thing I still don't know, because I moved away from this
approach: I always wondered whether, if I created a main dse queue,
applied the limit rule to it, and made all the other queues subordinate,
that would in fact limit the total number of jobs from all DSE sources.
In reality, I think I just wrote it as 'users * queues {dse00, dse01,
dse02 .... dse15} to slots = 45'.

 

Step 3. So now you have your global DSE limit; next you need to create
an "invocation limit".  Do so by making a rule for each DSE queue, and
limit each one to 10, let's say.
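
Steps 2 and 3 could be written as two resource quota sets.  What follows
is only a sketch in Python that prints the two rule sets.  It assumes the
syntax described in sge_resource_quota(5), where a plain comma-separated
queue list shares one limit across all listed queues, and a list in
braces applies the limit to each queue separately.  The rule names
dse_total and dse_per_invocation and the helper render_rqs are made up
for illustration; the 45 and 10 are the numbers from the steps above.

```python
def render_rqs(name, queues, slots, per_queue):
    """Render one resource quota set as text.

    per_queue=False: one limit shared by all listed queues (plain list).
    per_queue=True:  the limit applies to each queue on its own (braced list).
    This reading of the brace syntax is an assumption taken from
    sge_resource_quota(5); check it against your own installation.
    """
    qlist = ",".join(queues)
    if per_queue:
        qlist = "{" + qlist + "}"
    return (
        "{\n"
        f"   name         {name}\n"
        "   enabled      TRUE\n"
        f"   limit        users * queues {qlist} to slots={slots}\n"
        "}\n"
    )


# Queue names dse00..dse15, matching the 0 - 15 counter used by the wrapper.
queues = [f"dse{i:02d}" for i in range(16)]

# Step 2: global cap on all DSE queues combined (hypothetical rule name).
print(render_rqs("dse_total", queues, 45, per_queue=False))

# Step 3: cap per queue, i.e. per invocation (hypothetical rule name).
print(render_rqs("dse_per_invocation", queues, 10, per_queue=True))
```

If that reading of the braces is right, the braced form quoted in Step 2
would cap each queue at 45 rather than all of them together, so it is
worth testing on your version before relying on it.  The printed text
can be saved to a file and loaded with qconf -Arqs.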

 

Now, in the wrapper script, I should have used a locking file for the
next step.  If you are going to try this, look into it.

In a globally available location, create a DSE queue file that holds a
number from 0 to 15.  It tells you the next DSE queue to use.  Your new
Quartus DSE wrapper script should take the lock, open that queue file,
increment the number (or roll it back to 0 if it is already at 15), and
then launch the DSE run using the corresponding subordinate queue.
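
The locking step might look something like the sketch below.  It is not
the script we actually ran.  It assumes a Python wrapper, a hypothetical
counter file path passed in by the caller, queues named dse00 to dse15,
and that fcntl.flock works on the shared filesystem (on some NFS setups
it does not, in which case a different locking scheme is needed).

```python
import fcntl
import os

NUM_QUEUES = 16  # counter runs 0..15, one value per dse queue


def next_dse_queue(counter_path, num_queues=NUM_QUEUES):
    """Return the queue name for this invocation and advance the counter.

    The exclusive lock keeps two wrappers from reading the same number,
    which is the race described above.
    """
    lock_path = counter_path + ".lock"  # hypothetical lock file name
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until we hold the lock
        try:
            try:
                with open(counter_path) as f:
                    current = int(f.read().strip() or 0)
            except FileNotFoundError:
                current = 0  # first run: start at queue 0
            current %= num_queues
            following = (current + 1) % num_queues  # roll 15 back to 0
            tmp_path = counter_path + ".tmp"
            with open(tmp_path, "w") as f:
                f.write(f"{following}\n")
            os.replace(tmp_path, counter_path)  # swap in the new counter
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return f"dse{current:02d}"
```

The wrapper would then hand the returned name to the submit command, for
example with qsub -q, so every job from one invocation lands in the same
queue and falls under that queue's limit.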

 

It's a bit Rube Goldberg, but might achieve what you are looking for.  

 

Of course, if you are going to go through all that trouble, then why not
do ticketing from a wrapper script?

 

 



