[gridengine users] Strange SGE PE issue (threaded PE with 999 slots but scheduler thinks the value is 0)

Chris Dagdigian dag at sonsorol.org
Thu Jun 11 18:32:18 UTC 2020


Hi folks,

I've got a bewildering situation that I've never seen before with a 
simple SMP/threaded PE setup.

I made a brand new PE called threaded:

$ qconf -sp threaded
pe_name            threaded
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE


I attached it to all.q on an IDLE grid and submitted a job with the 
'-pe threaded 1' argument.
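
One thing worth ruling out at this step is whether the PE really shows 
up in the queue's pe_list. The following is only a minimal sketch: the 
helper name pe_attached is hypothetical, and it assumes "qconf -sq" 
prints pe_list on a single line with space- or comma-separated PE names 
(host-group overrides such as [@group=...] are not handled).

```shell
# Hypothetical helper (not part of SGE): reads queue configuration text,
# as printed by `qconf -sq <queue>`, on stdin and succeeds only if the
# named PE appears as an entry on the pe_list line.
pe_attached() {
    pe="$1"
    awk -v pe="$pe" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) {
                # Entries may also be comma-separated within one field.
                n = split($i, parts, ",")
                for (j = 1; j <= n; j++)
                    if (parts[j] == pe) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live cluster (assumes the SGE client tools are on PATH):
#   qconf -sq all.q | pe_attached threaded && echo "threaded is attached"
```

If the check fails, "qconf -aattr queue pe_list threaded all.q" is one 
way to add the PE to the queue.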

However, all of the "qstat -j" output shows this scheduler decision line:

cannot run in PE "threaded" because it only offers 0 slots


I'm sort of lost on how to debug this because I can't figure out where 
SGE keeps track of PE-specific slots.  With other resources I can look 
at the complex_values reported by execution hosts, or I can use the 
"-F" argument to qstat to dump the live state and status of a 
requestable resource.  But I don't really have any debug or 
troubleshooting ideas for "how to figure out why SGE thinks there are 0 
slots when the static PE on an idle cluster has been set to contain 999 
slots".
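
For anyone hitting the same message, here is a hedged sketch of probes 
that may narrow it down. It is not a known fix. The function names 
pe_reasons and probe_pe are hypothetical, and the sketch assumes 
6.2-era client tools in which "qstat -pe" and "qalter -w p" are 
available.

```shell
# Hypothetical helper: reads `qstat -j <job_id>` output on stdin and prints
# only the lines that mention the given PE name in double quotes.
pe_reasons() {
    grep -F "PE \"$1\""
}

# Hypothetical wrapper around standard SGE client commands. It returns
# early with an error when the client tools are not installed.
probe_pe() {
    pe="$1"; queue="$2"; job="$3"
    command -v qconf >/dev/null 2>&1 || {
        echo "SGE client tools not found" >&2
        return 1
    }
    qconf -sq "$queue" | grep pe_list   # is the PE listed on the queue?
    qstat -f -pe "$pe"                  # queue instances attached to the PE
    qstat -g c                          # per-queue used/available slot summary
    qalter -w p "$job"                  # ask why the job cannot be dispatched now
    qstat -j "$job" | pe_reasons "$pe"  # scheduler messages that name the PE
}

# Usage on a live cluster, with a real job id as the last argument:
#   probe_pe threaded all.q <job_id>
```

The idea is that the slot count a PE "offers" to a job depends on the 
queue instances that are attached to it and currently usable, not only 
on the static "slots 999" value in the PE definition, so the probes 
look at the queue side first.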

Anyone seen something like this before?  I don't think I've ever seen 
this particular issue with an SGE parallel environment before ...


Chris
