[gridengine users] Strange SGE PE issue (threaded PE with 999 slots but scheduler thinks the value is 0)

Reuti reuti at staff.uni-marburg.de
Thu Jun 11 20:17:08 UTC 2020


Hi,

Any consumables in place like memory or other resource requests? Any output of `qalter -w v …` or "-w p"?
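
For reference, a minimal sketch of running both checks against the pending job. The helper name validate_job and the DRY_RUN switch are hypothetical, not part of SGE. "-w v" validates the request against an empty cluster, while "-w p" validates it against the cluster as it currently is. By default the function only prints the commands it would run.

```shell
# Sketch only: validate_job and DRY_RUN are illustrative names, not SGE features.
# Usage: validate_job <job_id>
validate_job() {
    job_id=$1
    for mode in v p; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "qalter -w $mode $job_id"
        else
            # Real run: requires qalter in PATH and an existing pending job.
            qalter -w "$mode" "$job_id"
        fi
    done
}
```

Calling it with DRY_RUN=0 and the real job ID would run the two validations and print their reports.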

-- Reuti


> On 11.06.2020 at 20:32, Chris Dagdigian <dag at sonsorol.org> wrote:
> 
> Hi folks,
> 
> Got a bewildering situation I've never seen before with simple SMP/threaded PE techniques.
> 
> I made a brand new PE called threaded:
> 
> $ qconf -sp threaded
> pe_name            threaded
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
> qsort_args         NONE
> 
> 
> I attached it to all.q on an IDLE grid and submitted a job with the '-pe threaded 1' argument.
> 
> However, the "qstat -j" output shows this scheduler decision line:
> 
> cannot run in PE "threaded" because it only offers 0 slots
> 
> 
> I'm somewhat lost on how to debug this because I can't figure out how to probe where SGE keeps track of PE-specific slots. With other resources I can look at the complex_values reported by execution hosts, or use the "-F" argument to qstat to dump the live state of a requestable resource, but I have no troubleshooting ideas for "how to figure out why SGE thinks there are 0 slots when the static PE on an idle cluster has been set to contain 999 slots".
> 
> Anyone seen something like this before?  I don't think I've ever seen this particular issue with an SGE parallel environment before ...
> 
> 
> Chris
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users
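
A possible first sanity check, sketched under assumptions: compare the slots value in the PE definition with the pe_list of the queue the PE was attached to. The helper names pe_field and queue_has_pe are hypothetical. They only parse text in the format printed by "qconf -sp" and "qconf -sq", and they assume pe_list sits on a single line with whitespace-separated names. This does not explain the "0 slots" message by itself; it only confirms the static configuration is what you expect.

```shell
# pe_field NAME: read "qconf -sp"-style text on stdin and print the value of NAME.
pe_field() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# queue_has_pe PE: read "qconf -sq"-style text on stdin;
# exit status 0 if PE appears in the pe_list line, 1 otherwise.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) {
                gsub(/,/, "", $i)      # tolerate comma-separated entries
                if ($i == pe) found = 1
            }
        }
        END { exit !found }'
}
```

On a live system this could be used as "qconf -sp threaded | pe_field slots" and "qconf -sq all.q | queue_has_pe threaded && echo attached".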


