[gridengine users] Strange SGE PE issue (threaded PE with 999 slots but scheduler thinks the value is 0)

Feng Zhang prod.feng at gmail.com
Thu Jun 11 20:43:57 UTC 2020


Is "threaded" added to the pe_list of all.q?

You can also check "qconf -srqs" to see whether any resource quota limit applies.
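
Roughly, both checks could be scripted as below. This is only a sketch: pe_in_list is a made-up helper (not an SGE command), the names "threaded" and "all.q" are taken from the original message, and it assumes the pe_list line in the "qconf -sq" output is not wrapped across lines. It also matches a PE that is only attached through a hostgroup override such as [@hg=threaded], so treat a hit as "mentioned somewhere in pe_list", not proof that every host offers it.

```shell
#!/bin/sh
# pe_in_list: hypothetical helper, not part of SGE.
# stdin: the text printed by "qconf -sq <queue>"; $1: PE name to look for.
# Succeeds (exit 0) if the name appears as a whole token on the pe_list line.
pe_in_list() {
    awk '$1 == "pe_list" { $1 = ""; print }' |
        tr -cs 'A-Za-z0-9_.@-' '\n' |   # one token per line
        grep -qx -- "$1"
}

# Only touch the live cluster when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    if qconf -sq all.q | pe_in_list threaded; then
        echo "threaded is listed in pe_list of all.q"
    else
        echo "threaded is NOT listed in pe_list of all.q"
    fi
    # Print all resource quota sets; look for rules that limit slots or PEs.
    qconf -srqs
fi
```

If the first check fails, "qconf -mq all.q" lets you edit pe_list by hand.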

On Thu, Jun 11, 2020 at 2:33 PM Chris Dagdigian <dag at sonsorol.org> wrote:
>
> Hi folks,
>
> Got a bewildering situation I've never seen before with simple SMP/threaded PE techniques
>
> I made a brand new PE called threaded:
>
> $ qconf -sp threaded
> pe_name            threaded
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
> qsort_args         NONE
>
>
> And I attached that to all.q on an IDLE grid and submitted a job with '-pe threaded 1' argument
>
> However all "qstat -j" data is showing this scheduler decision line:
>
> cannot run in PE "threaded" because it only offers 0 slots
>
>
> I'm sort of lost on how to debug this because I can't figure out how to probe where SGE is keeping track of PE-specific slots.  With other stuff I can look at complex_values reported by execution hosts, or I can use an "-F" argument to qstat to dump the live state and status of a requestable resource, but I don't really have any debug or troubleshooting ideas for "how to figure out why SGE thinks there are 0 slots when the static PE on an idle cluster has been set to contain 999 slots".
>
> Anyone seen something like this before?  I don't think I've ever seen this particular issue with an SGE parallel environment before ...
>
>
> Chris
>
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users


