[gridengine users] Issue with hostname specification and parallel environment - jobs do not start

Reuti reuti at staff.uni-marburg.de
Thu Jan 5 08:51:41 UTC 2017


Hi,

On 05.01.2017 at 07:54, Manfred Selz wrote:

> Hi,
>  
> in my SGE 6.2u5 environment, I am seeing a strange issue when submitting jobs to a parallel environment while also providing a hard hostname resource requirement.
> This is not a standard situation, but sometimes certain benchmarks need to be run on one specific host only.
>  
> When I submit a job with either a parallel environment or a hard hostname resource specification, the job starts without delay.
> However, the combination of both sometimes keeps jobs waiting for an extended period of time, and I have not been able to get a clear message from the “qstat -j <jobID>” report.
>  
> The parallel environment settings are:
> $  qconf -sp local
> pe_name            local
> slots              1000
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
>  
> The specific host being targeted has 32 slots configured for the queue being used, and all of them are unused at this time.
> Is anybody aware of specific issues with the combination of parallel environments and a hard hostname resource request?
>  
> I have already tested this:
> - Removed the parallel environment request - works
> - Removed the hostname request - works
> - Removed all resource limits (“qconf -mrqs”) - no change
> - Increased the “slots” limit in the PE setting - no change
> - Changed the PE allocation_rule to “round_robin” - no change

I have only seen problems when a queue and a host are requested at the same time, i.e. "-q" together with "-l h=". The workaround for that case may also work in yours: request the host through a queue request:

-q "*@node123"
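
For illustration, the two submission forms might look as follows. This is only a sketch: the PE name "local" and the host "node123" come from this thread, while the slot count and the job script name are hypothetical. The script only prints the command lines; it does not submit anything.

```shell
#!/bin/sh
# Values for illustration. Only PE and HOST appear in this thread;
# SLOTS and JOB are assumptions.
HOST="node123"
PE="local"
SLOTS=4              # assumed slot count
JOB="benchmark.sh"   # hypothetical job script

# Form that reportedly stalls: PE request plus hard hostname resource request.
stalling_cmd() {
    printf 'qsub -pe %s %s -l h=%s %s\n' "$PE" "$SLOTS" "$HOST" "$JOB"
}

# Suggested form: pin the host through a wildcard queue-instance request.
suggested_cmd() {
    printf "qsub -pe %s %s -q '*@%s' %s\n" "$PE" "$SLOTS" "$HOST" "$JOB"
}

stalling_cmd
suggested_cmd
```

The quotes around the queue pattern keep the shell from expanding the "*" before qsub sees it.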


> After all, the final message in the “qstat -j <jobID>” report is always:
> cannot run in PE "local" because it only offers 0 slots

I assume the node is free and you have no backfilling issue where slots are reserved.
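
One way to check that assumption, as a rough sketch: look at the slot usage of the queue instances on the host and at the scheduler's reasons for the pending job. The host name is the example from above and the job ID is hypothetical. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
HOST="node123"            # example host name from above
JOB_ID="12345"            # hypothetical job ID
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Queue instances on the host; the "resv/used/tot." column shows
# reserved, used and total slots.
run qstat -f -q "*@$HOST"

# Queues and host state as seen by the qmaster.
run qhost -q -h "$HOST"

# Scheduling information for the waiting job.
run qstat -j "$JOB_ID"
```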

-- Reuti


> 
> I have found many older reports of the “only offers 0 slots” message, but none specifically about the combination with a hostname specification.
>  
> Regards,
> Manfred