[gridengine users] define PE on special hosts

Reuti reuti at staff.uni-marburg.de
Sat Nov 19 17:54:32 UTC 2011


On 19.11.2011 at 17:50, mahbube rustaee wrote:

>>> number of slots per node is correct, but host list is not host list on that
>>> queue.
>> Which queue, the one it runs in or the one you wanted it to run in?
>> I defined a queue on a hostgroup (such as @rack1) and attached a PE to the queue (such as mpi2 with allocation_rule 2). I expected any job submitted to that queue to use 2 slots on each @rack1 host, but the hosts that run the jobs are not members of @rack1.
> Jobs get 2 slots per host, but some hosts are not members of @rack1! Why?
> 
> I want jobs submitted to that queue to use 2 slots on each of the @rack1 hosts.

Did you set up the wildcard solution I outlined below?

In SGE you don't think in terms of queues, but in terms of resource requests. If you request a PE that is attached to several queues, the job can be scheduled in any of them: you get exactly what you asked for.

http://www.gridengine.info/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes/

Furthermore:

Is it the output of `qstat -g t` that is not as expected, or the nodes your MPI job's tasks actually run on, which are possibly outside of SGE's allocation?
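
To tell the two cases apart, here is a minimal sketch that compares the hosts SGE granted, as listed in `$PE_HOSTFILE`, against the hosts you expected. It assumes a plain text file with one expected hostname per line; that file and the function name `check_pe_hosts` are hypothetical and not part of SGE.

```shell
#!/bin/sh
# Sketch: report hosts in a PE hostfile that are not in an expected host list.
# Each line of the PE hostfile has the form:
#   hostname slots queue processor_range
# Usage: check_pe_hosts PE_HOSTFILE EXPECTED_HOSTS_FILE
# (check_pe_hosts and the expected-hosts file are illustrative names only.)
check_pe_hosts() {
    hostfile=$1
    expected=$2
    # First file: remember every expected hostname.
    # Second file: print each granted host that was not remembered.
    awk 'FILENAME == ARGV[1] { ok[$1] = 1; next }
         !($1 in ok) { print $1 " (" $2 " slots) is outside the expected host list" }' \
        "$expected" "$hostfile"
}
```

Inside the job script this could be called as `check_pe_hosts "$PE_HOSTFILE" rack1.hosts`. Empty output means every granted host is in the list, in which case the tasks are probably being started outside of SGE's allocation. The hostnames must be written the same way in both files (short vs. fully qualified).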

-- Reuti


> Thx so much
>>>> 
> 
> 
> On 11/19/11, Reuti <reuti at staff.uni-marburg.de> wrote:
>> On 19.11.2011 at 13:00, William Hay wrote:
>> 
>>> On 19 November 2011 09:58, mahbube rustaee <rustaee at gmail.com> wrote:
>>>> 
>>>> 
>>>> On Sat, Nov 19, 2011 at 12:05 PM, William Hay <w.hay at ucl.ac.uk> wrote:
>>>>> 
>>>>> On 19 November 2011 04:53, mahbube rustaee <rustaee at gmail.com> wrote:
>>>>>> Hi,
>>>>>> I defined a queue on @node-grp (a group of nodes).
>>>>>> I defined mpi2 parallel environment as:
>>>>>> start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
>>>>>> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
>>>>>> allocation_rule    2
>>>>>> control_slaves     FALSE
>>>>>> job_is_first_task  TRUE
>>>>>> urgency_slots      min
>>>>>> accounting_summary TRUE
>>>>>> 
>>>>>> and attached this PE to that queue. I expected parallel programs to get
>>>>>> slots from @node-grp on that queue, with 2 slots on each host.
>>>>>> But that did not happen: the slots are on other hosts that are not members
>>>>>> of @node-grp.
>>>>>> 
>>>>>> How can I define such a parallel environment?
>>>>> 1. Were the parallel programs submitted to the mpi2 PE or possibly to a
>>>>> different PE?
>>>> 
>>>> Yes, the parallel programs were submitted to the mpi2 PE.
>>>>> 
>>>>> 2. Is mpi2 defined in the pe_list of the queue in which the jobs actually
>>>>> do run?
>>>> 
>>>> yes,
>>> Well, that would appear to be the problem. If you have multiple queues
>>> with mpi2 defined then the job can run in any or all of them.
>> 
>> Correct. To limit it to one hostgroup you will have to attach different PEs
>> to the hostgroups (this can still be done in one and the same queue) and
>> request a PE with a wildcard. E.g. if they should stay inside one rack or
>> so:
>> 
>> $ qconf -sq all.q
>> ...
>> pe_list NONE,[@rack1=MPI2a PVMa],[@rack2=MPI2b PVMb]
>> 
>> 
>> Then submitting:
>> 
>> $ qsub -pe "MPI2*" 4 job.sh
>> 
>> will give you machines only from one hostgroup. Of course MPI2a could also
>> be requested directly this way.
>> 
>> -- Reuti
>> 
>> 
>>>> number of slots per node is correct, but host list is not host list on
>>>> that
>>>> queue.
>>> Which queue, the one it runs in or the one you wanted it to run in?
>>> 
>>>>> 
>>>>> William
>>>>>> 
>>>>>> Thx
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>>> 
>> 
>> 



