[gridengine users] exclusive use of nodes and qstat report "per queue"

Reuti reuti at staff.uni-marburg.de
Tue May 5 10:08:03 UTC 2015


> Am 04.05.2015 um 16:04 schrieb Stefano Bridi <stefano.bridi at gmail.com>:
> 
> Ok, today is not a busy day for that queue, so I had to recreate the
> problem. In doing so I saw that while the queue is empty everything works
> as expected (for the few seconds between the submit and the start of the
> job, 'qstat -q E5m' shows it as "qw", as expected).
> The E5m queue is built from 5 nodes: n010[4-8]. At the moment only one
> is in real use, so I need to submit 5 jobs to end up with one in "qw".
> 
> $ qsub sleeper.sh
> Your job 876766 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876767 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876768 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876769 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876770 ("sleeper.sh") has been submitted
> $ qalter -w v 876770
> Job 876770 cannot run in queue "opteron" because it is not contained
> in its hard queue list (-q)
> Job 876770 cannot run in queue "x5355" because it is not contained in
> its hard queue list (-q)
> Job 876770 cannot run in queue "e5645" because it is not contained in
> its hard queue list (-q)
> Job 876770 cannot run in queue "x5560" because it is not contained in
> its hard queue list (-q)
> Job 876770 cannot run in queue "x5670" because it is not contained in
> its hard queue list (-q)
> Job 876770 cannot run in queue "E5" because it is not contained in its
> hard queue list (-q)

Although it's not the issue per se: did you create one queue for each type of machine? Maybe it's possible to combine them into a single queue and instead request a STRING complex carrying the name of the architecture, attached either to the queue instances via hostgroups or to the exechosts.
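
A minimal sketch of that idea, assuming a hypothetical STRING complex named "machine_type" (shortcut "mt"); the names and values are placeholders, not from the original setup:

# added via qconf -mc, one line like:
#name           shortcut  type    relop  requestable  consumable  default  urgency
machine_type    mt        STRING  ==     YES          NO          NONE     0

# attached per exechost (or per queue instance via a hostgroup), e.g.:
# qconf -me n0104   ->   complex_values machine_type=E5m

# users then request the machine type instead of a hard queue:
$ qsub -l mt=E5m sleeper.sh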

-- Reuti


> Job 876770 (-l exclusive=true) cannot run at host "n0104" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0105" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0106" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0107" because
> exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0108" because
> exclusive resource (exclusive) is already in use
> verification: no suitable queues
> $
> 
> Does this mean that the "exclusive" complex requested via "qsub -l
> excl=true" is evaluated on the node before the check against the hard
> queue list? If so, is there another way to have both 'qstat -q' and
> exclusive use of the nodes working?
> 
> thanks
> stefano
> 
> On Mon, May 4, 2015 at 1:45 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Hi,
>> 
>>> Am 04.05.2015 um 13:25 schrieb Stefano Bridi <stefano.bridi at gmail.com>:
>>> 
>>> Hi all,
>>> I need to give the possibility to the user to reserve one or more node
>>> for exclusive use for their runs.
>>> It is a mixed environment, and if they don't reserve the nodes for
>>> exclusive use, serial and low-core-count jobs will fragment the
>>> availability of cores across many nodes.
>>> The problem is that the "exclusive" jobs are no longer listed in
>>> the "per queue" qstat:
>>> 
>>> We solved the exclusive request  by setting up a new complex:
>>> 
>>> # qconf -sc excl
>>> #name               shortcut  type  relop  requestable  consumable  default  urgency
>>> #--------------------------------------------------------------------------------------------------
>>> exclusive           excl      BOOL  EXCL   YES          YES         0        1000
>>> 
>>> and setting the corresponding complex on every node that should be
>>> usable in this way (is there a way to set this system-wide? see the
>>> sketch below the listing):
>>> 
>>> #qconf -se n0108
>>> hostname              n0108
>>> load_scaling          NONE
>>> complex_values        exclusive=true
>>> load_values           arch=linux-x64,num_proc=20,....[snip]
>>> processors            20
>>> user_lists            NONE
>>> xuser_lists           NONE
>>> projects              NONE
>>> xprojects             NONE
>>> usage_scaling         NONE
>>> report_variables      NONE
>>> 
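One way to avoid editing every execution host by hand would be a loop over the exechost list (a hedged sketch, not from the original thread; qconf -aattr appends an entry to a list attribute such as complex_values, qconf -sel lists the execution hosts):

$ for h in $(qconf -sel); do qconf -aattr exechost complex_values exclusive=true "$h"; done
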
>>> now if I submit a job like:
>>> $ cat sleeper.sh
>>> #!/bin/bash
>>> 
>>> #
>>> #$ -cwd
>>> #$ -j y
>>> #$ -q E5m
>>> #$ -S /bin/bash
>>> #$ -l excl=true
>>> #
>>> date
>>> sleep 20
>>> date
>>> 
>>> $
>>> All works as expected except qstat:
>>> a generic 'qstat' report:
>>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>> -----------------------------------------------------------------------------------------------------------------
>>>  876735 0.50601 sleeper.sh s.bridi      qw    05/04/2015 12:20:45                                    1
>>> 
>>> and the 'qstat -j 876735' report:
>>> ==============================================================
>>> job_number:                 876735
>>> exec_file:                  job_scripts/876735
>>> submission_time:            Mon May  4 12:20:45 2015
>>> owner:                      s.bridi
>>> uid:                        65535
>>> group:                      domusers
>>> gid:                        15000
>>> sge_o_home:                 /home/s.bridi
>>> sge_o_log_name:             s.bridi
>>> sge_o_path:
>>> /sw/openmpi/142/bin:.:/ge/bin/linux-x64:/usr/lib64/qt-3.3/bin:/ge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/s.bridi/bin
>>> sge_o_shell:                /bin/bash
>>> sge_o_workdir:              /home/s.bridi/testexcl
>>> sge_o_host:                 login0
>>> account:                    sge
>>> cwd:                        /home/s.bridi/testexcl
>>> merge:                      y
>>> hard resource_list:         exclusive=true
>>> mail_list:                  s.bridi at login0
>>> notify:                     FALSE
>>> job_name:                   sleeper.sh
>>> jobshare:                   0
>>> hard_queue_list:            E5m
>>> shell_list:                 NONE:/bin/bash
>>> env_list:
>>> script_file:                sleeper.sh
>>> scheduling info:            [snip]
>>> 
>>> while 'qstat -q E5m' doesn't list the job!
>> 
>> Usually this means that the job is not allowed to run in this queue.
>> 
>> What does:
>> 
>> $ qalter -w v 876735
>> 
>> output?
>> 
>> -- Reuti
>> 
>> 
>>> Thanks
>>> Stefano
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>> 



