[gridengine users] exclusive use of nodes and qstat report "per queue"

Stefano Bridi stefano.bridi at gmail.com
Tue May 5 09:52:12 UTC 2015


OK, sorry, yesterday I forgot to reply to the list.
Today is not a busy day for that queue, so I had to recreate the
problem. In doing so I saw that while the queue is empty everything
works as expected: for the few seconds between submission and the
start of the job, 'qstat -q E5m' shows it as "qw".
The E5m queue is built from 5 nodes: n010[4-8]. At the moment only one
is under real use, so I need to submit 5 jobs to get one into "qw".

$ qsub sleeper.sh
Your job 876766 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876767 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876768 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876769 ("sleeper.sh") has been submitted
$ qsub sleeper.sh
Your job 876770 ("sleeper.sh") has been submitted
$ qalter -w v 876770
Job 876770 cannot run in queue "opteron" because it is not contained
in its hard queue list (-q)
Job 876770 cannot run in queue "x5355" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "e5645" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "x5560" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "x5670" because it is not contained in
its hard queue list (-q)
Job 876770 cannot run in queue "E5" because it is not contained in its
hard queue list (-q)
Job 876770 (-l exclusive=true) cannot run at host "n0104" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0105" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0106" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0107" because
exclusive resource (exclusive) is already in use
Job 876770 (-l exclusive=true) cannot run at host "n0108" because
exclusive resource (exclusive) is already in use
verification: no suitable queues
$

Does this mean that the "exclusive" complex requested via "qsub -l
excl=true" is evaluated at the host level before the hard queue list
is checked? If so, is there another way to have both 'qstat -q' and
exclusive use of the nodes working?
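
(As a side note, the per-queue state of the consumable can also be
inspected directly with qstat's -F option; a minimal check, assuming
the complex name "exclusive" from the setup quoted below:

$ qstat -F exclusive -q E5m

which should list the current value of the exclusive consumable for
each queue instance.)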

thanks
stefano
On 04 May 2015 at 13:46, "Reuti" <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> > On 04.05.2015 at 13:25, Stefano Bridi <stefano.bridi at gmail.com> wrote:
> >
> > Hi all,
> > I need to give users the possibility to reserve one or more nodes
> > for exclusive use for their runs.
> > It is a mixed environment, and if they don't reserve nodes for
> > exclusive use, the serial and low-core-count jobs will fragment
> > the availability of cores across many nodes.
> > The problem is that the "exclusive" jobs are no longer listed in
> > the "per queue" qstat output:
> >
> > We implemented the exclusive request by setting up a new complex:
> >
> > # qconf -sc excl
> > #name               shortcut   type  relop  requestable  consumable  default  urgency
> > #--------------------------------------------------------------------------------------
> > exclusive           excl       BOOL  EXCL   YES          YES         0        1000
> >
> > and setting the corresponding complex on every node that should be
> > usable this way (is there a way to set this system-wide? a scripted
> > sketch follows the host dump below):
> >
> > #qconf -se n0108
> > hostname              n0108
> > load_scaling          NONE
> > complex_values        exclusive=true
> > load_values           arch=linux-x64,num_proc=20,....[snip]
> > processors            20
> > user_lists            NONE
> > xuser_lists           NONE
> > projects              NONE
> > xprojects             NONE
> > usage_scaling         NONE
> > report_variables      NONE
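> >
> > A sketch of one way to script this across all execution hosts
> > (untested; it assumes 'qconf -sel' lists every host that should
> > carry the complex):
> >
> > # append the consumable to each execution host's complex_values
> > for h in $(qconf -sel); do
> >     qconf -aattr exechost complex_values exclusive=true "$h"
> > done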
> >
> > now if I submit a job like:
> > $ cat sleeper.sh
> > #!/bin/bash
> >
> > #
> > #$ -cwd
> > #$ -j y
> > #$ -q E5m
> > #$ -S /bin/bash
> > #$ -l excl=true
> > #
> > date
> > sleep 20
> > date
> >
> > $
> > Everything works as expected except qstat.
> > A generic 'qstat' reports:
> > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> > 876735 0.50601 sleeper.sh s.bridi      qw    05/04/2015 12:20:45                                    1
> >
> > and 'qstat -j 876735' reports:
> > ==============================================================
> > job_number:                 876735
> > exec_file:                  job_scripts/876735
> > submission_time:            Mon May  4 12:20:45 2015
> > owner:                      s.bridi
> > uid:                        65535
> > group:                      domusers
> > gid:                        15000
> > sge_o_home:                 /home/s.bridi
> > sge_o_log_name:             s.bridi
> > sge_o_path:                 /sw/openmpi/142/bin:.:/ge/bin/linux-x64:/usr/lib64/qt-3.3/bin:/ge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/s.bridi/bin
> > sge_o_shell:                /bin/bash
> > sge_o_workdir:              /home/s.bridi/testexcl
> > sge_o_host:                 login0
> > account:                    sge
> > cwd:                        /home/s.bridi/testexcl
> > merge:                      y
> > hard resource_list:         exclusive=true
> > mail_list:                  s.bridi at login0
> > notify:                     FALSE
> > job_name:                   sleeper.sh
> > jobshare:                   0
> > hard_queue_list:            E5m
> > shell_list:                 NONE:/bin/bash
> > env_list:
> > script_file:                sleeper.sh
> > scheduling info:            [snip]
> >
> > while 'qstat -q E5m' doesn't list the job!
>
> Usually this means that the job is not allowed to run in this queue.
>
> What does:
>
> $ qalter -w v 876735
>
> output?
>
> -- Reuti
>
>
> > Thanks
> > Stefano
>
>