[gridengine users] Request: Value of -p in qsub doesn't go below -26

Gowtham g at mtu.edu
Mon Aug 6 23:37:13 UTC 2018


Thank you, Reuti.

report_pjob_tickets was already set to TRUE in 'qconf -msconf'.
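
A read-only way to check this is 'qconf -ssconf', which prints the scheduler configuration without opening an editor. The sketch below filters that output for the ticket-reporting flag and the priority-related weights described in sched_conf(5). The function name priority_settings is made up for illustration, and the choice of parameters is an assumption about which settings are worth inspecting here.

```shell
#!/bin/sh
# Hypothetical helper: keep only the scheduler settings that feed into
# job priority (parameter names as listed in sched_conf(5)).
priority_settings() {
  grep -E '^(report_pjob_tickets|weight_priority|weight_urgency|weight_ticket|weight_waiting_time)[[:space:]]'
}

# Query the live scheduler configuration only where qconf is installed.
if command -v qconf >/dev/null 2>&1; then
  qconf -ssconf | priority_settings
fi
```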

Output of qstat -ext and qstat -pri is attached as a screenshot.

I have verified the following a few times:

   1. There are sufficient free slots/processors available in the queue
   2. There is sufficient mem_free resource available to start these
   waiting simulations
   3. When I submit a simulation with p = -26 (or any value closer to 0)
   while the -27 and -28 simulations are waiting, it still starts and runs
   to completion successfully
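
The test submissions referred to in this thread (one job per -p value from -1 through -60) can be generated with a short loop. This is only a sketch: priority_sweep is a hypothetical helper, it prints the qsub commands instead of running them, and SUBMISSION_SCRIPT.sh stands in for the real job script.

```shell
#!/bin/sh
# Hypothetical helper: print one qsub command per priority from -1 to -N.
# Nothing is submitted here; the commands are only written to stdout.
priority_sweep() {
  script=$1
  lowest=${2:-60}   # magnitude of the lowest priority to generate
  p=1
  while [ "$p" -le "$lowest" ]; do
    printf 'qsub -p -%d %s\n' "$p" "$script"
    p=$((p + 1))
  done
}

priority_sweep SUBMISSION_SCRIPT.sh
```

On a submit host with Grid Engine in the PATH, piping the output to sh (priority_sweep SUBMISSION_SCRIPT.sh | sh) would perform the actual submissions.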

Any more thoughts/tips would be greatly appreciated.

Best regards,
Gowtham

--
Gowtham, PhD
Director of Research Computing, IT
Research Associate Professor, ECE
Michigan Technological University

P: (906) 487-4096
F: (906) 487-2787
https://it.mtu.edu
https://hpc.mtu.edu


On Mon, Aug 6, 2018 at 4:50 PM Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> You can try to have a look at the extended output of `qstat`:
>
> $ qstat -ext
>
> $ qstat -pri
>
> In addition, the way the priority is honored and essentially computed is
> outlined here:
>
> $ man sge_priority
>
> Maybe this will shed some light on it and point to the cause of it.
>
> -- Reuti
>
> PS: You may also want to switch on the output of the computed tickets:
>
> $ qconf -ssconf
>> report_pjob_tickets               TRUE
>
>
> On 06.08.2018 at 19:18, Gowtham wrote:
>
> > Greetings.
> >
> > I am using Rocks Cluster Distribution 6.1 and Grid Engine 2011.11p1. All
> > our simulations are submitted to the queue using the following command
> > format:
> >
> > qsub -p N SUBMISSION_SCRIPT.sh
> >
> > N is a negative integer ranging from -1 through -60 (we consider this
> > the "priority" of a research group).
> >
> > Until about a week or so ago, everything worked fine. Upon noticing some
> > simulations waiting in the queue for longer than normal (e.g., my own
> > group's priority is -41), I submitted 60 simulations with priority
> > values -1, -2, -3, ..., -60.
> >
> > I noticed that simulations with priority down to -26 ran just fine. Those
> > with a -p value of -27 and below just stay in the 'qw' state. The usual
> > 'qstat -j SIM_ID' command gives no information as to why they are not
> > running (please see below the output for a simulation with priority -27).
> > Processors/slots are free and available in long.q.
> >
> > As far as I know and understand the Grid Engine documentation, -p values
> > range from -1023 through 1024, and users who are not operators/admins are
> > restricted to 0 through -1023.
> >
> > Any help in debugging/identifying the cause of this problem will be
> > greatly appreciated.
> >
> > ****************************************************************************************
> > job_number:                 481703
> > exec_file:                  job_scripts/481703
> > submission_time:            Mon Aug  6 12:48:07 2018
> > owner:                      john
> > uid:                        38025
> > group:                      jane-users
> > gid:                        506
> > sge_o_home:                 /home/john
> > sge_o_log_name:             john
> > sge_o_path:                 :/bin:/usr/bin:/usr/kerberos/bin:/share/apps/bin:/share/apps/sbin:/usr/X11R6/bin:/usr/java/latest/bin:/sbin:/usr/sbin:/usr/kerberos/sbin:/opt/gridengine/bin/lx26-amd64:/opt/gridengine/bin/linux-x64:/home/john/bin:/opt/ganglia/bin:/opt/rocks/bin:/opt/rocks/sbin
> > sge_o_shell:                /bin/bash
> > sge_o_tz:                   America/Detroit
> > sge_o_workdir:              /misc/research/john/test_runs
> > sge_o_host:                 login-0-2
> > account:                    sge
> > cwd:                        /misc/research/john/test_runs
> > merge:                      y
> > hard resource_list:         mem_free=2G
> > mail_list:                  john at login-0-1.local
> > notify:                     TRUE
> > job_name:                   test_p27.sh
> > priority:                   -27
> > jobshare:                   0
> > hard_queue_list:            long.q
> > shell_list:                 NONE:/bin/bash
> > env_list:
> > script_file:                test_p27.sh
> > scheduling info:            queue instance "long.q@compute-0-48.local" dropped because it is disabled
> >                             queue instance "long.q@compute-0-66.local" dropped because it is disabled
> >                             queue instance "long.q@compute-0-65.local" dropped because it is disabled
> >                             queue instance "long.q@compute-0-20.local" dropped because it is disabled
> >                             queue instance "long.q@compute-0-64.local" dropped because it is disabled
> >                             queue instance "repair.q@compute-0-36.local" dropped because it is disabled
> >                             queue instance "long.q@compute-0-63.local" dropped because it is full
> >                             queue instance "long.q@compute-0-50.local" dropped because it is full
> >                             ...
> >                             queue instance "long.q@compute-0-33.local" dropped because it is full
> >                             queue instance "long.q@compute-0-31.local" dropped because it is full
> >                             queue instance "long.q@compute-0-35.local" dropped because it is full
> >                             queue instance "long.q@compute-0-10.local" dropped because it is full
> >                             queue instance "long.q@compute-0-43.local" dropped because it is full
> >                             queue instance "short.q@compute-0-1.local" dropped because it is full
> >                             queue instance "short.q@compute-0-2.local" dropped because it is full
> >                             queue instance "short.q@compute-0-3.local" dropped because it is full
> >                             queue instance "short.q@compute-0-0.local" dropped because it is full
> >                             queue instance "medium.q@compute-0-6.local" dropped because it is full
> >                             queue instance "medium.q@compute-0-7.local" dropped because it is full
> >                             queue instance "medium.q@compute-0-5.local" dropped because it is full
> >                             queue instance "medium.q@compute-0-4.local" dropped because it is full
> > ****************************************************************************************
> >
> >
> > Best regards,
> > Gowtham
> >
> > --
> > Gowtham, PhD
> > Director of Research Computing, IT
> > Research Associate Professor, ECE
> > Michigan Technological University
> >
> > P: (906) 487-4096
> > F: (906) 487-2787
> > https://it.mtu.edu
> > https://hpc.mtu.edu
> > _______________________________________________
> > users mailing list
> > users at gridengine.org
> > https://gridengine.org/mailman/listinfo/users
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: qstat_ext_pri.png
Type: image/png
Size: 234288 bytes
Desc: not available
URL: <http://gridengine.org/pipermail/users/attachments/20180806/319da96e/attachment.png>