[gridengine users] Request: Value of -p in qsub doesn't go below -26

Reuti reuti at staff.uni-marburg.de
Mon Aug 6 20:50:18 UTC 2018


Hi,

You can try to have a look at the extended output of `qstat`:

$ qstat -ext

$ qstat -pri

In addition, how the -p priority is taken into account and how the final priority is computed is outlined here:

$ man sge_priority

Maybe this will shed some light on the behavior and point to its cause.
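
As a rough illustration of what sge_priority describes: the final priority is a weighted sum of three normalized contributions (POSIX priority from -p, urgency, and tickets), which `qstat -pri` and `qstat -ext` list as npprior, nurg and ntckts. The sketch below only does that arithmetic. The function name `prio` and the weights used in the example call are assumptions made for illustration; the weights actually in effect are the weight_priority, weight_urgency and weight_ticket entries shown by `qconf -ssconf`.

```shell
# Sketch: combine the three normalized contributions into one priority value.
# Usage: prio W_PRIORITY NPPRIOR W_URGENCY NURG W_TICKET NTCKTS
# (prio is a hypothetical helper, not a Grid Engine command.)
prio() {
  awk -v wp="$1" -v np="$2" -v wu="$3" -v nu="$4" -v wt="$5" -v nt="$6" \
    'BEGIN { printf "%.5f\n", wp * np + wu * nu + wt * nt }'
}

# Example weights (assumed; read yours from `qconf -ssconf`),
# with every normalized value at 0.5:
prio 1.0 0.5 0.1 0.5 0.01 0.5   # prints 0.55500
```

Comparing the result for two pending jobs, using the values `qstat -pri` reports for each, may show whether the -p value or one of the other two contributions dominates the ordering.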

-- Reuti

PS: You may also want to switch on the output of the computed tickets:

$ qconf -ssconf
…
report_pjob_tickets               TRUE
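
A small sketch for checking that setting without reading through the whole scheduler configuration. The function name `pjob_tickets_setting` is made up for this example; it only filters text on stdin and assumes the "name value" layout shown above.

```shell
# Reads scheduler configuration text on stdin and prints the value of
# report_pjob_tickets (prints nothing if the parameter is absent).
# pjob_tickets_setting is a hypothetical helper name.
pjob_tickets_setting() {
  awk '$1 == "report_pjob_tickets" { print $2 }'
}

# Usage on a live cluster (not run here):
#   qconf -ssconf | pjob_tickets_setting
# To change the value, `qconf -msconf` opens the scheduler
# configuration in an editor.
```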


On 06.08.2018 at 19:18, Gowtham wrote:

> Greetings.
> 
> I am using Rocks Cluster Distribution 6.1 and Grid Engine 2011.11p1. All our simulations are submitted to the queue using the following command format:
> 
> qsub -p N SUBMISSION_SCRIPT.sh
> 
> N is a negative integer ranging from -1 through -60 (we consider this the "priority" of a research group). 
> 
> Until about a week or so ago, everything worked fine. Upon noticing some simulations waiting in the queue for longer than normal (e.g., my own group's priority is -41), I submitted 60 simulations with priority values -1, -2, -3, ..., -60.
> 
> I noticed that simulations with -p values from -1 through -26 ran just fine. Those with -p values of -27 and below just stay in the 'qw' state. The usual 'qstat -j SIM_ID' command gives no indication of why they are not running (please see below the output for a simulation with priority -27). Processors/slots are free and available in long.q.
> 
> As far as I understand the Grid Engine documentation, -p values range from -1023 through 1024, and users who are not operators or managers are restricted to 0 through -1023.
> 
> Any help in debugging/identifying the cause of this problem will be greatly appreciated.
> 
> ****************************************************************************************
> job_number:                 481703
> exec_file:                  job_scripts/481703
> submission_time:            Mon Aug  6 12:48:07 2018
> owner:                      john
> uid:                        38025
> group:                      jane-users
> gid:                        506
> sge_o_home:                 /home/john
> sge_o_log_name:             john
> sge_o_path:                 :/bin:/usr/bin:/usr/kerberos/bin:/share/apps/bin:/share/apps/sbin:/usr/X11R6/bin:/usr/java/latest/bin:/sbin:/usr/sbin:/usr/kerberos/sbin:/opt/gridengine/bin/lx26-amd64:/opt/gridengine/bin/linux-x64:/home/john/bin:/opt/ganglia/bin:/opt/rocks/bin:/opt/rocks/sbin
> sge_o_shell:                /bin/bash
> sge_o_tz:                   America/Detroit
> sge_o_workdir:              /misc/research/john/test_runs
> sge_o_host:                 login-0-2
> account:                    sge
> cwd:                        /misc/research/john/test_runs
> merge:                      y
> hard resource_list:         mem_free=2G
> mail_list:                  john at login-0-1.local
> notify:                     TRUE
> job_name:                   test_p27.sh
> priority:                   -27
> jobshare:                   0
> hard_queue_list:            long.q
> shell_list:                 NONE:/bin/bash
> env_list:                   
> script_file:                test_p27.sh
> scheduling info:            queue instance "long.q at compute-0-48.local" dropped because it is disabled
>                             queue instance "long.q at compute-0-66.local" dropped because it is disabled
>                             queue instance "long.q at compute-0-65.local" dropped because it is disabled
>                             queue instance "long.q at compute-0-20.local" dropped because it is disabled
>                             queue instance "long.q at compute-0-64.local" dropped because it is disabled
>                             queue instance "repair.q at compute-0-36.local" dropped because it is disabled
>                             queue instance "long.q at compute-0-63.local" dropped because it is full
>                             queue instance "long.q at compute-0-50.local" dropped because it is full
>                             ...
>                             queue instance "long.q at compute-0-33.local" dropped because it is full
>                             queue instance "long.q at compute-0-31.local" dropped because it is full
>                             queue instance "long.q at compute-0-35.local" dropped because it is full
>                             queue instance "long.q at compute-0-10.local" dropped because it is full
>                             queue instance "long.q at compute-0-43.local" dropped because it is full
>                             queue instance "short.q at compute-0-1.local" dropped because it is full
>                             queue instance "short.q at compute-0-2.local" dropped because it is full
>                             queue instance "short.q at compute-0-3.local" dropped because it is full
>                             queue instance "short.q at compute-0-0.local" dropped because it is full
>                             queue instance "medium.q at compute-0-6.local" dropped because it is full
>                             queue instance "medium.q at compute-0-7.local" dropped because it is full
>                             queue instance "medium.q at compute-0-5.local" dropped because it is full
>                             queue instance "medium.q at compute-0-4.local" dropped because it is full
> ****************************************************************************************
> 
> 
> Best regards,
> Gowtham
> 
> --
> Gowtham, PhD
> Director of Research Computing, IT
> Research Associate Professor, ECE
> Michigan Technological University
> 
> P: (906) 487-4096
> F: (906) 487-2787
> https://it.mtu.edu
> https://hpc.mtu.edu
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users
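
The "scheduling info" section of the quoted `qstat -j` output is long, so a tally per reason can make it easier to read. This is only a sketch: the function name `tally_drop_reasons` is hypothetical, and it assumes every relevant line ends with the reason word, as in the output quoted above.

```shell
# Reads `qstat -j JOB_ID` output on stdin and counts how many queue
# instances were dropped for each reason (last word of each line).
# tally_drop_reasons is a hypothetical helper name.
tally_drop_reasons() {
  awk '/dropped because it is/ { count[$NF]++ }
       END { for (r in count) printf "%s %d\n", r, count[r] }' | sort
}

# Usage on a live cluster (not run here):
#   qstat -j 481703 | tally_drop_reasons
```

The same tally restricted to one queue, e.g. by piping through `grep 'long.q'` first, shows how many instances of that queue the scheduler considered disabled or full for this job.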




