[gridengine users] Request: Value of -p in qsub doesn't go below -26

Gowtham g at mtu.edu
Mon Aug 6 23:44:03 UTC 2018


Additionally, when I ran 'qstat -j', the list of pending jobs does not seem
to include these priority -27 or -28 jobs.

*****************
Jobs can not run because queue instance is not contained in its hard queue
list
481017, 481018, 481020, 481021, 481022, 481024, 481214, 481251,
481297, 481298, 481299, 481329, 481366, 481370, 481405, 482154,
481406, 481407, 481408, 481409, 481410, 481411, 481412, 481413,
481860, 482162

Jobs can not run because available slots combined under PE are not in range
of job
481017, 481018, 481020, 481021, 481022, 481024, 481214, 481251,
481297, 481298, 481299, 481329, 481366, 481370, 481405, 482154,
481406, 481407, 481408, 481409, 481410, 481411, 481412, 481413,
481860, 482162
*****************
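In case it helps anyone following along, these are the checks that can be run
against one of the stuck jobs (481017, taken from the list above); the exact
output will depend on the local configuration, and per-job scheduling info is
only reported when 'schedd_job_info' is enabled in 'qconf -msconf':

```shell
# Dry-run verification: ask the scheduler whether/where the job could run now.
qalter -w v 481017

# Full per-job details, including the hard_queue_list that the first
# scheduler message above refers to.
qstat -j 481017

# Compare against the queue's actual host list, slot counts, and PE list.
qconf -sq long.q | egrep 'hostlist|slots|pe_list'
```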

Best regards,
Gowtham

--
Gowtham, PhD
Director of Research Computing, IT
Research Associate Professor, ECE
Michigan Technological University

P: (906) 487-4096
F: (906) 487-2787
https://it.mtu.edu
https://hpc.mtu.edu


On Mon, Aug 6, 2018 at 7:37 PM Gowtham <g at mtu.edu> wrote:

> Thank you, Reuti.
>
> report_pjob_tickets was already set to TRUE in 'qconf -msconf'.
>
> Output of qstat -ext and qstat -pri is attached as a screenshot.
>
> I have verified the following a few times:
>
>    1. There are sufficient free slots/processors available in the queue
>    2. There is sufficient mem_free resource available to start these
>    waiting simulations
>    3. When I submit a simulation with -p -26 (or any higher priority)
>    while the -27 and -28 jobs are waiting, it still runs to completion
>    successfully
>
> Any more thoughts/tips would be greatly appreciated.
>
> Best regards,
> Gowtham
>
> --
> Gowtham, PhD
> Director of Research Computing, IT
> Research Associate Professor, ECE
> Michigan Technological University
>
> P: (906) 487-4096
> F: (906) 487-2787
> https://it.mtu.edu
> https://hpc.mtu.edu
>
>
> On Mon, Aug 6, 2018 at 4:50 PM Reuti <reuti at staff.uni-marburg.de> wrote:
>
>> Hi,
>>
>> You can try to have a look at the extended output of `qstat`:
>>
>> $ qstat -ext
>>
>> $ qstat -pri
>>
>> In addition, the way the priority is honored and essentially computed is
>> outlined here:
>>
>> $ man sge_priority
>>
>> Maybe this will shed some light and point to the cause.
>>
>> -- Reuti
>>
>> PS: You may also want to switch on the output of the computed tickets:
>>
>> $ qconf -ssconf
>> report_pjob_tickets               TRUE
>>
>>
>> Am 06.08.2018 um 19:18 schrieb Gowtham:
>>
>> > Greetings.
>> >
>> > I am using Rocks Cluster Distribution 6.1 and Grid Engine 2011.11p1.
>> All our simulations are submitted to the queue using the following command
>> format:
>> >
>> > qsub -p N SUBMISSION_SCRIPT.sh
>> >
>> > N is a negative integer ranging from -1 through -60 (we consider this
>> the "priority" of a research group).
>> >
>> > Until about a week or so ago, everything worked fine. Upon noticing
>> some simulations waiting in queue for longer than normal periods of time
>> (e.g., my own group's priority is -41), I submitted 60 simulations with
>> priority values -1, -2, -3, ..., -60.
>> >
>> > I noticed that simulations with priority up to -26 ran just fine. Those
>> with -p value -27 and below just stay in 'qw' mode. The usual 'qstat -j
>> SIM_ID' command does not have information as to why it's not running
>> (please see below the output for a simulation with priority -27).
>> Processors/slots are free and available in long.q.
>> >
>> > As far as I know and understand the Grid Engine documentation, -p values
>> range from -1023 through 1024, and non-operator/admin users are restricted
>> to -1023 through 0.
>> >
>> > Any help in debugging/identifying the cause of this problem will be
>> greatly appreciated.
>> >
>> >
>> ****************************************************************************************
>> > job_number:                 481703
>> > exec_file:                  job_scripts/481703
>> > submission_time:            Mon Aug  6 12:48:07 2018
>> > owner:                      john
>> > uid:                        38025
>> > group:                      jane-users
>> > gid:                        506
>> > sge_o_home:                 /home/john
>> > sge_o_log_name:             john
>> > sge_o_path:
>>  :/bin:/usr/bin:/usr/kerberos/bin:/share/apps/bin:/share/apps/sbin:/usr/X11R6/bin:/usr/java/latest/bin:/sbin:/usr/sbin:/usr/kerberos/sbin:/opt/gridengine/bin/lx26-amd64:/opt/gridengine/bin/linux-x64:/home/john/bin:/opt/ganglia/bin:/opt/rocks/bin:/opt/rocks/sbin
>> > sge_o_shell:                /bin/bash
>> > sge_o_tz:                   America/Detroit
>> > sge_o_workdir:              /misc/research/john/test_runs
>> > sge_o_host:                 login-0-2
>> > account:                    sge
>> > cwd:                        /misc/research/john/test_runs
>> > merge:                      y
>> > hard resource_list:         mem_free=2G
>> > mail_list:                  john at login-0-1.local
>> > notify:                     TRUE
>> > job_name:                   test_p27.sh
>> > priority:                   -27
>> > jobshare:                   0
>> > hard_queue_list:            long.q
>> > shell_list:                 NONE:/bin/bash
>> > env_list:
>> > script_file:                test_p27.sh
>> > scheduling info:            queue instance "long.q at compute-0-48.local"
>> dropped because it is disabled
>> >                             queue instance "long.q at compute-0-66.local"
>> dropped because it is disabled
>> >                             queue instance "long.q at compute-0-65.local"
>> dropped because it is disabled
>> >                             queue instance "long.q at compute-0-20.local"
>> dropped because it is disabled
>> >                             queue instance "long.q at compute-0-64.local"
>> dropped because it is disabled
>> >                             queue instance "repair.q at compute-0-36.local"
>> dropped because it is disabled
>> >                             queue instance "long.q at compute-0-63.local"
>> dropped because it is full
>> >                             queue instance "long.q at compute-0-50.local"
>> dropped because it is full
>> >                             ...
>> >                             queue instance "long.q at compute-0-33.local"
>> dropped because it is full
>> >                             queue instance "long.q at compute-0-31.local"
>> dropped because it is full
>> >                             queue instance "long.q at compute-0-35.local"
>> dropped because it is full
>> >                             queue instance "long.q at compute-0-10.local"
>> dropped because it is full
>> >                             queue instance "long.q at compute-0-43.local"
>> dropped because it is full
>> >                             queue instance "short.q at compute-0-1.local"
>> dropped because it is full
>> >                             queue instance "short.q at compute-0-2.local"
>> dropped because it is full
>> >                             queue instance "short.q at compute-0-3.local"
>> dropped because it is full
>> >                             queue instance "short.q at compute-0-0.local"
>> dropped because it is full
>> >                             queue instance "medium.q at compute-0-6.local"
>> dropped because it is full
>> >                             queue instance "medium.q at compute-0-7.local"
>> dropped because it is full
>> >                             queue instance "medium.q at compute-0-5.local"
>> dropped because it is full
>> >                             queue instance "medium.q at compute-0-4.local"
>> dropped because it is full
>> >
>> ****************************************************************************************
>> >
>> >
>> > Best regards,
>> > Gowtham
>> >
>> > --
>> > Gowtham, PhD
>> > Director of Research Computing, IT
>> > Research Associate Professor, ECE
>> > Michigan Technological University
>> >
>> > P: (906) 487-4096
>> > F: (906) 487-2787
>> > https://it.mtu.edu
>> > https://hpc.mtu.edu
>> > _______________________________________________
>> > users mailing list
>> > users at gridengine.org
>> > https://gridengine.org/mailman/listinfo/users
>>
>>