[gridengine users] running job holds and restart

Sangmin Park dorimosiada at gmail.com
Mon Oct 28 12:59:39 UTC 2013


Yes, suspending the job when all 12 slots are used on a particular host
is what I want.
So I tried submitting a job that uses 12 slots, but it still did not work.

--Sangmin
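
A minimal sketch of the threshold rule Reuti describes in the quoted replies below, written as a toy Python model rather than anything SGE itself runs. It assumes that a `subordinate_list` entry `queue=N` suspends that queue on a host once N slots of matlab.q are in use on the same host. The function names are hypothetical and only illustrate the arithmetic.

```python
# Toy model of queuewise subordination; this is not SGE code.
SLOTS_PER_INSTANCE = 12  # "slots 12" from the quoted matlab.q configuration


def subordinates_suspended(used_slots: int, threshold: int) -> bool:
    """True once the superordinate queue instance (matlab.q on one host)
    has at least `threshold` of its slots in use."""
    return used_slots >= threshold


def first_trigger(threshold: int, slots: int = SLOTS_PER_INSTANCE):
    """Smallest number of busy matlab.q slots on a host that suspends the
    subordinate queues there, or None if the threshold is unreachable."""
    for used in range(1, slots + 1):
        if subordinates_suspended(used, threshold):
            return used
    return None


if __name__ == "__main__":
    for threshold in (72, 12, 1):
        print(f"threshold={threshold}: first trigger at "
              f"{first_trigger(threshold)} used slots")
```

Under these assumptions a threshold of 72 can never be reached with 12 slots per instance, a threshold of 12 needs matlab.q to fill the whole host, and a threshold of 1 reacts to the first matlab.q slot in use.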


On Mon, Oct 28, 2013 at 9:47 PM, Reuti <reuti at staff.uni-marburg.de> wrote:

> On 28.10.2013 at 13:45, Sangmin Park wrote:
>
> > This is the RQS
> >
> >    limit        hosts {@parallelhosts} to slots=$num_proc
> >    limit        queues !matlab.q hosts {@matlabhosts} to slots=$num_proc
> > parallelhosts include matlabhosts.
> >
> > slots value in the matlab.q means the number of cores per node.
> >
> > All hosts is included in parallelhosts, node1 ~ node30.
> > matlabhosts include node1 ~ node7.
> > short.q, normal.q and long.q could be used in node1 ~ node7.
> >
> > What I want to set up is this: when jobs from short.q, normal.q and
> > long.q are running on node1 ~ node7 and a matlab job is submitted, the
> > running jobs that do not use matlab.q are suspended and the matlab job
> > runs.
> >
> > I don't understand why this does not happen when I set the slots value
> > to 12.
>
> It will suspend the job when all 12 slots are used on a particular host.
> You may want to try with 1 instead. As a refinement, you could also look
> into slotwise subordination.
>
> -- Reuti
>
>
> > --Sangmin
> >
> >
> > On Mon, Oct 28, 2013 at 8:58 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > On 28.10.2013 at 12:30, Sangmin Park wrote:
> >
> > > I've edited the negative values in the priority section: short.q is 4,
> > > normal.q is 6 and long.q is 8, respectively.
> > > And I configured 72 cores for each queue.
> >
> > But you didn't answer the question: How do you limit the overall slot
> > count? RQS or a definition in the exechost?
> >
> > > Below are the matlab.q queue details.
> > > qname                 matlab.q
> > > hostlist              @matlabhosts
> > > seq_no                0
> > > load_thresholds       np_load_avg=1.75
> > > suspend_thresholds    NONE
> > > nsuspend              1
> > > suspend_interval      00:05:00
> > > priority              2
> > > min_cpu_interval      00:05:00
> > > processors            UNDEFINED
> > > qtype                 BATCH INTERACTIVE
> > > ckpt_list             NONE
> > > pe_list               fill_up make matlab
> > > rerun                 FALSE
> > > slots                 12
> > > tmpdir                /tmp
> > > shell                 /bin/bash
> > > prolog                NONE
> > > epilog                NONE
> > > shell_start_mode      posix_compliant
> > > starter_method        NONE
> > > suspend_method        NONE
> > > resume_method         NONE
> > > terminate_method      NONE
> > > notify                00:00:60
> > > owner_list            NONE
> > > user_lists            octausers onsiteusers
> > > xuser_lists           NONE
> > > subordinate_list      short.q=72, normal.q=72, long.q=72
> >
> > This will suspend these three queues when 72 slots per queue instance in
> > matlab.q are used. As you have only 12 defined above, this will never happen.
> >
> > What behavior would you like to set up?
> >
> > -- Reuti
> >
> >
> > > complex_values        NONE
> > > projects              NONE
> > > xprojects             NONE
> > > calendar              NONE
> > > initial_state         default
> > > s_rt                  INFINITY
> > > h_rt                  168:00:00
> > > s_cpu                 INFINITY
> > > h_cpu                 INFINITY
> > > s_fsize               INFINITY
> > > h_fsize               INFINITY
> > > s_data                INFINITY
> > > h_data                INFINITY
> > > s_stack               INFINITY
> > > h_stack               INFINITY
> > > s_core                INFINITY
> > > h_core                INFINITY
> > > s_rss                 INFINITY
> > > h_rss                 INFINITY
> > > s_vmem                INFINITY
> > > h_vmem                INFINITY
> > >
> > > thanks,
> > >
> > > --Sangmin
> > >
> > >
> > > On Mon, Oct 28, 2013 at 3:51 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > > Hi,
> > >
> > > On 28.10.2013 at 06:40, Sangmin Park wrote:
> > >
> > > > Thanks, Adam.
> > > >
> > > > I configured the SGE queues following the second link you gave,
> > > > but it does not work.
> > > >
> > > > I made 4 queues: short.q, normal.q, long.q and matlab.q.
> > > > The short.q, normal.q and long.q queue instances run on all
> > > > computing nodes, node1 ~ node30.
> > > > The matlab.q instance is configured only for a few nodes, node1 ~ node7,
> > > > called matlabhosts.
> > > >
> > > > The priorities of each queue are below.
> > > > [short.q]
> > > > priority              -5
> > >
> > > Don't use negative values here. This number is the "nice value" under
> > > which the Linux kernel will run the process (i.e. it concerns the
> > > scheduler in the kernel; it doesn't influence SGE's scheduling). User
> > > processes should be in the range 0..19 [20 on Solaris]. The negative
> > > ones are reserved for kernel processes.
> > >
> > >
> > > > subordinate_list      NONE
> > > > [normal.q]
> > > > priority              0
> > > > subordinate_list      NONE
> > > > [long.q]
> > > > priority              5
> > > > subordinate_list      NONE
> > > >
> > > > and matlab.q is
> > > > priority              -10
> > > > subordinate_list      short.q normal.q long.q
> > >
> > > Same here. It's also worth noting that these values are relative.
> > > I.e. with the same number of user processes and cores, it doesn't matter
> > > which values are used as nice values, as each process gets its own core
> > > anyway. Only when there are more processes than cores will it have an
> > > effect. But as these are relative values, it's the same whether (cores+1)
> > > processes all have 0 or 19 as nice value.
> > >
> > >
> > > > I submitted several jobs using normal.q to the matlabhosts,
> > > > and then a job using matlab.q, which has the subordinate_list.
> > > > I expected one of the normal.q jobs to be suspended and the matlab.q
> > > > job to run.
> > > > But the matlab.q job waits in the queue with status qw; it is not dispatched.
> > > >
> > > > What's the matter with this?
> > > > Please help!!
> > >
> > > http://gridengine.org/pipermail/users/2013-October/006820.html
> > >
> > > How do you limit the overall slot count?
> > >
> > > -- Reuti
> > >
> > >
> > > > Sangmin
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Oct 15, 2013 at 3:50 PM, Adam Brenner <aebrenne at uci.edu>
> wrote:
> > > > Sangmin,
> > > >
> > > > I believe the phrase / term you are looking for is Subordinate
> > > > Queues[1][2]. This should handle what you are looking for.
> > > >
> > > > If not ... I am sure Reuti (or someone else) will correct me on this.
> > > >
> > > > Enjoy,
> > > > -Adam
> > > >
> > > > [1]: http://docs.oracle.com/cd/E19957-01/820-0698/i998889/index.html
> > > > [2]: http://grid-gurus.blogspot.com/2011/03/using-grid-engine-subordinate-queues.html
> > > >
> > > > --
> > > > Adam Brenner
> > > > Computer Science, Undergraduate Student
> > > > Donald Bren School of Information and Computer Sciences
> > > >
> > > > Research Computing Support
> > > > Office of Information Technology
> > > > http://www.oit.uci.edu/rcs/
> > > >
> > > > University of California, Irvine
> > > > www.ics.uci.edu/~aebrenne/
> > > > aebrenne at uci.edu
> > > >
> > > >
> > > > On Mon, Oct 14, 2013 at 11:18 PM, Sangmin Park <
> dorimosiada at gmail.com> wrote:
> > > > > Howdy,
> > > > >
> > > > > For a specific purpose in my organization,
> > > > > I want to configure something in the SGE scheduler.
> > > > >
> > > > > Imagine
> > > > > a job is running, called A-job.
> > > > > If B-job is submitted while A-job is running,
> > > > > I want to hold A-job and run B-job first.
> > > > > And after B-job is finished, restart A-job.
> > > > >
> > > > > What should I do for this?
> > > > >
> > > > > Sangmin
> > > > >
> > > > > --
> > > > > ===========================
> > > > > Sangmin Park
> > > > > Supercomputing Center
> > > > > Ulsan National Institute of Science and Technology(UNIST)
> > > > > Ulsan, 689-798, Korea
> > > > >
> > > > > phone : +82-52-217-4201
> > > > > mobile : +82-10-5094-0405
> > > > > fax : +82-52-217-4209
> > > > > ===========================
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users at gridengine.org
> > > > > https://gridengine.org/mailman/listinfo/users
> > > > >
> > > >
> > >
> >
>
>
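
A rough sketch of how the two quoted RQS rules might interact, which may be worth checking alongside the subordination threshold. It assumes both `limit` lines belong to the same rule set and that rules in a set are evaluated top-down with the first match winning, as described in sge_resource_quota(5). The host sets mirror node1 ~ node30 and node1 ~ node7 from the thread; `Rule` and `effective_rule` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Host groups as described in the thread (assumed naming node1..node30).
PARALLEL_HOSTS = {f"node{i}" for i in range(1, 31)}
MATLAB_HOSTS = {f"node{i}" for i in range(1, 8)}


@dataclass
class Rule:
    name: str
    matches: Callable[[str, str], bool]  # (queue, host) -> does the rule apply?


RULE_1 = Rule("hosts {@parallelhosts}",
              lambda queue, host: host in PARALLEL_HOSTS)
RULE_2 = Rule("queues !matlab.q hosts {@matlabhosts}",
              lambda queue, host: queue != "matlab.q" and host in MATLAB_HOSTS)


def effective_rule(queue: str, host: str, rules: List[Rule]) -> Optional[str]:
    """Return the name of the first rule matching the request (first match wins)."""
    for rule in rules:
        if rule.matches(queue, host):
            return rule.name
    return None


if __name__ == "__main__":
    as_posted = [RULE_1, RULE_2]
    swapped = [RULE_2, RULE_1]
    for label, rules in (("as posted", as_posted), ("swapped", swapped)):
        for queue in ("normal.q", "matlab.q"):
            print(label, queue, "node1 ->", effective_rule(queue, "node1", rules))
```

In this model the second rule is never reached in the posted order, because every host in @matlabhosts is also in @parallelhosts. With the order swapped, non-matlab.q jobs on node1 ~ node7 hit the narrower rule and matlab.q jobs fall through to the host-wide one.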


-- 
===========================
Sangmin Park
Supercomputing Center
Ulsan National Institute of Science and Technology(UNIST)
Ulsan, 689-798, Korea

phone : +82-52-217-4201
mobile : +82-10-5094-0405
fax : +82-52-217-4209
===========================