[gridengine users] can't delete an exec host

Derrick Lin klin938 at gmail.com
Fri Sep 15 02:56:39 UTC 2017


Try

qconf -de host_list
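
To make that concrete: host_list stands for the host name(s) to delete, e.g. compute-2-4.local from the thread below. Here is a minimal sketch, assuming a POSIX shell on a host with admin rights, that first checks whether the queue definition still names the host. The two function names are made up for illustration and are not part of Grid Engine; only `qconf -sq` and `qconf -de` are real commands, both already used in this thread.

```shell
#!/bin/sh
# Sketch only: both helper names below are hypothetical.

# Read a queue (or host group) definition on stdin and print every line
# that contains the given host name as a fixed string.
lines_mentioning_host() {
    grep -F -- "$1"
}

# Delete an exec host only if the named cluster queue no longer mentions it.
delete_exec_host_if_unreferenced() {
    host=$1
    queue=$2
    # Capture the queue definition first so a failing qconf is not
    # mistaken for "no references found".
    cfg=$(qconf -sq "$queue") || return 2
    if printf '%s\n' "$cfg" | lines_mentioning_host "$host" >&2; then
        echo "$host is still mentioned in $queue; not deleting" >&2
        return 1
    fi
    qconf -de "$host"
}

# Example call, commented out so that sourcing this file changes nothing:
# delete_exec_host_if_unreferenced compute-2-4.local basic.q
```

This only catches literal mentions, such as a per-host slots override. A host that is reachable through a host group like @basichosts will not show up, and in the output quoted below basic.q does not name compute-2-4.local at all, so the check would pass there and any remaining error would come from qmaster itself.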

Cheers,

On Thu, Sep 7, 2017 at 3:22 AM, Michael Stauffer <mgstauff at gmail.com> wrote:

> On Wed, Sep 6, 2017 at 12:42 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>
>>
>> > On 06.09.2017 at 17:33, Michael Stauffer <mgstauff at gmail.com> wrote:
>> >
>> > On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <prod.feng at gmail.com> wrote:
>> > It seems the SGE master did not get refreshed with the new host group.
>> > Maybe you can try:
>> >
>> > 1. restart SGE master
>> >
>> > Is it safe to do this with jobs queued and running? I think it's not
>> > reliable, i.e. jobs can get killed and de-queued?
>>
>> Just to mention that it's safe to restart the qmaster, or even to reboot the
>> machine the qmaster is running on. Nothing will happen to the running jobs
>> on the exechosts.
>>
>
> OK good to know. I've done that before and seen them finish, although some
> googling suggested people have seen jobs get killed. Does a qmaster
> restart, however, empty the queue? I imagine a reboot would too, unless the
> queue is stored in a file?
>
> -M
>
>
>>
>> -- Reuti
>>
>>
>> > or
>> >
>> > 2. change the "hostlist" of basic.q to any single node, like
>> > "compute-1-0.local", wait till it gets refreshed; then change it back to
>> > "@basichosts".
>> >
>> > I've done this, but it's not refreshing (been about 10 minutes now).
>> > I'm still getting the error when I try to delete exec host compute-2-4, and
>> > qhost is still showing basic.q on the nodes in @basichosts.
>> >
>> > Interestingly, host compute-2-4 was removed from another queue
>> > (qlogin.basic.q) that also uses @basichosts, so it's something about
>> > basic.q that's stuck.
>> >
>> > Is there some way to refresh things other than restarting qmaster?
>> >
>> > -M
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <mgstauff at gmail.com> wrote:
>> > > SoGE 8.1.8
>> > >
>> > > Hi,
>> > >
>> > > I'm having trouble deleting an execution host. I've removed it from the
>> > > host group, but when I try to delete it with qconf, it says it's still part
>> > > of 'basic.q'. Here's the relevant output. Does anyone have any suggestions?
>> > >
>> > > [root@chead ~]# qconf -de compute-2-4.local
>> > > Host object "compute-2-4.local" is still referenced in cluster queue "basic.q".
>> > >
>> > > [root@chead ~]# qconf -sq basic.q
>> > > qname                 basic.q
>> > > hostlist              @basichosts
>> > > seq_no                0
>> > > load_thresholds       np_load_avg=1.74
>> > > suspend_thresholds    NONE
>> > > nsuspend              1
>> > > suspend_interval      00:05:00
>> > > priority              0
>> > > min_cpu_interval      00:05:00
>> > > processors            UNDEFINED
>> > > qtype                 BATCH
>> > > ckpt_list             NONE
>> > > pe_list               make mpich mpi orte unihost serial
>> > > rerun                 FALSE
>> > > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
>> > >                       [compute-1-1.local=7],[compute-1-3.local=7], \
>> > >                       [compute-1-5.local=8],[compute-1-6.local=8], \
>> > >                       [compute-1-7.local=8],[compute-1-8.local=8], \
>> > >                       [compute-1-9.local=8],[compute-1-10.local=8], \
>> > >                       [compute-1-11.local=8],[compute-1-12.local=8], \
>> > >                       [compute-1-13.local=8],[compute-1-14.local=8], \
>> > >                       [compute-1-15.local=8]
>> > > tmpdir                /tmp
>> > > shell                 /bin/bash
>> > > prolog                NONE
>> > > epilog                NONE
>> > > shell_start_mode      posix_compliant
>> > > starter_method        NONE
>> > > suspend_method        NONE
>> > > resume_method         NONE
>> > > terminate_method      NONE
>> > > notify                00:00:60
>> > > owner_list            NONE
>> > > user_lists            NONE
>> > > xuser_lists           NONE
>> > > subordinate_list      NONE
>> > > complex_values        NONE
>> > > projects              NONE
>> > > xprojects             NONE
>> > > calendar              NONE
>> > > initial_state         default
>> > > s_rt                  INFINITY
>> > > h_rt                  INFINITY
>> > > s_cpu                 INFINITY
>> > > h_cpu                 INFINITY
>> > > s_fsize               INFINITY
>> > > h_fsize               INFINITY
>> > > s_data                INFINITY
>> > > h_data                INFINITY
>> > > s_stack               INFINITY
>> > > h_stack               INFINITY
>> > > s_core                INFINITY
>> > > h_core                INFINITY
>> > > s_rss                 INFINITY
>> > > h_rss                 INFINITY
>> > > s_vmem                19G
>> > > h_vmem                19G
>> > >
>> > > [root@chead ~]# qconf -shgrp @basichosts
>> > > group_name @basichosts
>> > > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
>> > >          compute-1-5.local compute-1-6.local compute-1-7.local \
>> > >          compute-1-8.local compute-1-9.local compute-1-10.local \
>> > >          compute-1-11.local compute-1-12.local compute-1-13.local \
>> > >          compute-1-14.local compute-1-15.local compute-2-0.local \
>> > >          compute-2-2.local compute-2-5.local compute-2-7.local \
>> > >          compute-2-8.local compute-2-9.local compute-2-11.local \
>> > >          compute-2-12.local compute-2-13.local compute-2-15.local \
>> > >          compute-2-6.local
>> > >
>> > > Thanks
>> > >
>> > > -M
>> > >
>> > > _______________________________________________
>> > > users mailing list
>> > > users at gridengine.org
>> > > https://gridengine.org/mailman/listinfo/users
>> > >
>> >
>> >
>> >
>> > --
>> > Best,
>> >
>> > Feng
>> >
>>
>>
>