[gridengine users] can't delete an exec host

Michael Stauffer mgstauff at gmail.com
Wed Sep 6 15:33:37 UTC 2017


On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <prod.feng at gmail.com> wrote:

> It seems SGE master did not get refreshed with new hostgroup. Maybe you
> can try:
>
> 1. restart SGE master
>

Is it safe to do this with jobs queued and running? My understanding is
that it's not reliable, i.e. jobs can get killed or de-queued.


> or
>
> 2. change basic.q, "hostlist" to any node, like "compute-1-0.local",
> wait till it gets refreshed; then change it back to "@basichosts".
>

I've done this, but it's not refreshing (it's been about 10 minutes now).
I'm still getting the error when I try to delete exec host compute-2-4, and
qhost is still showing basic.q on the nodes in @basichosts.

Interestingly, host compute-2-4 was removed from another queue
(qlogin.basic.q) that also uses @basichosts, so it's something about
basic.q that's stuck.

Is there some way to refresh things other than restarting qmaster?
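
For anyone checking where a host is still named in the visible
configuration, here is a minimal sketch. It is not an SGE feature:
attrs_referencing_host is a hypothetical helper name, and it only scans the
text that "qconf -sq" or "qconf -shgrp" prints (both commands appear in the
output quoted below). It joins backslash-continued lines and reports each
attribute whose value contains the host as an exact token, so
compute-1-1.local does not match compute-1-10.local.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): read qconf-style output on stdin
# and print the name of every attribute whose value mentions the given host.
attrs_referencing_host() {
    host="$1"
    awk -v host="$host" '
        {
            # Join lines that end in a backslash so wrapped lists
            # (e.g. "slots" or "hostlist") are searched as one line.
            line = buf $0
            if (sub(/[ \t]*\\$/, "", line)) { buf = line " "; next }
            buf = ""
            # Split on whitespace, commas, brackets and "=" so that
            # "[compute-1-2.local=3]" yields the bare host name.
            n = split(line, f, /[][ \t,=]+/)
            for (i = 2; i <= n; i++)
                if (f[i] == host) { print f[1]; break }
        }
    '
}
```

Usage would look like "qconf -sq basic.q | attrs_referencing_host
compute-2-4.local". On the output quoted below it would print nothing, since
compute-2-4.local appears in neither the slots overrides nor @basichosts,
which fits Feng's guess that the stale reference is inside qmaster rather
than in the visible config.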

-M



>
>
>
> On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <mgstauff at gmail.com>
> wrote:
> > SoGE 8.1.8
> >
> > Hi,
> >
> > I'm having trouble deleting an execution host. I've removed it from the
> > host group, but when I try to delete it with qconf, it says it's still
> > part of 'basic.q'. Here's the relevant output. Does anyone have any
> > suggestions?
> >
> > [root at chead ~]# qconf -de compute-2-4.local
> > Host object "compute-2-4.local" is still referenced in cluster queue
> > "basic.q".
> >
> > [root at chead ~]# qconf -sq basic.q
> > qname                 basic.q
> > hostlist              @basichosts
> > seq_no                0
> > load_thresholds       np_load_avg=1.74
> > suspend_thresholds    NONE
> > nsuspend              1
> > suspend_interval      00:05:00
> > priority              0
> > min_cpu_interval      00:05:00
> > processors            UNDEFINED
> > qtype                 BATCH
> > ckpt_list             NONE
> > pe_list               make mpich mpi orte unihost serial
> > rerun                 FALSE
> > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
> >                       [compute-1-1.local=7],[compute-1-3.local=7], \
> >                       [compute-1-5.local=8],[compute-1-6.local=8], \
> >                       [compute-1-7.local=8],[compute-1-8.local=8], \
> >                       [compute-1-9.local=8],[compute-1-10.local=8], \
> >                       [compute-1-11.local=8],[compute-1-12.local=8], \
> >                       [compute-1-13.local=8],[compute-1-14.local=8], \
> >                       [compute-1-15.local=8]
> > tmpdir                /tmp
> > shell                 /bin/bash
> > prolog                NONE
> > epilog                NONE
> > shell_start_mode      posix_compliant
> > starter_method        NONE
> > suspend_method        NONE
> > resume_method         NONE
> > terminate_method      NONE
> > notify                00:00:60
> > owner_list            NONE
> > user_lists            NONE
> > xuser_lists           NONE
> > subordinate_list      NONE
> > complex_values        NONE
> > projects              NONE
> > xprojects             NONE
> > calendar              NONE
> > initial_state         default
> > s_rt                  INFINITY
> > h_rt                  INFINITY
> > s_cpu                 INFINITY
> > h_cpu                 INFINITY
> > s_fsize               INFINITY
> > h_fsize               INFINITY
> > s_data                INFINITY
> > h_data                INFINITY
> > s_stack               INFINITY
> > h_stack               INFINITY
> > s_core                INFINITY
> > h_core                INFINITY
> > s_rss                 INFINITY
> > h_rss                 INFINITY
> > s_vmem                19G
> > h_vmem                19G
> >
> > [root at chead ~]# qconf -shgrp @basichosts
> > group_name @basichosts
> > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
> >          compute-1-5.local compute-1-6.local compute-1-7.local \
> >          compute-1-8.local compute-1-9.local compute-1-10.local \
> >          compute-1-11.local compute-1-12.local compute-1-13.local \
> >          compute-1-14.local compute-1-15.local compute-2-0.local \
> >          compute-2-2.local compute-2-5.local compute-2-7.local \
> >          compute-2-8.local compute-2-9.local compute-2-11.local \
> >          compute-2-12.local compute-2-13.local compute-2-15.local \
> >          compute-2-6.local
> >
> > Thanks
> >
> > -M
> >
>
>
>
> --
> Best,
>
> Feng
>