[gridengine users] can't delete an exec host

Feng Zhang prod.feng at gmail.com
Wed Sep 6 15:41:32 UTC 2017


Are there any running jobs on the queue instance of compute-2-4 in basic.q?
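
One way to check, as a minimal sketch: the helper name list_running_jobs is made up for illustration, and the queue instance name assumes the host is registered as compute-2-4.local, as in your qconf output.

```shell
# Hypothetical helper: list running jobs in one queue instance.
# $1: queue instance in queue@host form, e.g. basic.q@compute-2-4.local
list_running_jobs() {
    # -u '*' : show jobs of all users, not just the caller's
    # -s r   : only jobs in the running state
    # -q     : restrict the listing to the given queue instance
    qstat -u '*' -s r -q "$1"
}
```

Usage: list_running_jobs basic.q@compute-2-4.local. If it lists any jobs, that might explain why the queue instance is still there.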

On Wed, Sep 6, 2017 at 11:33 AM, Michael Stauffer <mgstauff at gmail.com> wrote:
> On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <prod.feng at gmail.com> wrote:
>>
>> It seems the SGE master did not pick up the updated host group. Maybe you
>> can try:
>>
>> 1. Restart the SGE master
>
>
> Is it safe to do this with jobs queued and running? I think it's not
> reliable, i.e. jobs can get killed and de-queued?
>
>>
>> or
>>
>> 2. In basic.q, change "hostlist" to any single node, such as
>> "compute-1-0.local", wait until it gets refreshed, then change it back
>> to "@basichosts".
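
For option 2, a non-interactive sketch using qconf -rattr rather than editing the queue in qconf -mq. The helper name set_queue_hostlist is hypothetical, and the queue, host, and host group names are the ones from this thread.

```shell
# Hypothetical helper: replace the hostlist of a cluster queue.
# $1: cluster queue name
# $2: new hostlist value (a single host or an @hostgroup)
set_queue_hostlist() {
    # Argument order: qconf -rattr <obj_spec> <attr_name> <value> <obj_instance>
    qconf -rattr queue hostlist "$2" "$1"
}
```

Usage: set_queue_hostlist basic.q compute-1-0.local, check the result with qconf -sq basic.q, then set_queue_hostlist basic.q @basichosts to put it back.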
>
>
> I've done this, but it's not refreshing (it's been about 10 minutes now). I'm
> still getting the error when I try to delete exec host compute-2-4, and
> qhost is still showing basic.q on the nodes in @basichosts.
>
> Interestingly, host compute-2-4 was removed from another queue
> (qlogin.basic.q) that also uses @basichosts, so it's something about basic.q
> that's stuck.
>
> Is there some way to refresh things other than restarting qmaster?
>
> -M
>
>
>>
>>
>>
>>
>> On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <mgstauff at gmail.com>
>> wrote:
>> > SoGE 8.1.8
>> >
>> > Hi,
>> >
>> > I'm having trouble deleting an execution host. I've removed it from the
>> > host group, but when I try to delete it with qconf, it says it's still
>> > part of 'basic.q'. Here's the relevant output. Does anyone have any
>> > suggestions?
>> >
>> > [root@chead ~]# qconf -de compute-2-4.local
>> > Host object "compute-2-4.local" is still referenced in cluster queue "basic.q".
>> >
>> > [root@chead ~]# qconf -sq basic.q
>> > qname                 basic.q
>> > hostlist              @basichosts
>> > seq_no                0
>> > load_thresholds       np_load_avg=1.74
>> > suspend_thresholds    NONE
>> > nsuspend              1
>> > suspend_interval      00:05:00
>> > priority              0
>> > min_cpu_interval      00:05:00
>> > processors            UNDEFINED
>> > qtype                 BATCH
>> > ckpt_list             NONE
>> > pe_list               make mpich mpi orte unihost serial
>> > rerun                 FALSE
>> > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
>> >                       [compute-1-1.local=7],[compute-1-3.local=7], \
>> >                       [compute-1-5.local=8],[compute-1-6.local=8], \
>> >                       [compute-1-7.local=8],[compute-1-8.local=8], \
>> >                       [compute-1-9.local=8],[compute-1-10.local=8], \
>> >                       [compute-1-11.local=8],[compute-1-12.local=8], \
>> >                       [compute-1-13.local=8],[compute-1-14.local=8], \
>> >                       [compute-1-15.local=8]
>> > tmpdir                /tmp
>> > shell                 /bin/bash
>> > prolog                NONE
>> > epilog                NONE
>> > shell_start_mode      posix_compliant
>> > starter_method        NONE
>> > suspend_method        NONE
>> > resume_method         NONE
>> > terminate_method      NONE
>> > notify                00:00:60
>> > owner_list            NONE
>> > user_lists            NONE
>> > xuser_lists           NONE
>> > subordinate_list      NONE
>> > complex_values        NONE
>> > projects              NONE
>> > xprojects             NONE
>> > calendar              NONE
>> > initial_state         default
>> > s_rt                  INFINITY
>> > h_rt                  INFINITY
>> > s_cpu                 INFINITY
>> > h_cpu                 INFINITY
>> > s_fsize               INFINITY
>> > h_fsize               INFINITY
>> > s_data                INFINITY
>> > h_data                INFINITY
>> > s_stack               INFINITY
>> > h_stack               INFINITY
>> > s_core                INFINITY
>> > h_core                INFINITY
>> > s_rss                 INFINITY
>> > h_rss                 INFINITY
>> > s_vmem                19G
>> > h_vmem                19G
>> >
>> > [root@chead ~]# qconf -shgrp @basichosts
>> > group_name @basichosts
>> > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
>> >          compute-1-5.local compute-1-6.local compute-1-7.local \
>> >          compute-1-8.local compute-1-9.local compute-1-10.local \
>> >          compute-1-11.local compute-1-12.local compute-1-13.local \
>> >          compute-1-14.local compute-1-15.local compute-2-0.local \
>> >          compute-2-2.local compute-2-5.local compute-2-7.local \
>> >          compute-2-8.local compute-2-9.local compute-2-11.local \
>> >          compute-2-12.local compute-2-13.local compute-2-15.local \
>> >          compute-2-6.local
>> >
>> > Thanks
>> >
>> > -M
>> >
>>
>>
>>
>> --
>> Best,
>>
>> Feng
>
>



-- 
Best,

Feng
