[gridengine users] Strange issue with one node

Jerome jerome at ibt.unam.mx
Mon Oct 24 17:29:49 UTC 2016


Dear Skylar.

I checked this too, and everything seems normal:

$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
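
For reference, here is a minimal Python sketch of how I understand the `$fill_up` allocation rule shown above: hosts are filled one after another until the request is met. This is only a simplified model, not the real scheduler (it ignores load thresholds, sequence numbers and everything else), and the function name `fill_up` and the host lists are made up for illustration.

```python
def fill_up(free_slots, wanted):
    """Simplified model of allocation_rule $fill_up (an assumption, not SGE code).

    free_slots: list of (host, free_slot_count) in the order hosts are tried.
    Returns a list of (host, granted_slots), or None if the request cannot be met.
    """
    if sum(free for _, free in free_slots) < wanted:
        return None  # not enough free slots overall: the job would stay in "qw"
    granted = []
    remaining = wanted
    for host, free in free_slots:
        if remaining == 0:
            break
        take = min(free, remaining)  # use up this host before moving to the next
        if take:
            granted.append((host, take))
            remaining -= take
    return granted


# Hypothetical inputs matching the two situations described in this thread.
both_up = [("compute-0-0.local", 4), ("compute-0-1.local", 4)]
one_down = [("compute-0-0.local", 4)]

print(fill_up(both_up, 8))
print(fill_up(one_down, 8))
```

Under this model an 8-slot request should fit as soon as both nodes are up, which is why the "7 slots" message is surprising.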


$ qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich orte thread
rerun                 FALSE
slots                 1,[compute-0-0.local=4],[compute-0-1.local=4]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
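
As a sanity check on the `slots` line above, here is a small Python sketch that sums the per-host slot counts. It only reads the text format printed by `qconf -sq` (a default value followed by bracketed per-host overrides) and does not call any SGE API; the function name `slots_per_host` is hypothetical.

```python
import re


def slots_per_host(slots_value, hosts):
    """Return {host: slots} for a queue 'slots' value such as
    '1,[compute-0-0.local=4],[compute-0-1.local=4]'.

    The first element is the default; a bracketed entry overrides it per host.
    """
    default = int(slots_value.split(",")[0])
    # Collect the [host=count] overrides as a dict of strings.
    overrides = dict(re.findall(r"\[([^=\]]+)=(\d+)\]", slots_value))
    return {host: int(overrides.get(host, default)) for host in hosts}


hosts = ["compute-0-0.local", "compute-0-1.local"]
per_host = slots_per_host("1,[compute-0-0.local=4],[compute-0-1.local=4]", hosts)
total = sum(per_host.values())
print(per_host, total)
```

With the value shown above, the queue configuration adds up to 8 slots over the two nodes, so the limit of 7 does not seem to come from this line.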


Thanks

Regards


On 24/10/2016 at 12:20, Skylar Thompson wrote:
> On Mon, Oct 24, 2016 at 12:15:41PM -0500, Jerome wrote:
>> Dear all
>>
>> I've installed a Rocks Cluster of 2 nodes with SGE for a course. Each
>> node has 4 cores.
>> I shut down one node, so only 4 cores are available:
>>
>> $ qstat -f
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64     au
>>
>>
>>
>> But I've run into a strange issue that I can't explain yet:
>> My user submitted a parallel job requesting 8 cores.
>> When I check the job, which is in the "qw" state, I get this message:
>>
>> $ qstat -j 58
>>   ../..
>>
>>   scheduling info:            queue instance "all.q@compute-0-1.local" dropped because it is temporarily not available
>>                               cannot run in PE "orte" because it only offers 7 slots
>>
>> If I power on the second node, the message stays the same:
>>
>> $ qstat -f
>> queuename                      qtype resv/used/tot. load_avg arch          states
>> ---------------------------------------------------------------------------------
>> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64
>>
>>
>> $ qstat -j 58
>>
>> ../..
>>
>> parallel environment:  orte range: 8
>> version:                    3
>> scheduling info:            cannot run in PE "orte" because it only offers 7 slots
>>
>>
>> I've searched through the whole SGE configuration. I also
>> reinstalled the 2 nodes. But the same message appears, saying that
>> only 7 slots are free!
>>
>> Can someone give me some help?
>
> What do "qconf -sp orte" and "qconf -sq all.q" report?
>


-- 
-- Jérôme
- Why do you drink?
- I've been asked that question before, headmaster.
- Probably by people who are fond of you.
- Probably. Claire used to ask me three times a week: she must have adored me
	(Michel Audiard)


