[gridengine users] Node with negative value of a consumable

Javier Lopez Cacheiro jlopez at cesga.es
Mon Jun 13 13:12:18 UTC 2011


Hi,

We have found a strange situation where GE 6.2u5 has allocated more 
resources on a node than are available, leaving a consumable with a value 
lower than 0 (in this case the consumable is num_proc).

This is somewhat similar to an issue that was found some time ago in SGE 
6.2 (issue 2091), but that one was related to MPI jobs with the fillup 
allocation rule, and it was already fixed in 6.2u3.

This case is different because it affects a non-MPI job rather than MPI 
jobs, and it occurs only under certain circumstances that are still not 
clear.

In this case, at 06:13:57 the node already had 7 jobs running, consuming 
24 units of num_proc. num_proc is configured as a consumable with a value 
of 24, so at that time the remaining value of num_proc was 0. But 4 
seconds later, at 06:14:01, a new job requesting 24 num_proc was started 
on the node, leaving it with a value of -24 for num_proc.
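
To make the arithmetic above explicit, here is a minimal Python sketch of 
the bookkeeping one would expect for a consumable. It is only an 
illustration of the numbers in this report, not Grid Engine code: the 
function names remaining() and admit() are hypothetical, and the usage of 
the jobs already running is modelled as a single aggregate of 24 units 
because the per-job requests are not shown here.

```python
# Hypothetical model of consumable accounting; NOT Grid Engine source code.
CAPACITY = 24  # num_proc as configured on the host (from the report)


def remaining(capacity, requests):
    """Return the consumable value left after debiting every request."""
    return capacity - sum(requests)


def admit(capacity, running, new_request):
    """Return True only if the new request fits in what is still free."""
    return new_request <= remaining(capacity, running)


# Assumption: aggregate usage of the jobs already running at 06:13:57.
running = [24]
# Request of the job that was started at 06:14:01.
new_job = 24

print(remaining(CAPACITY, running))              # 0
print(admit(CAPACITY, running, new_job))         # False
print(remaining(CAPACITY, running + [new_job]))  # -24
```

Under this model the job started at 06:14:01 should not have been 
admitted; dispatching it anyway gives exactly the -24 that we see.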

I don't know whether anyone else has come across this same problem with 
6.2u5, or whether there is a workaround for it.

[jlopez at svgd ~]$ qhost -q -j -h c5-11
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
compute-5-11            x86_64        -24 47.92   31.5G    9.0G    8.0G     0.0
   GRID_large           BP    0/4/24
       6667492  1.92242 STDIN      compchem015 r 06/10/2011 06:13:30 MASTER
       6667493  1.92241 STDIN      compchem015 r 06/10/2011 06:13:41 MASTER
       6667494  1.92241 STDIN      compchem015 r 06/10/2011 06:13:47 MASTER
       6667495  1.92241 STDIN      compchem015 r 06/10/2011 06:13:57 MASTER
   GRID_small           BP    0/0/24
   small                BPC   0/10/24
       6652641 11.27961 p1761-7    csebdmfa    r 06/10/2011 06:14:01 MASTER
       6655259 10.43999 p577-16    csebdmfa    r 06/10/2011 06:12:26 MASTER
       6667942  3.93900 AuLJ139    csmyslfs    r 06/10/2011 06:12:46 MASTER
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
                                                                     SLAVE
   g0-mem_small         BPC   0/0/24
   offline              BP    0/0/24


Thanks in advance,
Javier



