[gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource

Ilya 4ilya.m+grid at gmail.com
Wed Jan 21 23:52:55 UTC 2015


Hi All,

Something happened to our SGE (6.2u5) installation, which had been 
running fine for many months: users can no longer submit resource 
requests for load values of memory type, e.g.

qsub -l mem_free=5G -w v .... produces the following output:

cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)

The resource is available, though, when querying for it:
qhost -F mem_free -h gpu038
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
     Host Resource(s):      hl:mem_free=110.416G


This was first reported by a user when he tried to request a custom "hl" 
resource. However, it now appears that all "hl" resources of type 
"memory" show this behavior. Integer "hl" resources are OK.
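
One check that may help narrow this down is to compare how the complex 
configuration defines a memory attribute against an integer one. The 
sketch below is only an illustration: the helper name complex_field is 
hypothetical, and the column order is an assumption based on the usual 
"qconf -sc" header (name, shortcut, type, relop, requestable, 
consumable, default, urgency).

```shell
# Hypothetical helper: print one column of the complex definition for a
# given attribute. Reads `qconf -sc`-style text on stdin.
# Assumed column order:
#   1 name  2 shortcut  3 type  4 relop  5 requestable
#   6 consumable  7 default  8 urgency
complex_field() {
    attr="$1"
    col="$2"
    # Skip comment/header lines and print the requested column of the
    # line whose first field matches the attribute name exactly.
    awk -v a="$attr" -v c="$col" '$1 !~ /^#/ && $1 == a { print $c }'
}

# Usage on a live cluster (not executed here):
#   qconf -sc | complex_field mem_free 3   # type column
#   qconf -sc | complex_field mem_free 5   # requestable column
```

If the type or requestable column for mem_free differs from what is 
expected, that would point at the complex configuration rather than at 
the load report from the execution host.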

I bounced qmaster between master and shadow-master a couple of times, 
but it did not resolve the problem.

Additionally, after I added MONITOR=1 to the scheduler's configuration, 
the file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
::::::::
::::::::
::::::::

Any ideas?



