[gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource

Reuti reuti at staff.uni-marburg.de
Fri Jan 23 10:33:09 UTC 2015


Can you remove them temporarily? I have seen cases where the "unknown resource" error suddenly appeared and then vanished again just as suddenly; my conclusion was that it was somehow connected to RQS.
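
A minimal sketch of how the resource quota sets could be removed temporarily and restored afterwards, assuming manager privileges on the qmaster host. The variable names QCONF and BACKUP, the backup path, and the function names are illustrative and do not come from this thread. It also assumes that the file written by "qconf -srqs" can be fed back unchanged to "qconf -Arqs"; check that on a test cell first.

```shell
# Illustrative names, not from the thread: QCONF, BACKUP, and both functions.
QCONF="${QCONF:-qconf}"
BACKUP="${BACKUP:-/tmp/rqs_backup.txt}"

backup_and_remove_rqs() {
    # "qconf -srqs" without a name prints every resource quota set.
    # Stop here if the dump fails, so nothing is deleted without a backup.
    "$QCONF" -srqs > "$BACKUP" || return 1

    # "qconf -srqsl" lists the names of all defined sets, one per line.
    for name in $("$QCONF" -srqsl); do
        "$QCONF" -drqs "$name"
    done
}

restore_rqs() {
    # "qconf -Arqs" adds resource quota sets from a file.
    # Assumption: the dump written above is accepted as-is.
    "$QCONF" -Arqs "$BACKUP"
}
```

After calling backup_and_remove_rqs, the failing request (qsub -l mem_free=5G -w v ...) can be repeated to see whether the "unknown resource" message is gone; restore_rqs puts the limits back.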

-- Reuti


> On 23.01.2015 at 00:16, Ilya M <4ilya.m+grid at gmail.com> wrote:
> 
> There are two RQS, one is disabled:
> 
> {
>   name         limit_for_interns
>   description  "limit to max 5 GPU jobs per intern."
>   enabled      TRUE
>   limit        users {int1,int2} hosts @gpu to slots=5
> }
> {
>   name         limit_slots
>   description  NONE
>   enabled      FALSE
>   limit        hosts {@gpu} to slots=2
> }
> 
> 
> -------- Original Message --------
> Subject: Re: [gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource
> From: Reuti <reuti at staff.uni-marburg.de>
> To: Ilya <4ilya.m+grid at gmail.com>
> Date: 1/21/15, 16:12
>> Hi,
>> 
>> On 22.01.2015 at 00:52, Ilya wrote:
>> 
>>> Something happened to our SGE (6.2u5) installation, which had been running fine for many months: users can no longer submit resource requests for load values of memory type, e.g.
>>> 
>>> qsub -l mem_free=5G -w v .... produces the following output:
>>> 
>>> cannot run in queue "gpu.q@gpu038" because job requests unknown resource (mem_free)
>>> 
>>> The resource is available, though, when querying for it:
>>> qhost -F mem_free -h gpu038
>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> -------------------------------------------------------------------------------
>>> global                  -               -     -       -       -       -       -
>>> gpu038                  lx24-amd64     16  2.11  126.1G   15.7G    4.0G     0.0
>>>    Host Resource(s):      hl:mem_free=110.416G
>>> 
>>> 
>>> This was first reported by a user who tried to request a custom "hl" resource. However, it now appears that all "hl" resources of type "memory" show this behavior, while integer "hl" resources are OK.
>> Do you have any RQS in place?
>> 
>> -- Reuti
>> 
>> 
>>> I bounced qmaster between master and shadow-master a couple of times, but it did not resolve the problem.
>>> 
>>> Additionally, when I added MONITOR=1 to the scheduler's configuration, the file $SGE_ROOT/$SGE_CELL/common/schedule contained only colons:
>>> ::::::::
>>> ::::::::
>>> ::::::::
>>> 
>>> Any ideas?
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users