[gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource

Feng Zhang prod.feng at gmail.com
Fri Jan 23 17:27:33 UTC 2015


Ilya,

Can you please run:

qstat -j <jobid>

and paste the output here? It may be useful for diagnosing the problem.
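
It may also be worth checking how mem_free is defined in the complex
configuration. Below is a minimal sketch, assuming the usual `qconf -sc`
column order (name, shortcut, type, relop, requestable, consumable, default,
urgency). The helper name complex_requestable is hypothetical, not part of SGE:

```shell
# Hypothetical helper (not an SGE command): reads `qconf -sc` output on
# stdin and prints the type and requestable columns for one complex.
# Assumes the column order: name shortcut type relop requestable ...
complex_requestable() {
    awk -v name="$1" '
        $1 == name { printf "%s type=%s requestable=%s\n", $1, $3, $5; found = 1 }
        END { exit found ? 0 : 1 }   # non-zero exit if the complex is not listed
    '
}

# Usage on a live cluster (requires the SGE binaries on the PATH):
#   qconf -sc | complex_requestable mem_free
```

If the requestable column is not YES (or FORCED), or the complex is missing
entirely, that would be one possible explanation for the "unknown resource"
message.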

On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <4ilya.m+grid at gmail.com> wrote:
> Removed the quota limits. To no avail: same problems.
>
>
> -------- Original Message --------
> Subject: Re: [gridengine users] Cannot request resource if it is a load
> value of memory type: SGE reports it as unknown resource
> From: Reuti <reuti at staff.uni-marburg.de>
> To: Ilya M <4ilya.m+grid at gmail.com>
> Date: 1/23/15, 2:33 AM
>>
>> Can you remove them temporarily? I have seen cases where the "unknown
>> resource" error suddenly appeared and just as suddenly vanished again; my
>> conclusion was that it was somehow connected to RQS.
>>
>> -- Reuti
>>
>>
>>> On 23.01.2015 at 00:16, Ilya M <4ilya.m+grid at gmail.com> wrote:
>>>
>>> There are two RQS, one is disabled:
>>>
>>> {
>>>    name         limit_for_interns
>>>    description  "limit to max 5 GPU jobs per intern."
>>>    enabled      TRUE
>>>    limit        users {int1,int2} hosts @gpu to slots=5
>>> }
>>> {
>>>    name         limit_slots
>>>    description  NONE
>>>    enabled      FALSE
>>>    limit        hosts {@gpu} to slots=2
>>> }
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>> value of memory type: SGE reports it as unknown resource
>>> From: Reuti <reuti at staff.uni-marburg.de>
>>> To: Ilya <4ilya.m+grid at gmail.com>
>>> Date: 1/21/15, 16:12
>>>>
>>>> Hi,
>>>>
>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>
>>>>> Something happened to the SGE (6.2u5) installation, which had been running
>>>>> fine for many months: users can no longer submit resource requests for load
>>>>> values of memory type, e.g.
>>>>>
>>>>> qsub -l mem_free=5G -w v .... produces the following output:
>>>>>
>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown
>>>>> resource (mem_free)
>>>>>
>>>>> The resource is available, though, when querying for it:
>>>>> qhost -F mem_free -h gpu038
>>>>> HOSTNAME     ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>> -------------------------------------------------------------------
>>>>> global       -              -     -       -       -       -       -
>>>>> gpu038       lx24-amd64    16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>     Host Resource(s):      hl:mem_free=110.416G
>>>>>
>>>>>
>>>>> This was first reported by a user when he tried to request custom "hl"
>>>>> resource. However, it now appears that all "hl" resources of type "memory"
>>>>> show this behavior. Integer "hl" resources are OK.
>>>>
>>>> Do you have any RQS in place?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> I bounced qmaster between master and shadow-master a couple of times,
>>>>> but it did not resolve the problem.
>>>>>
>>>>> Additionally, when I added MONITOR=1 to scheduler's configuration, the
>>>>> file $SGE_ROOT/$SGE_CELL/common/schedule contains only colons:
>>>>> ::::::::
>>>>> ::::::::
>>>>> ::::::::
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users at gridengine.org
>>>>> https://gridengine.org/mailman/listinfo/users



-- 
Best,

Feng


