[gridengine users] Cannot request resource if it is a load value of memory type: SGE reports it as unknown resource

Ilya M 4ilya.m+grid at gmail.com
Fri Jan 23 19:30:53 UTC 2015


Because I am testing with qsub -w v, the job is not accepted for 
scheduling, no job ID is generated, and qstat -j will not work. The 
output of qsub is as I showed in the original email:

Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu001" because 
job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu002" because 
job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu003" because 
job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu004" because 
job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu005" because 
job requests unknown resource (mem_free)
Job 2210897 (mem_free=100G) cannot run in queue "gpu.q@gpu006" because 
job requests unknown resource (mem_free)
...
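
To confirm that every rejection names the same resource, the verification
output can be tallied. This is only a sketch: summarize_unknown is a
hypothetical helper name (not an SGE command), and it assumes the message
wording shown above and a grep that supports -o.

```shell
#!/bin/sh
# summarize_unknown: hypothetical helper, not part of SGE.
# Reads `qsub -w v` style output on stdin and counts how many
# queue instances reject the job for each "unknown resource".
# Assumes grep supports -o (GNU and BSD grep do).
summarize_unknown() {
    # Join wrapped lines and squeeze repeated spaces so that a message
    # split across two lines still matches the pattern below.
    tr '\n' ' ' | tr -s ' ' |
        grep -o 'unknown resource ([^)]*)' |
        sort | uniq -c
}
```

It could be fed from something like
"qsub -w v -l mem_free=100G job.sh 2>&1 | summarize_unknown", where job.sh
stands in for any job script.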

Ilya.


-------- Original Message --------
Subject: Re: [gridengine users] Cannot request resource if it is a load 
value of memory type: SGE reports it as unknown resource
From: Feng Zhang <prod.feng at gmail.com>
To: Ilya M <4ilya.m+grid at gmail.com>
Date: 1/23/15, 9:27 AM
> Ilya,
>
> Can you please run:
>
> qstat -j <jobid>
>
> and paste the output here? It may be useful for diagnosing the problem.
>
> On Fri, Jan 23, 2015 at 12:08 PM, Ilya M <4ilya.m+grid at gmail.com> wrote:
>> Removed the quota limits. To no avail: same problems.
>>
>>
>> -------- Original Message --------
>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>> value of memory type: SGE reports it as unknown resource
>> From: Reuti <reuti at staff.uni-marburg.de>
>> To: Ilya M <4ilya.m+grid at gmail.com>
>> Date: 1/23/15, 2:33 AM
>>> Can you remove them temporarily? I have seen cases where the "unknown
>>> resource" message suddenly popped up and later vanished again; my
>>> conclusion was that it was somehow connected to RQS.
>>>
>>> -- Reuti
>>>
>>>
>>>> On 23.01.2015 at 00:16, Ilya M <4ilya.m+grid at gmail.com> wrote:
>>>>
>>>> There are two RQS, one is disabled:
>>>>
>>>> {
>>>>     name         limit_for_interns
>>>>     description  "limit to max 5 GPU jobs per intern."
>>>>     enabled      TRUE
>>>>     limit        users {int1,int2} hosts @gpu to slots=5
>>>> }
>>>> {
>>>>     name         limit_slots
>>>>     description  NONE
>>>>     enabled      FALSE
>>>>     limit        hosts {@gpu} to slots=2
>>>> }
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [gridengine users] Cannot request resource if it is a load
>>>> value of memory type: SGE reports it as unknown resource
>>>> From: Reuti <reuti at staff.uni-marburg.de>
>>>> To: Ilya <4ilya.m+grid at gmail.com>
>>>> Date: 1/21/15, 16:12
>>>>> Hi,
>>>>>
>>>>> On 22.01.2015 at 00:52, Ilya wrote:
>>>>>
>>>>>> Something happened to our SGE (6.2u5) installation, which had been
>>>>>> running fine for many months: users can no longer submit resource
>>>>>> requests for load values of memory type, e.g.
>>>>>>
>>>>>> qsub -l mem_free=5G -w v .... produces the following output:
>>>>>>
>>>>>> cannot run in queue "gpu.q@gpu038" because job requests unknown
>>>>>> resource (mem_free)
>>>>>>
>>>>>> The resource is available, though, when querying for it:
>>>>>> qhost -F mem_free -h gpu038
>>>>>> HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>> --------------------------------------------------------------------
>>>>>> global     -              -     -       -       -       -       -
>>>>>> gpu038     lx24-amd64    16  2.11  126.1G   15.7G    4.0G     0.0
>>>>>>      Host Resource(s):      hl:mem_free=110.416G
>>>>>>
>>>>>>
>>>>>> This was first reported by a user when he tried to request a custom
>>>>>> "hl" resource. However, it now appears that all "hl" resources of type
>>>>>> "memory" show this behavior. Integer "hl" resources are OK.
>>>>> Do you have any RQS in place?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> I bounced qmaster between master and shadow-master a couple of times,
>>>>>> but it did not resolve the problem.
>>>>>>
>>>>>> Additionally, when I added MONITOR=1 to the scheduler's configuration,
>>>>>> the file $SGE_ROOT/$SGE_CELL/common/schedule contained only colons:
>>>>>> ::::::::
>>>>>> ::::::::
>>>>>> ::::::::
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users at gridengine.org
>>>>>> https://gridengine.org/mailman/listinfo/users
>>
>
>



