[gridengine users] large memory hosts reports crazy memory value when h_vmem is set to 100G or above

Reuti reuti at Staff.Uni-Marburg.DE
Wed Sep 24 08:27:59 UTC 2014


Hi,

On 24.09.2014 at 09:00, Peter van Heusden wrote:

> On 22/09/2014 16:54, Reuti wrote:
>> On 22.09.2014 at 16:24, Peter van Heusden wrote:
>> 
>>> On 22/09/2014 15:50, Reuti wrote:
>>>> Hi,
>>>> 
>>>> On 22.09.2014 at 15:06, Peter van Heusden wrote:
>>>> 
>>>>> I'm running SGE 6.2u5 on Ubuntu 12.04 (64 bit). One of my compute nodes
>>>>> has 512 GB of RAM, but when I specify this (with e.g. h_vmem=500G in the
>>>>> complex_values setting for the exec host) and then submit a job that
>>>>> requires a lot of RAM (e.g. -l h_vmem=100G), I get this response:
>>>>> 
>>>>> (-l h_vmem=100G) cannot run at host "bigmemhost.example.com" because it
>>>>> offers only hc:h_vmem=77309411328.000000
>>>> It's 72G expressed as bytes - is this the remaining memory in `qhost -F h_vmem`? Hence the value is correct, just oddly notated?
>>> Nope, this is what I see:
>>> 
>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> -------------------------------------------------------------------------------
>>> bigmemhost.example.com  lx26-amd64     16  7.01  503.9G   52.9G    5.2G   12.9M
>> Is "h_vmem" defined as a consumable too?
> Yes, here is the relevant output from the complex configuration:
> 
> h_vmem              h_vmem     MEMORY      <=    YES         YES        2G       0
> 
>> 
>> The above is the output of `qhost -F h_vmem`?
> Yes that is the output of qhost -F h_vmem.

It looks as if the initial value for "h_vmem", which has to be attached to each exechost's "complex_values" definition, is missing. It should show up like this:

$ qconf -se node001
...
complex_values        h_vmem=256G


> Are you running SGE on a machine with larger than 100GB RAM available?

Yes:

$ qhost -F h_vmem
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node001                 lx24-amd64     64  0.00  252.4G    3.4G    8.0G     0.0
    Host Resource(s):      hc:h_vmem=256.000G
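
To compare the raw byte value from the scheduler message quoted above (77309411328.000000) with the G notation shown here, a small shell sketch may help. The function name `bytes_to_gib` is made up for illustration and is not part of SGE; it assumes the upper-case G multiplier means 1024^3 bytes and a shell with 64-bit integer arithmetic:

```shell
# Hypothetical helper, not an SGE tool: convert a raw byte count to whole G.
bytes_to_gib() {
    # Drop the fractional part SGE prints (".000000"); shell arithmetic is integer-only.
    whole_bytes=${1%%.*}
    # Assumption: upper-case G is 1024^3 bytes.
    echo $(( whole_bytes / (1024 * 1024 * 1024) ))
}

bytes_to_gib 77309411328.000000   # prints 72
```

So the odd-looking value is the 72G mentioned earlier in the thread, just written out in bytes.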

-- Reuti


> 
>> -- Reuti
>> 
>> 
>>>> --  Reuti
>>>> 
>>>> 
>>>>> If I set the h_vmem to 99G or below I get a meaningful message, e.g.
>>>>> 
>>>>> (-l h_vmem=100G) cannot run at host "smallmemhost.example.com" because
>>>>> it offers only hc:h_vmem=92.000G
>>>>> 
>>>>> This definitely seems to be a bug - is there any way around this?
>>>>> 
>>>>> Thanks!
>>>>> Peter
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users at gridengine.org
>>>>> https://gridengine.org/mailman/listinfo/users
> 




