[gridengine users] mem_free settings

Gowtham g at mtu.edu
Tue Mar 31 12:53:15 UTC 2015


Hi Reuti,

It's a 64-processor job, and my hope/plan is that it requests 2 GB per processor for a total of 128 GB. But each compute node only has 64 GB of total RAM (60 GB of which is set to requestable/consumable).

I could be mistaken, but I think the job is asking for 128 GB of RAM on each node. Please correct me if I am wrong.
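(For reference: if mem_free is defined as a per-slot consumable, SGE multiplies the -l request by the number of slots granted on each host, so a parallel job should request the per-slot amount, not the job-wide total. A sketch, assuming a parallel environment named "mpi" and a job script "job.sh":

```shell
# Check how mem_free is defined; the "consumable" column shows
# whether the request is charged per slot:
qconf -sc | grep mem_free

# With a per-slot consumable, request the PER-SLOT amount.
# SGE charges 2G for each slot it grants on a node, so a 64-slot
# job consumes 2G x (slots on that node) per host -- never 128G
# on any single node:
qsub -pe mpi 64 -hard -l mem_free=2G job.sh
```

Requesting mem_free=128G instead would be charged per slot, which no 64 GB node can satisfy.)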

Best regards,
g

--
Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, Physics/ECE
Michigan Technological University

(906) 487/3593
http://it.mtu.edu
http://hpc.mtu.edu


On Tue, 31 Mar 2015, Reuti wrote:

| 
| > Am 31.03.2015 um 14:17 schrieb Gowtham <g at mtu.edu>:
| > 
| > 
| > Please find it here:
| > 
| >  http://sgowtham.com/downloads/qstat_j_74545.txt
| 
| Ok, but where is SGE looking for 256 GB? For now, each slot will get 128G as requested.
| 
| -- Reuti
| 
| 
| > 
| > Best regards,
| > g
| > 
| > --
| > Gowtham, PhD
| > Director of Research Computing, IT
| > Adj. Asst. Professor, Physics/ECE
| > Michigan Technological University
| > 
| > (906) 487/3593
| > http://it.mtu.edu
| > http://hpc.mtu.edu
| > 
| > 
| > On Tue, 31 Mar 2015, Reuti wrote:
| > 
| > | Hi,
| > | 
| > | > Am 31.03.2015 um 13:13 schrieb Gowtham <g at mtu.edu>:
| > | > 
| > | > 
| > | > In one of our clusters, which has homogeneous compute nodes (64 GB RAM each), I have set mem_free as a requestable and consumable resource. Following the mailing list archives, I have done
| > | > 
| > | >  for x in `qconf -sel`
| > | >  do
| > | >    qconf -mattr exechost complex_values mem_free=60G $x
| > | >  done
| > | > 
| > | > Every job that gets submitted by every user has the following line in the submission script:
| > | > 
| > | >  #$ -hard -l mem_free=2G
| > | > 
| > | > for single processor jobs, and
| > | > 
| > | >  #$ -hard -l mem_free=(2*NPROCS)G
| > | > 
| > | > for a parallel job using NPROCS processors.
| > | > 
| > | > 
| > | > All single processor jobs run just fine, and so do many parallel jobs. But some parallel jobs, when the participating processors are spread across multiple compute nodes, keep on waiting.
| > | > 
| > | > When inspected with 'qstat -j JOB_ID', I notice that the job is looking for (2 * NPROCS)G of RAM in each compute node. How would I go about resolving this issue? If additional information is necessary from my end, please let me know.
| > | 
| > | Can you please post the output of `qstat -j JOB_ID` for such a job?
| > | 
| > | -- Reuti
| > | 
| > | 
| > | > 
| > | > Thank you for your time and help.
| > | > 
| > | > Best regards,
| > | > g
| > | > 
| > | > --
| > | > Gowtham, PhD
| > | > Director of Research Computing, IT
| > | > Adj. Asst. Professor, Physics/ECE
| > | > Michigan Technological University
| > | > 
| > | > (906) 487/3593
| > | > http://it.mtu.edu
| > | > http://hpc.mtu.edu
| > | > 
| > | > _______________________________________________
| > | > users mailing list
| > | > users at gridengine.org
| > | > https://gridengine.org/mailman/listinfo/users
| > | 
| > | 
| 
| 
