[gridengine users] mem_free settings

Gowtham g at mtu.edu
Tue Mar 31 12:56:07 UTC 2015


I think I found the mistake in my submission script. 

  hard resource_list:         mem_free=128.00G

should be

  hard resource_list:         mem_free=2.00G

so that the 64-processor job requests 2 GB per slot, for 128 GB of RAM in total. Correct?
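
If I understand the per-slot semantics correctly, SGE multiplies a hard -l request by the number of slots granted, so 64 slots x 2 GB per slot gives the intended 128 GB in total. A minimal sketch of the corrected request (the parallel environment name 'mpi' below is only a placeholder for whatever PE the cluster actually defines):

  #$ -pe mpi 64
  #$ -hard -l mem_free=2G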

Best regards,
g

--
Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, Physics/ECE
Michigan Technological University

(906) 487/3593
http://it.mtu.edu
http://hpc.mtu.edu


On Tue, 31 Mar 2015, Gowtham wrote:

| 
| Hi Reuti,
| 
| It's a 64-processor job, and my hope/plan is that it requests 2 GB per processor for a total of 128 GB. But each compute node only has 64 GB of RAM in total (60 GB of which is set as requestable/consumable).
| 
| I could be mistaken, but I think the job is looking for 128 GB RAM per node? Please correct me if I am wrong.
| 
| Best regards,
| g
| 
| --
| Gowtham, PhD
| Director of Research Computing, IT
| Adj. Asst. Professor, Physics/ECE
| Michigan Technological University
| 
| (906) 487/3593
| http://it.mtu.edu
| http://hpc.mtu.edu
| 
| 
| On Tue, 31 Mar 2015, Reuti wrote:
| 
| | 
| | > Am 31.03.2015 um 14:17 schrieb Gowtham <g at mtu.edu>:
| | > 
| | > 
| | > Please find it here:
| | > 
| | >  http://sgowtham.com/downloads/qstat_j_74545.txt
| | 
| | OK, but where is SGE looking for 256GB? For now, each slot will get 128G as requested.
| | 
| | -- Reuti
| | 
| | 
| | > 
| | > Best regards,
| | > g
| | > 
| | > --
| | > Gowtham, PhD
| | > Director of Research Computing, IT
| | > Adj. Asst. Professor, Physics/ECE
| | > Michigan Technological University
| | > 
| | > (906) 487/3593
| | > http://it.mtu.edu
| | > http://hpc.mtu.edu
| | > 
| | > 
| | > On Tue, 31 Mar 2015, Reuti wrote:
| | > 
| | > | Hi,
| | > | 
| | > | > Am 31.03.2015 um 13:13 schrieb Gowtham <g at mtu.edu>:
| | > | > 
| | > | > 
| | > | > In one of our clusters with homogeneous compute nodes (64 GB RAM each), I have set mem_free as a requestable and consumable resource. Following the mailing list archives, I have done
| | > | > 
| | > | >  for x in `qconf -sel`
| | > | >  do
| | > | >    qconf -mattr exechost complex_values mem_free=60G $x
| | > | >  done
| | > | > 
| | > | > Every job that gets submitted by every user has the following line in the submission script:
| | > | > 
| | > | >  #$ -hard -l mem_free=2G
| | > | > 
| | > | > for single-processor jobs, and
| | > | > 
| | > | >  #$ -hard -l mem_free=(2*NPROCS)G
| | > | > 
| | > | > for a parallel job using NPROCS processors.
| | > | > 
| | > | > 
| | > | > All single-processor jobs run just fine, and so do many parallel jobs. But some parallel jobs, when the participating processors are spread across multiple compute nodes, keep on waiting.
| | > | > 
| | > | > When inspected with 'qstat -j JOB_ID', I notice that the job is looking for (2 * NPROCS)G of RAM in each compute node. How would I go about resolving this issue? If additional information is necessary from my end, please let me know.
| | > | 
| | > | Can you please post the output of `qstat -j JOB_ID` of such a job.
| | > | 
| | > | -- Reuti
| | > | 
| | > | 
| | > | > 
| | > | > Thank you for your time and help.
| | > | > 
| | > | > Best regards,
| | > | > g
| | > | > 
| | > | > --
| | > | > Gowtham, PhD
| | > | > Director of Research Computing, IT
| | > | > Adj. Asst. Professor, Physics/ECE
| | > | > Michigan Technological University
| | > | > 
| | > | > (906) 487/3593
| | > | > http://it.mtu.edu
| | > | > http://hpc.mtu.edu
| | > | > 
| | > | > _______________________________________________
| | > | > users mailing list
| | > | > users at gridengine.org
| | > | > https://gridengine.org/mailman/listinfo/users
| | > | 
| | > | 
| | 
| | 
| 

