[gridengine users] mem_free settings

Gowtham g at mtu.edu
Tue Mar 31 11:13:54 UTC 2015


One of our clusters has homogeneous compute nodes (64 GB RAM each), and I have set mem_free as a requestable and consumable resource. Following the mailing list archives, I ran:

   for x in $(qconf -sel)
   do
     qconf -mattr exechost complex_values mem_free=60G "$x"
   done
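
To confirm that the loop above took effect, the value stored on each execution host can be read back. This is a minimal sketch, not part of the original setup: show_mem_free is a hypothetical helper name, and the grep assumes that the complex_values entry fits on a single line of 'qconf -se' output.

```shell
# Hypothetical helper: print the complex_values line of every execution host.
# Assumes qconf is on PATH (i.e. this runs on an admin or submit host).
show_mem_free() {
  if ! command -v qconf >/dev/null 2>&1; then
    echo "qconf not found; run this on a Grid Engine admin or submit host" >&2
    return 1
  fi
  for host in $(qconf -sel); do
    printf '%s: ' "$host"
    # Assumption: complex_values is short enough to stay on one line.
    qconf -se "$host" | grep complex_values
  done
}
```

Calling show_mem_free on a host with the Grid Engine tools installed should list one complex_values line per node, which makes a host that was missed by the loop easy to spot.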

Every job submitted by any user includes the following line in its submission script:

   #$ -hard -l mem_free=2G

for single processor jobs, and

   #$ -hard -l mem_free=(2/NPROCS)G

for a parallel job using NPROCS processors.


All single processor jobs run just fine, and so do many parallel jobs. But some parallel jobs, whose participating processors would be spread across multiple compute nodes, remain waiting in the queue.

Inspecting such a job with 'qstat -j JOB_ID' shows that it is looking for (2 * NPROCS)G of RAM on each compute node. How should I go about resolving this issue? If you need additional information from my end, please let me know.
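
For reference, the arithmetic behind such a figure can be sketched as follows. This assumes mem_free is configured as a per-slot consumable (consumable set to YES in the complex configuration shown by 'qconf -sc'), in which case the scheduler multiplies the requested value by the number of slots the job is granted on a host. The function name per_host_reservation is hypothetical and only illustrates the multiplication; it is not a Grid Engine command.

```shell
# Sketch only: per_host_reservation is a hypothetical helper.
# Assumption: per-slot consumable, so
#   reservation on a host = per-slot request (GB) * slots granted on that host
per_host_reservation() {
  per_slot_gb=$1
  slots_on_host=$2
  echo $(( per_slot_gb * slots_on_host ))
}

per_host_reservation 2 16   # prints 32: a 2G request with 16 slots on one host
per_host_reservation 2 4    # prints 8: the same request with 4 slots on this host
```

Under that assumption, a host with mem_free=60G can only accept as many slots of a job as fit within 60G at the requested per-slot value.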

Thank you for your time and help.

Best regards,
g

--
Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, Physics/ECE
Michigan Technological University

(906) 487/3593
http://it.mtu.edu
http://hpc.mtu.edu


