[gridengine users] Re : Question about load average and slots and non-SGE-managed tasks...

Farid Chabane farid.chabane at ymail.com
Tue May 7 17:51:50 UTC 2013


Hi Stephen,

Yes, SGE takes the current load of the nodes into account, even if the load was caused by a non-SGE job.
I ran a test on my cluster of three machines (one master and two nodes). I stressed node001 without going through SGE:
[root at node001 ~]# uptime
 19:51:19 up 1 day,  7:41,  1 user,  load average: 12.08, 13.68, 9.81

 [root at fadmin ~]# qhost 
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
fadmin                  lx24-amd64      4  1.10    7.8G  420.6M    4.0G     0.0
node001                 lx24-amd64      4  9.57    7.8G  155.1M   24.0G     0.0
node002                 lx24-amd64      4  0.03    7.8G  150.8M   24.0G     0.0

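Now all jobs are scheduled to node002, because node001 exceeds the load_thresholds value (the default is np_load_avg=1.75). np_load_avg is the load average divided by the number of processors, so node001 is at roughly 9.57 / 4 = 2.39, which is above 1.75.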
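
To make the threshold check concrete, here is a minimal sketch of the arithmetic in Python. It is not SGE code: the function names are hypothetical, it assumes the LOAD column of qhost is the raw (non-normalized) load average, and it assumes a host is skipped when its normalized load is strictly greater than the threshold. The host values are copied from the qhost output above.

```python
# Illustrative sketch only; names are hypothetical, not part of SGE.
# Threshold taken from "load_thresholds np_load_avg=1.75".
LOAD_THRESHOLD = 1.75

# (hostname, NCPU, LOAD) copied from the qhost output in this message.
HOSTS = [
    ("fadmin", 4, 1.10),
    ("node001", 4, 9.57),
    ("node002", 4, 0.03),
]


def np_load_avg(load, ncpu):
    """Load average normalized by the number of processors."""
    return load / ncpu


def exceeds_threshold(load, ncpu, threshold=LOAD_THRESHOLD):
    """True if the normalized load is above the threshold (assumed strict >)."""
    return np_load_avg(load, ncpu) > threshold


if __name__ == "__main__":
    for name, ncpu, load in HOSTS:
        status = "over threshold" if exceeds_threshold(load, ncpu) else "ok"
        print(f"{name}: np_load_avg={np_load_avg(load, ncpu):.2f} ({status})")
```

With these numbers only node001 ends up over the threshold, which matches what I see: new jobs go to node002.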

Best regards,
Farid.

--- On Tue, 7.5.13, Stephen Spencer <spencer at cs.washington.edu> wrote:

From: Stephen Spencer <spencer at cs.washington.edu>
Subject: [gridengine users] Question about load average and slots and non-SGE-managed tasks...
To: users at gridengine.org
Date: Tuesday, May 7, 2013, 17:09

Good morning.
I'm administering a cluster of machines with SGE (6.2u5, from the RHEL distro) and have a question concerning the scheduler's behavior. (I'm rather new to SGE.)

On this cluster, users can and do log in (via 'ssh') and run computational tasks on cluster nodes, which ties up resources but not an SGE 'slot' because the tasks aren't submitted through SGE.

My question is this: does SGE take into consideration the current load average on a node when assigning tasks? For example, given two nodes with equivalent numbers of slots, and one node has a load average of 10 and the other 0, will SGE send a waiting job to the node with less load?


I see "load_thresholds   np_load_avg=1.75" in the output of "qconf -sq all.q" and am guessing that if the value of "np_load_avg" on a given host, as SGE calculates it, is greater than 1.75, tasks will be assigned elsewhere first, but that's only a guess. Confirmation, or clarification of what this means, would be wonderful.


Thank you.
Best, -- 
Stephen Spencer
spencer at cs.washington.edu


_______________________________________________
users mailing list
users at gridengine.org
https://gridengine.org/mailman/listinfo/users