[gridengine users] New Execution Host: load_avg = -NA-

Reuti reuti at staff.uni-marburg.de
Tue Nov 13 13:09:33 UTC 2012


On 13.11.2012 at 13:26, RATH Jochen (AREVA) wrote:

> I have added a new execution host to my existing OGE pool. Unfortunately I can't start jobs on it, because its load average is not being reported to the qmaster host:
> [root@ master ge2011.11]# qstat -F la
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@calcuserver03.edom.ad.corp BIP   0/0/32         -NA-     -NA-          a
> ---------------------------------------------------------------------------------
> all.q@calcuserver02.edom.ad.corp BIP   0/2/12         10.15    linux-x64
>        hl:load_avg=10.150000
> ---------------------------------------------------------------------------------
> all.q@calcuserver01.edom.ad.corp BIP   0/0/12         0.00     linux-x64
>        hl:load_avg=0.000000
> My grid consists of one master and now three execution nodes. Everything is installed in an NFS directory, /data_storage, which is hosted on the master. The messages file of calcuserver03 contains:
> [root@ master calcuserver03]# cat messages
> 11/13/2012 13:04:21|  main| calcuserver03|W|local configuration localhost.localdomain not defined - using global configuration
> 11/13/2012 13:04:21|  main| calcuserver03|I|starting up OGS/GE 2011.11 (linux-x64)

This message is harmless. It looks like the exechost can contact the qmaster (to request the configuration), which is fine. But is the execd still running? Maybe it crashed during startup: is there any file "execd..." in /tmp? I suppose the `qhost` output shows similar information.
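
The checks above could be scripted roughly as follows, to be run on the new execution host. This is only a sketch under assumptions: the daemon is assumed to run under the usual process name `sge_execd`, the helper names `execd_running` and `list_execd_files` are made up for illustration, and `qhost` is only called if it is in the PATH (i.e. the Grid Engine settings file has been sourced). The glob `execd*` simply matches any file in /tmp whose name starts with "execd".

```shell
#!/bin/sh
# Sketch: quick health check for the execution daemon on this host.

# Succeeds if a process with exactly the given name is running.
# The default name sge_execd is an assumption (usual Grid Engine binary name).
execd_running() {
    pgrep -x "${1:-sge_execd}" >/dev/null 2>&1
}

# Print every file whose name starts with "execd" in the given
# directory (default: /tmp). Prints nothing if there is none.
list_execd_files() {
    dir="${1:-/tmp}"
    for f in "$dir"/execd*; do
        # If the glob matched nothing it stays literal, so test for existence.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

if execd_running; then
    echo "sge_execd is running"
else
    echo "sge_execd is NOT running"
fi

# Leftover files here may hint at a crash during startup.
list_execd_files /tmp

# Show what the qmaster knows about this host, if qhost is available.
# Assumes the host is registered under the name that hostname prints.
if command -v qhost >/dev/null 2>&1; then
    qhost -h "$(hostname)"
fi
```

If the daemon is not running and a file turns up in /tmp, its content is the first thing to look at.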

> The master and calcuserver01 run RHEL 5.8; calcuserver02 and calcuserver03 run RHEL 6.3. iptables is stopped on every server, and all hosts are listed in /etc/hosts.allow.

This is only necessary for applications that use TCP wrappers, and only if certain or all services are denied by default in /etc/hosts.deny.

-- Reuti

> Why can't the qmaster get the load_avg of the new server? What further information do you need?
> Regards
>      Jochen
