[gridengine users] Jobs on qw state and exec node on au state

Radhouane Aniba aradwen at gmail.com
Mon May 30 17:20:14 UTC 2016


Hello all,

I am trying to submit a simple "hello world" job to test a Grid Engine
installation (I have used it before with no problems).

The problem is that my job stays in the qw state and waits in the queue forever.

The qhost command shows a weird state for the compute nodes:

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
compute001              lx26-amd64      4     -   31.4G       -     0.0       -
compute002              lx26-amd64      4     -   31.4G       -     0.0       -
compute003              lx26-amd64      4     -   31.4G       -     0.0       -
compute004              lx26-amd64      4     -   31.4G       -     0.0       -
compute005              lx26-amd64      4     -   31.4G       -     0.0       -
compute006              lx26-amd64      4     -   31.4G       -     0.0       -
compute007              lx26-amd64      4     -   31.4G       -     0.0       -
compute008              lx26-amd64      4     -   31.4G       -     0.0       -
compute009              lx26-amd64      4     -   31.4G       -     0.0       -
compute010              lx26-amd64      4     -   31.4G       -     0.0       -
compute011              lx26-amd64      4     -   31.4G       -     0.0       -

Normally, even when the compute nodes are idle, I get some values in the
LOAD and MEMUSE columns.

I am not an SGE person, but I am familiar with all the commands. Any help
would be much appreciated.

The qstat -f command shows all my nodes in the au state. I have been reading a
lot about it, and I understand it is an alarm state (overloaded?).

The only heavy activity I had on the head node was a script downloading 19T
of data. Could the head node be the problem and not the compute nodes?
sge_execd is running on all the compute/exec nodes :/
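
For reference, here is a minimal sketch of how the hosts with no reported load
could be pulled out of qhost output like the table above. The function name
hosts_without_load is made up for illustration, and it assumes the column
layout shown above (LOAD in the fourth column, "-" when no value is reported).

```shell
# Hypothetical helper: list hosts whose LOAD column is "-" in
# qhost-style output read from stdin.
# Assumes the layout shown above: HOSTNAME ARCH NCPU LOAD ...
hosts_without_load() {
    # Skip the header line, the dashed separator, and the "global" pseudo-host,
    # then print the hostname of every row whose 4th field is "-".
    awk '$1 != "HOSTNAME" && $1 !~ /^-+$/ && $1 != "global" && $4 == "-" { print $1 }'
}

# Usage on a live cluster (assumes qhost is on PATH):
#   qhost | hosts_without_load
# For the alarm reason of each queue instance, qstat also has:
#   qstat -f -explain a
```

With the output above, this would list every compute node, which matches what
qstat -f reports.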

-- 
*Rad*