[gridengine users] Jobs on qw state and exec node on au state

Radhouane Aniba aradwen at gmail.com
Mon May 30 17:36:39 UTC 2016


Hi Bill,

Yes, I am sure.

This is what I get when I log in to one of the nodes and run:

ubuntu@compute010:~$ ps -ef | grep sge_
sgeadmin  1254     1  0 May28 ?        00:00:39 /usr/lib/gridengine/sge_qmaster
sgeadmin  1446     1  0 May28 ?        00:00:22 /usr/lib/gridengine/sge_execd
ubuntu    2552  2527  0 17:36 pts/0    00:00:00 grep --color=auto sge_
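
The ps output above shows sge_execd running, while the qhost table quoted below reports no load for any host and qstat -f shows every queue in the au state. As a rough sketch only, the Python below flags hosts whose LOAD column is "-" and spells out the state letters. The function names hosts_without_load and describe_queue_state are hypothetical, the parser assumes the plain eight-column qhost layout shown in this thread, and the flag meanings are taken from the qstat man page (only a few flags are covered).

```python
# Hypothetical helpers for reading qhost / qstat -f output; not part of SGE.

# Subset of the queue state letters documented in the qstat man page.
QUEUE_STATE_FLAGS = {
    "a": "load threshold alarm",
    "A": "suspend threshold alarm",
    "u": "unknown: qmaster cannot contact sge_execd on the host",
    "d": "disabled",
    "E": "error",
}


def hosts_without_load(qhost_output):
    """Return hostnames whose LOAD column is '-' in plain qhost output."""
    silent = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator, the header row and the
        # "global" pseudo-host, none of which describe a real exec host.
        if len(fields) < 4 or fields[0] in ("HOSTNAME", "global"):
            continue
        # Column 4 (index 3) is LOAD; '-' means no load report was received.
        if fields[3] == "-":
            silent.append(fields[0])
    return silent


def describe_queue_state(state):
    """Translate a state string such as 'au' into one description per letter."""
    return [
        QUEUE_STATE_FLAGS.get(flag, "unrecognised flag %r" % flag)
        for flag in state
    ]
```

Fed the qhost table from this thread (with the leading "> " quote markers stripped), hosts_without_load would list every compute node, and describe_queue_state("au") gives a load threshold alarm together with an execd that qmaster cannot contact.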


On Mon, May 30, 2016 at 10:33 AM, Bill Bryce <bbryce at univa.com> wrote:

> Hi Rad,
>
> Are you sure that the execution daemons are running on your compute
> nodes?  Can you log in to one of the nodes, say ‘compute001’, and run ps
> to look for the execd?  When an execd is functioning normally it reports
> the load, memory, etc.; none of your nodes are showing that.
>
> Regards,
>
> Bill.
>
> On May 30, 2016, at 1:20 PM, Radhouane Aniba <aradwen at gmail.com> wrote:
>
> Hello all,
>
> I am trying to submit a simple "hello world" job to test a Grid Engine
> setup (I have used it before with no problems).
>
> The problem is that my job waits in the queue forever.
>
> The qhost command shows a weird state for the compute nodes:
>
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> compute001              lx26-amd64      4     -   31.4G       -     0.0       -
> compute002              lx26-amd64      4     -   31.4G       -     0.0       -
> compute003              lx26-amd64      4     -   31.4G       -     0.0       -
> compute004              lx26-amd64      4     -   31.4G       -     0.0       -
> compute005              lx26-amd64      4     -   31.4G       -     0.0       -
> compute006              lx26-amd64      4     -   31.4G       -     0.0       -
> compute007              lx26-amd64      4     -   31.4G       -     0.0       -
> compute008              lx26-amd64      4     -   31.4G       -     0.0       -
> compute009              lx26-amd64      4     -   31.4G       -     0.0       -
> compute010              lx26-amd64      4     -   31.4G       -     0.0       -
> compute011              lx26-amd64      4     -   31.4G       -     0.0
>
> Normally, even when the compute nodes are idle, the LOAD and MEMUSE
> columns show some values.
>
> I am not an SGE person, but I am familiar with all the commands. Any help
> would be much appreciated.
>
> The qstat -f command shows all my nodes in the au state. I've been reading
> a lot about it, and I understand it's an alarm state (overloaded?).
>
> The only heavy activity on the head node was a script downloading 19T of
> data. Could the head node be the problem rather than the compute nodes?
> sge_execd is running on all the compute/exec nodes :/
>
> --
> *Rad*
>
>
> William Bryce | VP Products
> Univa Corporation, Toronto
> E: bbryce at univa.com | D: 647-9742841 | Toll-Free (800) 370-5320
> W: Univa.com | FB: facebook.com/univa.corporation | T:
> twitter.com/Grid_Engine
>
>


-- 
*Radhouane Aniba*
*Bioinformatics Scientist*
*BC Cancer Agency, Vancouver, Canada*