[gridengine users] execution node installation error

Reuti reuti at staff.uni-marburg.de
Thu Oct 15 09:45:00 UTC 2015


> On 15.10.2015 at 01:16, Hatem Elshazly <hmelshazly at gmail.com> wrote:
> Hi there,
> I'm having a problem getting an execution host to work. The master node doesn't seem to detect the execution node; when I submit a job, it stalls in the queue.

Is it in state "qw" or "t"?

$ qalter -w v <job_id>

will check whether the job could be started in an empty cluster in the current configuration.
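
As a rough sketch of the two checks above: the helper name job_state and the job ID 42 are made up for illustration, and the field position assumes the default qstat column layout (job-ID, prior, name, user, state, ...).

```shell
# Print the state column (5th field) of one job from `qstat` output on stdin.
# Assumption: default qstat layout, where the state is the 5th field.
job_state() {
    awk -v id="$1" '$1 == id { print $5; exit }'
}

# Usage on a live cluster (hypothetical job ID 42):
#   qstat -u "$USER" | job_state 42   # prints e.g. qw or t
#   qalter -w v 42                    # validate against an empty cluster
```

If the state is "qw", the output of the qalter call is usually the more informative of the two.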

Is the home directory shared across the cluster, so that the user's home directory can be accessed on the nodes?

> Both daemons are running on the master and the execution node. I added the execution node to the queue and made sure the ports are open and that I can ssh without a password from/to both nodes

It's not necessary to have passphraseless SSH in the cluster. Even parallel jobs can run without this setting. In fact, I allow SSH access to nodes only for admin staff.

> , sge_root and sge_cell are readable and writable. The strange thing is that when I change the ncpu of the execution node, the change is reflected in the output of the qhost command on the master node.

You mean "num_proc"? This should be treated as a read-only value, and it's normally not necessary to adjust it. The slot count in the queues is independent of this setting.
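
To see the slot count that the queue actually uses, something along these lines might help. This is only a sketch: the helper name queue_slots is made up, all.q is an assumed queue name, and the parsing assumes the "slots" line of the queue configuration is not wrapped onto a continuation line.

```shell
# Print the value of the "slots" attribute from `qconf -sq <queue>` output on stdin.
# Assumption: the slots entry fits on a single line.
queue_slots() {
    awk '$1 == "slots" { print $2; exit }'
}

# Usage on a live cluster (all.q is an assumed queue name):
#   qconf -sq all.q | queue_slots   # slots configured in the queue
#   qconf -se node001               # the host entry as the qmaster sees it
```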

-- Reuti

> This is the output of the qhost command (arch and mem are NA although I set them in the node's values):
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> node001                 -               1     -       -       -       -       -
> master                  linux-x64       1  0.01    3.7G  157.8M     0.0     0.0
> Any suggestions on what might be wrong are really appreciated.
> Thanks.
