[gridengine users] OGE Spooling Directory
Reuti
reuti at staff.uni-marburg.de
Tue Jun 5 15:53:31 UTC 2012
On 05.06.2012 at 17:47, Joseph Farran wrote:
> My OGE software resides on a shared NFS directory /data/hpc/oge.
>
> When I run the ./start_gui_installer script, I set OGE up with:
>
> Qmaster Spool: /var/spool/oge/default/spool/qmaster
> Global execd: /var/spool/oge/default/spool
There is no need to have "spool" in the pathname twice.
Qmaster Spool: /var/spool/oge/qmaster
Global execd: /var/spool/oge
should do. I think these directories need to exist beforehand. The node-specific one will be created by OGE when the execd starts up.
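A minimal sketch of preparing such a local directory on each node before the execd installation. The function name prepare_spool_dir is hypothetical, and handing ownership to the admin user (ogeadmin in the output below) is an assumption about what the execd needs in order to write there:

```shell
#!/bin/sh
# Hypothetical helper: create a node-local spool directory.
# Arguments: directory path, name of the admin user (both chosen by the caller).
prepare_spool_dir() {
    dir="$1"
    owner="$2"
    # Create the directory and any missing parents.
    mkdir -p "$dir" || return 1
    # Assumption: the admin user must own the directory.
    # chown is only attempted as root and only if the user exists.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown "$owner" "$dir" || return 1
    fi
    chmod 755 "$dir"
}

# Usage on a compute node, run as root (paths taken from this thread):
#   prepare_spool_dir /var/spool/oge ogeadmin
```

The call is left commented out so that sourcing the file does not touch /var by accident.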
> Spooling: classic
>
> The head node installs correctly, but the compute node installation fails. The error for the compute nodes shows:
>
> FAILED: Task failed.
Is there anything in /tmp from the execd? It's the place where some diagnostic messages will be created in case it can't start up.
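A small sketch for checking this on the failing node. The file name pattern execd_messages.* is an assumption about how the execd names its fallback log, and show_execd_diagnostics is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the tail of any execd diagnostic files
# found in a directory (default: /tmp).
# Assumption: the files are named execd_messages.<something>.
show_execd_diagnostics() {
    dir="${1:-/tmp}"
    found=0
    for f in "$dir"/execd_messages.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        found=1
        echo "== $f"
        tail -n 20 "$f"
    done
    [ "$found" -eq 1 ] || echo "no execd_messages.* files in $dir"
}

# Usage on the compute node:
#   show_execd_diagnostics /tmp
```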
-- Reuti
> OUTPUT:
> Your $SGE_ROOT directory: /data/hpc/oge
> Using cell: >default<
> Creating local configuration for host >compute-1-1.local<
> ogeadmin@compute-1-1.local added "compute-1-1.local" to configuration list
> Local configuration for host >compute-1-1.local< created.
> Adding submit host >compute-1-1<
> compute-1-1.local added to submit host list
> cp /data/hpc/oge/default/common/sgeexecd /etc/init.d/sgeexecd.HPC
> /usr/lib/lsb/install_initd /etc/init.d/sgeexecd.HPC
> starting sge_execd
> root@compute-1-1.local modified "@allhosts" in host group list
> root@compute-1-1.local modified "all.q" in cluster queue list
> got select error: Connection refused
> got select error: closing "compute-1-1.local/execd/1"
> Execd on host compute-1-1.local is not started!
>
> ERROR:
> Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Warning: No xauth data; using fake authentication data for X11 forwarding.
> TERM environment variable not set.
>
>
> If I set up OGE with
>
> Qmaster Spool: /var/spool/oge/default/spool/qmaster
> Global execd: /data/hpc/oge/default/spool
> Spooling: classic
>
> Using the NFS share directory for "Global execd", everything works just fine: the compute nodes are set up correctly.
>
> What am I doing wrong?
>
> Joseph
>
>
> On 06/04/2012 02:51 PM, Reuti wrote:
>> Hi,
>>
>> On 04.06.2012 at 22:59, Joseph Farran wrote:
>>
>>> When installing OGE with respect to the Spooling Configuration, one can select:
>>>
>>> Qmaster spool directory
>>> Global execd spool directory
>>>
>>> I installed OGE from the head node on a shared NFS directory ( /data/oge ) and would like the spooling to be on the head node's /var file system while leaving the OGE executables in the NFS share directory.
>>>
>>> Would the options be to change "Qmaster spool directory" to something like "/var/oge"
>> Yes, or /var/spool/oge.
>>
>>
>>> and leave the "Global execd spool directory" as is which is the shared NFS directory?
>> Well, this could also be /var/spool/oge; then it would be local on each node.
>>
>> http://arc.liv.ac.uk/SGE/howto/nfsreduce.html
>>
>> -- Reuti
>>
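To see which execd spool directory is actually configured, globally or for a single host, the value can be read from the output of qconf -sconf. A minimal sketch, assuming the setting appears as a line starting with execd_spool_dir followed by the path; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sconf` style output on stdin and
# print the value of the execd_spool_dir setting, if present.
execd_spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2 }'
}

# Usage on a host with qconf in PATH:
#   qconf -sconf | execd_spool_dir_from_conf                     # global value
#   qconf -sconf compute-1-1.local | execd_spool_dir_from_conf   # host override, if any
```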