[gridengine users] OGE Spooling Directory

Joseph Farran jfarran at uci.edu
Tue Jun 5 16:55:59 UTC 2012


Thanks Reuti.

A note for others:   OGE creates the Qmaster spool directory automatically, but the compute node spool directory must already exist, as Reuti says, *and* it must be owned by the OGE admin user.    Simply creating the directory is not enough.

In my case /data/hpc/oge, where I installed OGE, is owned by ogeadmin, so "/var/spool/oge" on the compute node needs to exist *and* be owned by ogeadmin.
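
For anyone repeating this, the step above can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name prepare_spool_dir is made up for illustration and is not part of OGE, and the directory and owner are passed in as arguments rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (not part of OGE): make sure the local execd spool
# directory exists and is owned by the OGE admin user.
# Usage: prepare_spool_dir <directory> <owner>
prepare_spool_dir() {
    dir=$1
    owner=$2
    # Create the directory, including missing parents.
    mkdir -p "$dir" || return 1
    # Hand the directory to the admin user; an existing directory owned by
    # someone else (for example root) is not enough.
    chown "$owner" "$dir" || return 1
    # Show the result so the owner can be checked by eye.
    ls -ld "$dir"
}
```

With the values from this thread it would be run as root on each compute node as `prepare_spool_dir /var/spool/oge ogeadmin`.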

Joseph


On 06/05/2012 08:53 AM, Reuti wrote:
> On 05.06.2012 at 17:47, Joseph Farran wrote:
>
>> My OGE software resides on a shared NFS directory /data/hpc/oge.
>>
>> When I run the ./start_gui_installer script, I set OGE up with:
>>
>>     Qmaster Spool:  /var/spool/oge/default/spool/qmaster
>>     Global execd:    /var/spool/oge/default/spool
> There is no need to have "spool" in the pathname twice.
>
>     Qmaster Spool:  /var/spool/oge/qmaster
>     Global execd:    /var/spool/oge
>
> should do. These directories need to exist, I think. The node-specific one will be created by OGE when the execd starts up.
>
>
>>     Spooling: classic
>>
>> The head node installs correctly, but the compute node installation fails.   The error for the compute nodes shows:
>>
>>    FAILED: Task failed.
> Is there anything in /tmp from the execd? That is where some diagnostic messages will be created in case it can't start up.
>
> -- Reuti
>
>
>>    OUTPUT:
>>    Your $SGE_ROOT directory: /data/hpc/oge
>>    Using cell:>default<
>>    Creating local configuration for host>compute-1-1.local<
>>    ogeadmin@compute-1-1.local added "compute-1-1.local" to configuration list
>>    Local configuration for host>compute-1-1.local<  created.
>>    Adding submit host>compute-1-1<
>>    compute-1-1.local added to submit host list
>>    cp /data/hpc/oge/default/common/sgeexecd /etc/init.d/sgeexecd.HPC
>>    /usr/lib/lsb/install_initd /etc/init.d/sgeexecd.HPC
>>        starting sge_execd
>>    root@compute-1-1.local modified "@allhosts" in host group list
>>    root@compute-1-1.local modified "all.q" in cluster queue list
>>    got select error: Connection refused
>>    got select error: closing "compute-1-1.local/execd/1"
>>    Execd on host compute-1-1.local is not started!
>>
>>    ERROR:
>>    Warning: untrusted X11 forwarding setup failed: xauth key data not generated
>>    Warning: No xauth data; using fake authentication data for X11 forwarding.
>>    TERM environment variable not set.
>>
>>
>> If I set up OGE with
>>
>>     Qmaster Spool:  /var/spool/oge/default/spool/qmaster
>>     Global execd:    /data/hpc/oge/default/spool
>>     Spooling: classic
>>
>> using the NFS share directory for "Global execd", then everything works just fine: the compute nodes are set up correctly.
>>
>> What am I doing wrong?
>>
>> Joseph
>>
>>
>> On 06/04/2012 02:51 PM, Reuti wrote:
>>> Hi,
>>>
>>> On 04.06.2012 at 22:59, Joseph Farran wrote:
>>>
>>>> When installing OGE with respect to the Spooling Configuration, one can select:
>>>>
>>>>     Qmaster spool directory
>>>>     Global execd spool directory
>>>>
>>>> I installed OGE from the head node on a shared NFS directory ( /data/oge ) and would like the spooling to be on the head node's /var file system while leaving the OGE executables in the NFS share directory.
>>>>
>>>> Would the options be to change "Qmaster spool directory" to something like "/var/oge"
>>> Yes, or /var/spool/oge.
>>>
>>>
>>>> and leave the "Global execd spool directory" as is, which is the shared NFS directory?
>>> Well, this could also be /var/spool/oge, then it would be local on each node.
>>>
>>> http://arc.liv.ac.uk/SGE/howto/nfsreduce.html
>>>
>>> -- Reuti
>>>
>
>
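
Reuti's suggestion in the thread above, to look in /tmp for diagnostic messages from an execd that could not start, can be sketched as follows. The function name list_execd_diagnostics is hypothetical, and the default file name pattern execd_messages.* is an assumption about how the execd names its fallback log; adjust it if the files on your nodes are named differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of OGE): list candidate execd diagnostic
# files in a directory.
# Usage: list_execd_diagnostics [directory] [pattern]
list_execd_diagnostics() {
    dir=${1:-/tmp}
    # Assumed naming pattern for the fallback log; pass another if needed.
    pattern=${2:-execd_messages.*}
    for f in "$dir"/$pattern; do
        # Skip the unexpanded glob when nothing matches.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

Running `list_execd_diagnostics` with no arguments on a compute node prints any matching files in /tmp, which can then be read for the reason the execd refused to start.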


