[gridengine users] GE2011.11 and ge6.2u5
michael.coffman at avagotech.com
Fri Jun 15 16:01:18 UTC 2012
I am trying to update my sge_execd and sge_shepherd binaries. Based on
recent emails, I figured I could drop the GE2011.11 bits into place and
they would work fine. I am however having issues:
My grid environment is:
Current Version - SGE - 6.2u5
Binary path is /opt/grid/bin/lx24-amd64.
I had to make a link in /opt/grid/bin for linux-x64 to get things to work.
I used the following commands and it did indeed update live and the running
processes seemed happy and all seemed to be working fine:
service sgeexecd softstop
ln -s lx24-amd64 linux-x64
mv sge_shepherd sge_shepherd.old
mv sge_execd sge_execd.old
cp $gbits/sge_shepherd .
cp $gbits/sge_execd .
service sgeexecd start
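
For reference, the swap above could be wrapped so that it refuses to run when a replacement binary is missing. This is only a sketch: swap_binaries is a hypothetical helper name, not part of Grid Engine, and it uses cp -p (to keep the mode and timestamps of the new files) where the commands above used plain cp.

```shell
# Sketch only: swap_binaries is a hypothetical helper, not part of Grid Engine.
# Usage: swap_binaries BIN_DIR NEW_BITS_DIR
swap_binaries() {
    bindir=$1
    newbits=$2

    # Check both replacements exist before touching anything.
    for f in sge_shepherd sge_execd; do
        if [ ! -f "$newbits/$f" ]; then
            echo "missing $newbits/$f" >&2
            return 1
        fi
    done

    for f in sge_shepherd sge_execd; do
        # Keep the old binary next to the new one for an easy rollback.
        mv "$bindir/$f" "$bindir/$f.old" || return 1
        # -p preserves the mode and timestamps of the new binary.
        cp -p "$newbits/$f" "$bindir/$f" || return 1
    done
}
```

It would be called between the stop and start, e.g. swap_binaries /opt/grid/bin/lx24-amd64 "$gbits", with service sgeexecd softstop before and service sgeexecd start after, as in the commands above.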
Since yesterday, though, a couple of jobs have failed and put their queues
into an error state.
Mail from the failing job:
06/14/2012 21:29:37 [20339:8436]: can't open file job_pid: Permission
From the qmaster messages file:
06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host
cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37
[20339:8436]: can't open file job_pid: Permission denied
I checked a job_pid file for a currently running job on the system that had
the above errors. Permissions down the entire tree seem fine, and here is
the job_pid file:
-rw-r--r-- 1 grid grid 6 Jun 14 17:40
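
The check of the tree can be repeated with something like the following. It is a rough sketch: list_path_perms is a hypothetical helper name, and the path passed to it has to be the real location of a job_pid file on the execution host.

```shell
# Sketch only: list_path_perms is a hypothetical helper name.
# Prints "ls -ld" for the given path and for every parent directory up to /.
list_path_perms() {
    p=$1
    while :; do
        ls -ld "$p" || return 1
        # Stop at the filesystem root, or at "." for a relative path.
        case $p in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}
```

Running it as the user that owns the failing job, rather than as root, shows more directly whether any directory along the way blocks access for that user.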
Any clues? Is the path perhaps hard coded into sge_shepherd for this