[gridengine users] Job fails to run. misconfiguration? SGE 8.0b
Allan Tran
tran.v.allan at gmail.com
Fri Oct 7 18:34:15 UTC 2011
I am installing GE 8.0b for testing from the binary distribution (using
install_qmaster and install_execd). Everything seemed to go very smoothly.
However, when I submitted a test job (sleeper.sh), it did not run.
I guess I missed something in the configuration (I accepted most of the
default options during installation).
Here is a snippet from the log files (the master node is both the qmaster
and the execution host):
Qmaster log:
10/07/2011 12:15:26| main|master|I|read job database with 0 entries in 0 seconds
10/07/2011 12:15:26| main|master|E|error opening file "/usr/local/sge/default/spool/qmaster/./sharetree" for reading: No such file or directory
10/07/2011 12:15:26| main|master|I|qmaster hard descriptor limit is set to 8192
10/07/2011 12:15:26| main|master|I|qmaster soft descriptor limit is set to 8192
10/07/2011 12:15:26| main|master|I|qmaster will use max. 8172 file descriptors for communication
10/07/2011 12:15:26| main|master|I|qmaster will accept max. 99 dynamic event clients
10/07/2011 12:15:26| main|master|I|starting up SGE 8.0.0b (lx-amd64)
10/07/2011 12:22:57|worker|master|W|job 8.1 failed on host master invalid execution state because: shepherd exited with exit status 127: invalid execution state
Execution log:
10/07/2011 12:22:38| main|master|I|starting up SGE 8.0.0b (lx-amd64)
10/07/2011 12:22:57| main|master|E|shepherd of job 8.1 exited with exit status = 127
10/07/2011 12:22:57| main|master|E|abnormal termination of shepherd for job 8.1: no "exit_status" file
10/07/2011 12:22:57| main|master|E|can't open file active_jobs/8.1/error: No such file or directory
10/07/2011 12:22:57| main|master|E|can't open pid file "active_jobs/8.1/pid" for job 8.1
10/07/2011 12:22:57| main|master|E|can't open usage file "active_jobs/8.1/usage" for job 8.1: No such file or directory
10/07/2011 12:22:57| main|master|E|shepherd exited with exit status 127: invalid execution state
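For anyone sifting through logs like these, here is a minimal sketch of how the warning and error entries could be pulled out of a messages file. It assumes the pipe-delimited layout seen in the lines quoted in this message (timestamp|thread|host|level|message) and treats any line without a leading timestamp as a wrapped continuation of the previous entry. The names LogEntry, parse_messages and filter_level are hypothetical helpers made up for illustration; they are not part of Grid Engine.

```python
import re
from typing import List, NamedTuple, Sequence


class LogEntry(NamedTuple):
    """One parsed log entry (hypothetical structure, for illustration only)."""
    timestamp: str
    thread: str
    host: str
    level: str
    message: str


# Assumed layout, inferred only from the quoted log lines:
#   MM/DD/YYYY HH:MM:SS|thread|host|level|message
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"  # timestamp
    r"\|\s*([^|]*?)\s*"                        # thread name, padding stripped
    r"\|([^|]*)"                               # host name
    r"\|([A-Z])"                               # one-letter level (I, W, E, ...)
    r"\|(.*)$"                                 # message text
)


def parse_messages(text: str) -> List[LogEntry]:
    """Parse log text, re-joining lines that were wrapped by a mail client."""
    entries: List[LogEntry] = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            entries.append(LogEntry(*match.groups()))
        elif entries and raw.strip():
            # No timestamp: treat as the continuation of the previous entry.
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + " " + raw.strip())
    return entries


def filter_level(entries: Sequence[LogEntry],
                 levels: Sequence[str] = ("E", "W")) -> List[LogEntry]:
    """Keep only the entries whose level letter is in `levels`."""
    return [entry for entry in entries if entry.level in levels]


# A few lines copied from the execution log above, including wrapped ones.
SAMPLE = """\
10/07/2011 12:22:38| main|master|I|starting up SGE 8.0.0b (lx-amd64)
10/07/2011 12:22:57| main|master|E|shepherd of job 8.1 exited with exit
status = 127
10/07/2011 12:22:57| main|master|E|can't open file active_jobs/8.1/error:
No such file or directory
"""

if __name__ == "__main__":
    for entry in filter_level(parse_messages(SAMPLE), levels=("E",)):
        print(entry.timestamp, entry.message)
```

Run against the full file, this would list only the E and W lines, which makes the first failure (the shepherd exit status) easier to spot among the informational startup messages.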
Both qmaster and execd are running:
root@master:/usr/local/sge/default/spool/qmaster[1113]> ps aguwx | grep sge
sgeadmin 7410 0.0 0.8 653128 34976 ? Sl 12:15 0:00 /usr/local/sge/bin/lx-amd64/sge_qmaster
sgeadmin 7546 0.0 0.0 111580 2532 ? Sl 12:22 0:00 /usr/local/sge/bin/lx-amd64/sge_execd
root 7669 0.0 0.0 61192 764 pts/1 S+ 12:33 0:00 grep sge
Can someone help?
Thank you