[gridengine users] [SGE-discuss] spool, no information, loss of jobs
d.love at liverpool.ac.uk
Mon Jun 20 21:16:27 UTC 2011
baf035 <baf035 at gmail.com> writes:
> we are using SoGE rel. 3910 for tests.
> Submitted jobs are correctly dispatched, but no information is stored in the
> spool directory <SPOOL_DIR>/qmaster/jobs.
> The qmaster messages file contains errors about a missing file/folder at
> the time a job ends:
> 6/16/2011 10:06:30|schedu|sged2|E|can't find parallel task 50993.1 task
> past_usage for update in function pe_task_update_master_list_usage
> 06/16/2011 10:06:30|schedu|sged2|E|callback function for event "3941466.
> EVENT JOB 50993.1 task past_usage USAGE" failed
> 06/16/2011 10:07:10|worker|sged2|E|unlink(jobs/00/0005/0993/common) failed:
> No such file or directory
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool file:
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool directory:
I'm guessing those messages are from qmaster startup/shutdown, in which
case I can reproduce it simply with serial jobs and code at the head of
my repo, but not with the prerelease code, so maybe try that tarball. I don't
know when I'll be able to debug it.
Excuse the typing -- I have a broken wrist.