[gridengine users] [SGE-discuss] spool, no information, loss of jobs
reuti at staff.uni-marburg.de
Thu Jun 16 13:52:28 UTC 2011
On 16.06.2011 at 15:03, baf035 wrote:
> We are using SoGE rel. 3910 for tests.
> Submitted jobs are correctly dispatched, but no information is stored in the spool directory <SPOOL_DIR>/qmaster/jobs.
Are you using classic spooling?
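The spooling method of a cell is recorded in its bootstrap file ($SGE_ROOT/$SGE_CELL/common/bootstrap) under the `spooling_method` key, next to `qmaster_spool_dir`. A minimal sketch that reads that "key value" format; the sample content is illustrative only and not taken from the poster's installation, and the helper names are hypothetical.

```python
import os


def parse_bootstrap(text):
    """Parse whitespace-separated 'key value' lines into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def read_bootstrap(sge_root, sge_cell="default"):
    """Read the bootstrap file of a cell (path layout assumed: <root>/<cell>/common/bootstrap)."""
    path = os.path.join(sge_root, sge_cell, "common", "bootstrap")
    with open(path) as f:
        return parse_bootstrap(f.read())


# Illustrative values only.
SAMPLE = """\
# sample bootstrap content
admin_user        sgeadmin
spooling_method   classic
qmaster_spool_dir /opt/sge/default/spool/qmaster
"""

if __name__ == "__main__":
    cfg = parse_bootstrap(SAMPLE)
    print(cfg.get("spooling_method"), cfg.get("qmaster_spool_dir"))
```

On a real system one would call `read_bootstrap(os.environ["SGE_ROOT"], os.environ.get("SGE_CELL", "default"))` instead of parsing the sample.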
> The qmaster messages file contains errors about a missing file/folder at the time the job ends:
> 06/16/2011 10:06:30|schedu|sged2|E|can't find parallel task 50993.1 task past_usage for update in function pe_task_update_master_list_usage
> 06/16/2011 10:06:30|schedu|sged2|E|callback function for event "3941466. EVENT JOB 50993.1 task past_usage USAGE" failed
> 06/16/2011 10:07:10|worker|sged2|E|unlink(jobs/00/0005/0993/common) failed: No such file or directory
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool file: jobs/00/0005/0993/common
The "common" is strange here. In the past, I only saw a plain file like 0993 containing the job's binary information.
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool directory: jobs/00/0005/0993
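The relative path in these messages appears to be derived from the job number: 50993, zero-padded to ten digits (0000050993), is split into 00/0005/0993. The sketch below reproduces that mapping so one can check whether a job has a spool entry under <SPOOL_DIR>/qmaster while it is still running. The 2/4/4 split is inferred from the log lines above, and the helper names are hypothetical, not part of Grid Engine.

```python
import os


def job_spool_dir(job_id):
    """Relative spool directory for a job number, assuming a 2/4/4 digit split."""
    digits = f"{job_id:010d}"  # e.g. 50993 -> "0000050993"
    return "/".join(["jobs", digits[:2], digits[2:6], digits[6:]])


def spool_entry_exists(qmaster_spool_dir, job_id):
    """True if the (assumed) spool directory for job_id exists on disk."""
    return os.path.exists(os.path.join(qmaster_spool_dir, job_spool_dir(job_id)))


if __name__ == "__main__":
    print(job_spool_dir(50993))  # jobs/00/0005/0993
```

Running `spool_entry_exists` against the real qmaster spool directory right after submission, and again before the job ends, would show whether the entry is never written or removed early.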
> qacct -j 50993 | grep end_time | uniq
> end_time Thu Jun 16 10:05:52 2011
> A migration of the qmasterd leads to a total loss of job information: no jobs are shown in qstat after the migration.
> We have also encountered a case where files in <SPOOL_DIR>/qmaster/jobs were correctly created but disappeared during
> the migration without any entry in the messages file.
And is it in a shared space?
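On Linux, whether the spool directory sits on shared storage can be checked by finding the longest mount point in /proc/mounts that contains it and looking at its filesystem type. A minimal sketch; the sample mount table and paths are made up for illustration, and octal escapes such as \040 in mount points are not decoded.

```python
import os


def mount_for_path(path, mounts_text):
    """Return (mount_point, fstype) of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Match whole path components, so /opt/sge does not match /opt/sgex.
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) > len(best[0]):
            best = (mount_point, fstype)
    return best


# Hypothetical mount table for illustration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
nfs-server:/export/sge /opt/sge nfs rw,vers=3 0 0
"""

if __name__ == "__main__":
    print(mount_for_path("/opt/sge/default/spool/qmaster", SAMPLE_MOUNTS))
```

On a live host, pass the contents of /proc/mounts instead of the sample; an `nfs` result for the qmaster spool directory means it is on shared storage.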
> Please verify this behavior; thanks in advance for a fix.
> SGE-discuss mailing list
> SGE-discuss at liv.ac.uk