[gridengine users] SGE crashes immediately after re-start

Reuti reuti at staff.uni-marburg.de
Sun Mar 6 18:07:13 UTC 2016


On 04.03.2016 at 16:40, Simon Matthews wrote:

> I am getting this error message:
> 03/04/2016 07:30:14|listen|sgemaster|E|commlib error: local host name
> error (remote rdata host name "turquoise" is not equal to local
> resolved host name "h2.sj.bps")
> 03/04/2016 07:30:23|worker|sgemaster|E|cqueue_list_locate_qinstance("(null)@(null)"):
> cqueue == NULL("(null)", "(null)", 1, 0
> 03/04/2016 07:30:23|worker|sgemaster|E|writing job finish information:
> can't locate queue "(null)@(null)"
> 03/04/2016 07:30:23|worker|sgemaster|W|job 9179498.1 failed on host
> <unknown host> before writing exit_status because: shepherd exited
> with exit status 19: before writing exit_status
> 03/04/2016 07:30:23|worker|sgemaster|C|!!!!!!!!!! got NULL element for
> QU_rerun !!!!!!!!!!
> I have seen references to this condition being fixed by deleting the
> job, but how do I do this? We use BDB spooling. This grid is running
> SGE 6.2U5.

Is the job still running? From the log it looks as if it has already finished. Either way, did you try `qdel -f <job_id>`?
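As a rough sketch of that suggestion: the job id can be read from the "job 9179498.1 failed on host" line quoted above, and the part before the dot is what `qstat -j` and `qdel -f` expect. The snippet below only prints the two commands so they can be reviewed first. It assumes the log line has been rejoined onto one line and that the qmaster stays up long enough to accept the request; the variable names are illustrative only.

```shell
#!/bin/sh
# Log line from the qmaster messages file quoted above, rejoined onto one line.
log_line='03/04/2016 07:30:23|worker|sgemaster|W|job 9179498.1 failed on host <unknown host> before writing exit_status because: shepherd exited with exit status 19: before writing exit_status'

# Pull out the numeric job id, dropping the task suffix (".1").
job_id=$(printf '%s\n' "$log_line" |
    sed -n 's/.*|job \([0-9][0-9]*\)\.[0-9][0-9]* failed.*/\1/p')

# Dry run: print the commands instead of executing them.
check_cmd="qstat -j $job_id"   # shows what the qmaster still knows about the job
delete_cmd="qdel -f $job_id"   # forced deletion, as suggested in the reply

echo "$check_cmd"
echo "$delete_cmd"
```

If the printed commands look right, they can be run by hand on a host with the SGE client tools, starting with the `qstat -j` check.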

-- Reuti
