[gridengine users] qmake retry after connection error

Ido Tamir tamir at imp.ac.at
Tue Jun 14 13:59:45 UTC 2011

We use qmake to parallelize the Illumina/Solexa pipeline. It's a make-based system that
operates on many files to generate some output.

However, under load we often get errors like:

error: commlib error: got select error (Connection reset by peer)
error: executing task of job 7980306 failed: failed sending task to XXX at XXX.xxx: can't find connection

Then we have to restart the pipeline.

I tried the make options -k (keep going) and -i (ignore errors). The run then continues, but the result is broken.
-r is not available for qmake.

Is it possible to retry a certain number of times when this error comes up, and only this
error? Sometimes files are missing etc., and then it should fail.
But this is simply a node not answering within a specified amount of time.
Is it possible to extend the timeout?
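
In case qmake has no such option, the behaviour I am after could be approximated with a wrapper around the whole run. The following is only a rough sketch, not a tested solution. It assumes that qmake exits with a non-zero status when a task fails, that the commlib messages quoted above show up on stdout or stderr, and that rerunning make safely skips targets that were already built. The function name run_with_retry and its parameters are made up for illustration.

```python
import re
import subprocess
import time

# Patterns copied from the error messages quoted above. Treating them as
# "transient" is an assumption of this sketch.
TRANSIENT_ERRORS = re.compile(r"commlib error|can't find connection")


def run_with_retry(cmd, max_tries=3, delay=0.0, transient=TRANSIENT_ERRORS):
    """Run cmd and rerun it only if it fails with a transient-looking error.

    Returns a tuple (returncode, attempts_made).
    """
    for attempt in range(1, max_tries + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so one search covers both
            text=True,
        )
        print(proc.stdout, end="")  # pass the output through to the log

        if proc.returncode == 0:
            return 0, attempt
        if not transient.search(proc.stdout):
            # Any other failure (missing files etc.) should stay fatal.
            return proc.returncode, attempt
        if attempt < max_tries:
            time.sleep(delay)  # give the busy node some time to recover

    return proc.returncode, max_tries
```

It would be called as run_with_retry(["qmake", ...]) with the usual qmake arguments in place of the dots. One caveat: a failed task may leave a partial output file behind, which make could then treat as up to date on the next attempt.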

Thank you very much for your answers,
