[gridengine users] Possible Causes of: critical error: unrecoverable error - contact systems manager
beckerje at mail.nih.gov
Thu Oct 29 19:33:28 UTC 2015
What was the name of the ENV variable? How was it being used by qsub
and/or the job script?
On Thu, Oct 29, 2015 at 03:02:22PM -0400, Wagner, Justin wrote:
>For anybody who is interested, I found the root cause of this qsub crash.
>The root cause was an environment variable with a blank key (an empty name), which was an artifact of another bug. Whenever that variable is present in the environment, qsub crashes every single time.
>Hopefully somebody is familiar enough with the qsub code to look at why that might cause a crash. If not, I can cook up a simple script to show the problem.
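A minimal sketch of how one might check for such a variable before calling qsub. It assumes the environment is available as a NUL-separated block, which is the format of /proc/<pid>/environ on Linux and of GNU `env -0` output. The function names are made up for illustration and are not part of Grid Engine.

```python
from typing import Dict, List


def find_suspect_entries(raw_block: bytes) -> List[bytes]:
    """Return entries of a NUL-separated environment block that have an
    empty name (e.g. b"=value") or no '=' separator at all."""
    suspects = []
    for entry in raw_block.split(b"\0"):
        if not entry:
            continue  # empty piece left over after the trailing NUL
        name, sep, _value = entry.partition(b"=")
        if not sep or not name:
            suspects.append(entry)
    return suspects


def drop_blank_keys(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env without entries whose name is empty."""
    return {name: value for name, value in env.items() if name}


if __name__ == "__main__":
    # Hypothetical sample data; the real variable from this thread is unknown.
    sample = b"PATH=/usr/bin\0=leftover\0HOME=/home/user\0"
    print(find_suspect_entries(sample))  # [b'=leftover']
```

The dictionary returned by drop_blank_keys could be passed as the env argument of subprocess.run when a wrapper script launches qsub, so the blank-key entry never reaches it. Whether that fully avoids the crash on SoGE 8.1.0 is an assumption and has not been verified here.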
>From: users-bounces at gridengine.org [mailto:users-bounces at gridengine.org] On Behalf Of Wagner, Justin
>Sent: Tuesday, September 22, 2015 10:02 AM
>To: users at gridengine.org
>Subject: [gridengine users] Possible Causes of: critical error: unrecoverable error - contact systems manager
>I am running SoGE 8.1.0. Recently, when I submitted a job to the grid via qsub, qsub returned the error "critical error: unrecoverable error - contact systems manager".
>I am trying to narrow down the root cause of this issue. When I run the exact same command by hand, as the same user on the same submit host, it works. However, the job is normally launched by a script that Jenkins executes, and I can reliably reproduce the error by using the "rebuild" plugin to rebuild the same build. I suspect that some environment variable differs between these two cases and is causing the critical error, but I haven't been able to identify any differences yet.
>Can somebody point me to the source that is throwing this error, or possibly give me a list of what the possible causes are for this error?
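One way to hunt for a difference between the working (interactive) and failing (Jenkins) environments is to dump both and compare them. The following is only a sketch: it assumes each environment has been captured as NUL-separated bytes (for example with GNU `env -0` redirected to a file in each context), and the helper names are hypothetical.

```python
from typing import Dict, List


def parse_env_block(raw_block: bytes) -> Dict[bytes, bytes]:
    """Parse NUL-separated NAME=VALUE entries into a dict.
    Entries without '=' are stored with an empty value."""
    env = {}
    for entry in raw_block.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, _sep, value = entry.partition(b"=")
        env[name] = value
    return env


def diff_env(working: Dict[bytes, bytes],
             failing: Dict[bytes, bytes]) -> Dict[str, List[bytes]]:
    """Report names present in only one environment, and names whose
    values differ between the two."""
    shared = set(working) & set(failing)
    return {
        "only_working": sorted(set(working) - set(failing)),
        "only_failing": sorted(set(failing) - set(working)),
        "changed": sorted(n for n in shared if working[n] != failing[n]),
    }


if __name__ == "__main__":
    # Hypothetical captures; real ones would be read from the dump files.
    interactive = parse_env_block(b"PATH=/usr/bin\0USER=alice\0")
    jenkins = parse_env_block(b"PATH=/usr/bin\0USER=alice\0=oops\0")
    print(diff_env(interactive, jenkins))
```

Parsing the raw bytes rather than the text output of plain `env` keeps entries with an empty name visible (they show up under the key b""), which is easy to miss when comparing by eye.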
Jesse Becker (Contractor)