[gridengine users] Job in error states

MacMullan IV, Hugh hughmac at wharton.upenn.edu
Sat Mar 7 21:57:22 UTC 2020


Or if it’s an NFS share, perhaps it’s become unmounted on one or more exec nodes.
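Both guesses (a full file system, or an NFS share that is no longer mounted) can be checked on the affected exec node with a short script. The following is only a minimal sketch. It assumes Python 3 is available on the node and that the node runs Linux, because it reads `/proc/mounts`. The function names are hypothetical and are not part of Grid Engine.

```python
import os
import shutil
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpoolCheck:
    path: str               # path that was checked
    exists: bool            # does the path itself exist?
    mount_point: str        # mount point of the filesystem holding it
    fs_type: Optional[str]  # filesystem type from /proc/mounts, if known
    free_bytes: int         # bytes available to unprivileged users
    total_bytes: int        # total size of the filesystem


def find_mount_point(path: str) -> str:
    """Walk up from an existing `path` to the mount point that contains it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached "/" without finding a mount point
            break
        path = parent
    return path


def lookup_fs_type(mount_point: str, mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Return the filesystem type of `mount_point` (Linux /proc/mounts format)."""
    try:
        with open(mounts_file) as fh:
            lines = fh.readlines()
    except OSError:
        return None  # e.g. not Linux, or /proc not available
    found = None
    for line in lines:
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]  # a later (stacked) mount shadows earlier ones
    return found


def check_spool(path: str) -> SpoolCheck:
    """Report existence, backing filesystem and free space for `path`."""
    # Climb to the nearest existing ancestor so the checks still work
    # when the job script itself is missing.
    probe = os.path.realpath(path)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    mount_point = find_mount_point(probe)
    usage = shutil.disk_usage(probe)
    return SpoolCheck(
        path=path,
        exists=os.path.exists(path),
        mount_point=mount_point,
        fs_type=lookup_fs_type(mount_point),
        free_bytes=usage.free,
        total_bytes=usage.total,
    )


if __name__ == "__main__":
    for target in sys.argv[1:]:
        r = check_spool(target)
        pct = 100.0 * r.free_bytes / r.total_bytes if r.total_bytes else 0.0
        print(f"{r.path}: exists={r.exists} mount={r.mount_point} "
              f"type={r.fs_type} free={pct:.1f}%")
```

Save the script under any name you like (for example, a hypothetical `check_spool.py`). Run it on each exec node and pass that node's spool path as the argument, e.g. `/opt/gridengine/default/spool/compute-0-0/job_scripts`. You may see `mount=/` or a local filesystem type where you expect an NFS share. If so, the share is probably not mounted on that node. A `free` value close to 0% instead suggests that the file system is full.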

-Hugh

> On Mar 7, 2020, at 10:55, Reuti <reuti at staff.uni-marburg.de> wrote:
> 
> Hi,
> 
> is it always failing on one and the same node, or are several nodes affected? One guess is that the file system is full.
> 
> -- Reuti
> 
> 
>> On 05.03.2020 at 18:46, Jerome <jerome at ibt.unam.mx> wrote:
>> 
>> Dear all
>> 
>> I'm facing a strange error in SGE. One job is in an error state, as
>> shown below:
>> 
>> 
>> ==============================================================
>> job_number:                 1311910
>> exec_file:                  job_scripts/1311910
>> submission_time:            Thu Mar  5 08:06:16 2020
>> owner:                      XXXXXXXXXXXXX
>> 
>> ../..
>> 
>> error reason          1:      03/05/2020 11:11:56 [6021:55928]:
>> execvlp(/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910,
>> "/opt/gridengine/default/spool/compute-0-0/job_scripts/1311910") failed:
>> No such file or directory
>> 
>> 
>> It seems to be a problem with copying the script file to the node.
>> But when I clear the error with qmod -cj, the job goes back into the
>> error state. Why?
>> 
>> Could anyone explain what might cause this error?
>> 
>> Thanks!
> 
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users


