[gridengine users] Job exit_status =1 , but failed=0 according to qacct.. what causes that?

William Deegan bill at baddogconsulting.com
Mon Jun 27 20:57:47 UTC 2011


Reuti,

Thanks that's exactly the issue.
I just remembered I had:

set -e
and then ran grep, which found no match, returned non-zero, and caused the script to exit.
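
For anyone finding this in the archive, here is a minimal sketch of the effect. The sample text, the patterns, and the helper names run_strict and run_guarded are made up for illustration and are not taken from the real job script. grep exits with 1 when no line matches, and under set -e that status ends the script on the spot.

```shell
#!/bin/sh
# Sketch only: sample text, patterns and helper names are hypothetical.

# Run a two-line script under `set -e` in a child shell and report its status.
run_strict() {
    status=0
    sh -c 'set -e
           printf "%s\n" "simulation passed" | grep -q "$1"
           echo "reached end of script"' sh "$1" || status=$?
    echo "exit_status=$status"
}

# Same search, but grep is tested in an `if`, so `set -e` does not fire.
run_guarded() {
    status=0
    sh -c 'set -e
           if printf "%s\n" "simulation passed" | grep -q "$1"; then
               echo "match"
           else
               echo "no match"
           fi
           echo "reached end of script"' sh "$1" || status=$?
    echo "exit_status=$status"
}

run_strict passed    # grep matches: script runs to the end, exit_status=0
run_strict FAILED    # grep finds nothing: script stops early, exit_status=1
run_guarded FAILED   # no match is handled explicitly, exit_status=0
```

Testing grep inside an `if` (or appending `|| true`) keeps set -e for everything else while treating "no match" as an ordinary outcome.
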

-Bill

On Jun 27, 2011, at 12:01 PM, Reuti wrote:

> Hi,
> 
> On 27.06.2011 at 20:47, William Deegan wrote:
> 
>> Greetings,
>> 
>> I have a short-running job that doesn't exceed the specified memory or runtime limits.
>> Here's the qacct output:
>> qname        all.q               
>> hostname     a13.company.com
>> group        contr               
>> owner        user123              
>> project      NONE                
>> department   defaultdepartment   
>> jobname      veri_MM_11.1_82     
>> jobnumber    18323               
>> taskid       undefined
>> account      sge                 
>> priority     0                   
>> qsub_time    Mon Jun 27 11:30:58 2011
>> start_time   Mon Jun 27 11:31:48 2011
>> end_time     Mon Jun 27 11:33:05 2011
>> granted_pe   NONE                
>> slots        1                   
>> failed       0    
>> exit_status  1                   
> 
> That's the exit status of your job script. Something like:
> 
> #!/bin/sh
> exit 1
> 
> will produce it. From SGE's point of view the job ran successfully. SGE can't decide whether there was an application error because of wrong input data or the like.
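
A small sketch of how the two accounting fields can be read together. classify_job is a hypothetical helper, not part of SGE; it assumes qacct-style "name value" lines on stdin, like the record quoted above.

```shell
#!/bin/sh
# Sketch only: classify_job is a made-up helper, not an SGE command.
# It reads `qacct -j <job>` style output on stdin and separates
# "SGE could not run the job" (failed) from "the script returned non-zero"
# (exit_status).
classify_job() {
    awk '
        $1 == "failed"      { failed = $2 }
        $1 == "exit_status" { status = $2 }
        END {
            if (failed != 0)
                print "SGE reported a failure (failed=" failed ")"
            else if (status != 0)
                print "job ran, but the script exited with " status
            else
                print "job ran and exited with 0"
        }'
}

# The two relevant fields from the record in this thread:
printf 'failed       0\nexit_status  1\n' | classify_job
# On a cluster one would feed it real output, e.g.:
#   qacct -j 18323 | classify_job
```
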
> 
> NB: There are two special exit codes which the user can use to trigger special behavior in SGE: a job exiting with 99 will be rescheduled, and a job exiting with 100 will be set to the error state as an application error.
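
As a rough sketch of how a job script could use those two codes: map_status is a hypothetical helper, and the choice of 75 and 65 as the application's "temporary failure" and "bad input" statuses is only an assumption borrowed from the sysexits.h convention. A real application may signal these conditions differently.

```shell
#!/bin/sh
# Sketch only: map_status and the 75/65 input convention are assumptions.
# Translate an application's exit status into the status the job script
# should exit with, using the two codes SGE treats specially.
map_status() {
    case "$1" in
        75) echo 99 ;;    # temporary failure: ask SGE to reschedule the job
        65) echo 100 ;;   # bad input data: have SGE put the job in error state
        *)  echo "$1" ;;  # everything else is passed through unchanged
    esac
}

map_status 75   # prints 99
map_status 65   # prints 100
map_status 1    # prints 1

# In a job script this would sit at the very end, for example:
#   my_app "$@"
#   exit "$(map_status $?)"
```

Passing other statuses through unchanged keeps exit_status in qacct meaningful for ordinary application errors.
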
> 
> -- Reuti
> 
> 
>> ru_wallclock 77           
>> ru_utime     65.078       
>> ru_stime     3.581        
>> ru_maxrss    0                   
>> ru_ixrss     0                   
>> ru_ismrss    0                   
>> ru_idrss     0                   
>> ru_isrss     0                   
>> ru_minflt    896944              
>> ru_majflt    0                   
>> ru_nswap     0                   
>> ru_inblock   0                   
>> ru_oublock   0                   
>> ru_msgsnd    0                   
>> ru_msgrcv    0                   
>> ru_nsignals  0                   
>> ru_nvcsw     14382               
>> ru_nivcsw    1306                
>> cpu          68.660       
>> mem          23.400            
>> io           0.101             
>> iow          0.000             
>> maxvmem      1.338G
>> arid         undefined
>> 
>> 
>> Why is exit_status 1 if the job didn't fail?
>> 
>> -Bill
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
> 




