[gridengine users] jobs randomly die

hiller hiller at mpia-hd.mpg.de
Tue May 14 13:01:37 UTC 2019


Hi,
Nope, there are no OOM messages in the journal.
Regards, ulrich
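
One way to double-check the journal for OOM-killer activity around the time of the failure is to filter the kernel log for the usual messages. This is only a minimal sketch: the helper name `oom_lines` and the search patterns are assumptions for illustration, and the time window is taken from the gridengine messages quoted below.

```shell
# Hypothetical helper: keep only lines that look like OOM-killer messages.
# The patterns are an assumption; the exact kernel wording varies by version.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|oom_reaper|killed process'
}

# Example usage on the execution host (karun10 in this thread), as root:
#   journalctl -k --since "2019-05-13 18:00" --until "2019-05-13 19:00" | oom_lines
#   dmesg -T | oom_lines
```

An empty result only shows that the kernel did not log an OOM kill in that window; it does not identify what sent the signal.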


On 5/14/19 12:49 PM, Arnau wrote:
> Hi,
> 
> _maybe_ the OOM killer killed the job? A look at messages will give you an answer (I've seen this in my cluster).
> 
> HTH,
> Arnau
> 
> On Tue, 14 May 2019 at 12:37, hiller (<hiller at mpia-hd.mpg.de>) wrote:
> 
>     Dear all,
>     I have a problem: jobs sent to gridengine randomly die.
>     The gridengine version is 8.1.9.
>     The OS is openSUSE 15.0.
>     The gridengine messages file says:
>     05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed - killing job
>     05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10 assumedly after job because: job 635659.1 died through signal KILL (9)
> 
>     qacct -j 635659 says:
>     failed       100 : assumedly after job
>     exit_status  137                  (Killed)
> 
> 
>     There was no kill triggered by the user. There are also no other limits in place, neither via ulimit nor in the gridengine queue.
>     The 'qconf -sq all.q' command gives:
>     s_rt                  INFINITY
>     h_rt                  INFINITY
>     s_cpu                 INFINITY
>     h_cpu                 INFINITY
>     s_fsize               INFINITY
>     h_fsize               INFINITY
>     s_data                INFINITY
>     h_data                INFINITY
>     s_stack               INFINITY
>     h_stack               INFINITY
>     s_core                INFINITY
>     h_core                INFINITY
>     s_rss                 INFINITY
>     h_rss                 INFINITY
>     s_vmem                INFINITY
>     h_vmem                INFINITY
> 
>     Years ago there were some threads about the same issue, but I did not find a solution.
> 
>     Does anybody have a hint as to what I can do or check/debug?
> 
>     With kind regards and many thanks for any help, ulrich
>     _______________________________________________
>     users mailing list
>     users at gridengine.org
>     https://gridengine.org/mailman/listinfo/users
> 
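
For reference, the `exit_status 137` reported by `qacct` is consistent with the "signal KILL (9)" line in the messages file: by common shell convention, a process terminated by signal N is reported with status 128 + N, and 137 = 128 + 9. The sketch below decodes such a status; the function name `decode_exit_status` is hypothetical and not part of gridengine.

```shell
# Hypothetical helper: interpret a shell-style exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name for that number
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited normally with code $status"
    fi
}

decode_exit_status 137
```

The status confirms that the job received SIGKILL, but not who sent it, so the sender still has to be found by other means.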

