[gridengine users] Jobs dies with signal BUS (7)

William Hay w.hay at ucl.ac.uk
Fri Feb 19 08:40:24 UTC 2016


On Fri, Feb 19, 2016 at 07:11:52AM +0000, sudha.penmetsa at wipro.com wrote:
> Hi,
>
> My job gets aborted after a while with exit status 135:
>
>    failed       100 : assumedly after job
>    exit_status  135
>
>    02/17/2016 11:25:10|qmaster|master1|W|job 428284.1 failed on host test1
>    assumedly after job because: job 428284.1 died through signal BUS (7)
>
> Jobs submitted with the same resources sometimes succeed.
>
> I tried increasing h_vmem and submitting again, but I get the same
> result.
>
> Can you please help me find the reason for this behavior?
>
I don't think SIGBUS is likely to be Grid Engine related per se.
(Exit status 135 is just 128 + 7, i.e. the job was killed by signal 7.)
A quick Google search suggests that one possible cause of SIGBUS errors
is trying to access a part of an mmap'd file that no longer exists because
the file has been truncated since it was mmap'd.
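
To make that failure mode concrete, here is a minimal sketch in Python,
assuming a Linux-like system. It is not taken from the failing job; the
helper names (run_truncated_mmap_demo, signal_from_exit_status) are made
up for illustration. A child process maps a small file, the file is
truncated behind the mapping's back, and the next access to the mapping
should kill the child with SIGBUS. The second helper applies the usual
"128 + signal number" convention to an exit status such as 135.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child script: map a file, shrink it behind the mapping, then touch the mapping.
CHILD = textwrap.dedent("""
    import mmap, os, sys
    path = sys.argv[1]
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 4096)
        os.truncate(path, 0)   # stands in for another job truncating the file
        mm[0]                  # this page is no longer backed by the file
""")


def run_truncated_mmap_demo():
    """Run the child and return its return code.

    subprocess reports death by signal N as a return code of -N.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 4096)
        path = f.name
    try:
        return subprocess.run([sys.executable, "-c", CHILD, path]).returncode
    finally:
        os.unlink(path)


def signal_from_exit_status(status):
    """Shell-style convention: a status above 128 means 'killed by signal status - 128'."""
    return status - 128 if status > 128 else None


if __name__ == "__main__":
    rc = run_truncated_mmap_demo()
    print("child return code:", rc, "(SIGBUS is", int(signal.SIGBUS), "here)")
    print("exit status 135 maps to signal", signal_from_exit_status(135))
```

If the child does die with SIGBUS on your execution hosts, that at least
confirms the mechanism is possible there; it does not prove it is what
killed job 428284.1.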

This is the sort of thing that would be more likely to show up on a cluster
where you might have multiple copies of a program running that are all 
manipulating the same file on a shared file system.

If this is the case and you can identify the problem file you might
be able to avoid the error by working on a private copy of the file 
rather than a shared one.  
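
A rough sketch of the private-copy idea, again in Python and only as an
illustration: open_private_copy is a hypothetical helper, and it assumes
the job only needs to read the file. It copies the shared file into a
scratch directory (tempfile.gettempdir() honours $TMPDIR, which Grid
Engine normally points at a per-job directory) and maps the copy, so a
later truncation of the shared file cannot invalidate the mapping.

```python
import mmap
import os
import shutil
import tempfile


def open_private_copy(shared_path, scratch_dir=None):
    """Copy shared_path into local scratch and return (mapping, private_path).

    The caller is responsible for closing the mapping and removing the copy.
    Note that mmap raises ValueError if the copied file is empty.
    """
    scratch_dir = scratch_dir or tempfile.gettempdir()  # honours $TMPDIR
    fd, private_path = tempfile.mkstemp(dir=scratch_dir, prefix="private_")
    os.close(fd)
    shutil.copyfile(shared_path, private_path)
    with open(private_path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, private_path
```

The copy costs some time and local disk space at job start, and it only
helps if the job does not need to see later updates to the shared file.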

William