[gridengine users] FYI: SIGABRT in PE start_proc_args
reuti at staff.uni-marburg.de
Mon Jan 9 16:00:25 UTC 2012
I just upgraded an older cluster to SGE6.2u5, and afterwards none of the nodes could run parallel jobs any longer: they always got a SIGABRT in the PE start. This has come up on the list a couple of times, so I am posting my findings here. Before the upgrade, the cluster ran SGE6.1u3 without any problems.
The OS on the nodes was openSUSE 11.3 with kernel 188.8.131.52-0.5-default. As I had never seen this behavior myself before, I upgraded the OS to the latest available patches for openSUSE 11.3, which changed the kernel to 184.108.40.206-0.4-default, and guess what: it's working again.
I don't have the source for SGE6.1u3 but to me it looks like:
- Some versions of Linux will send a SIGABRT for unknown reasons when the shepherd starts
- SGE6.1u3 ignored the signal in the shepherd
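To illustrate the difference the second point would make, here is a minimal sketch in Python. It is not SGE code and says nothing about what the shepherd really does; the helper name run_child is made up for this example. It only shows that a process which sets SIGABRT to "ignore" survives the signal, while one with the default disposition is killed by it.

```python
import signal
import subprocess
import sys


def run_child(ignore_sigabrt: bool) -> int:
    """Start a child process that sends SIGABRT to itself.

    Returns the child's return code: 0 if it survived the signal,
    or the negative signal number if the signal killed it.
    (Hypothetical helper for illustration only, not part of SGE.)
    """
    code = (
        "import os, signal, sys\n"
        # Optionally install the 'ignore' disposition before the signal arrives.
        + ("signal.signal(signal.SIGABRT, signal.SIG_IGN)\n" if ignore_sigabrt else "")
        + "os.kill(os.getpid(), signal.SIGABRT)\n"
        "sys.exit(0)\n"
    )
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    print("default disposition:", run_child(ignore_sigabrt=False))
    print("SIGABRT ignored:    ", run_child(ignore_sigabrt=True))
```

On Linux, the first call should report the child as killed by the signal and the second should report a clean exit. If the guess above is right, that is the kind of difference between the two SGE versions that would make a stray SIGABRT fatal in one and invisible in the other.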