[gridengine users] SoGE 8.1.2 segfault problem

Dave Love d.love at liverpool.ac.uk
Wed Nov 14 21:30:39 UTC 2012

"Loong, Andreas" <Andreas.Loong at astrazeneca.com> writes:

>> It sounds most likely to be due to a load report arriving, but I've no
>> idea what might be wrong.  It's rather odd with a clean install on Red
>> Hat.  Is the installation from the RPMs I made, or a separate build?
> They're from the RPMs provided on the SoGE site (arc.liv.. ) - speaking
> of RPMs - have you tested the debug package of the qmaster?

Yes, mainly for clients.

> I cannot get it to run at all. Look at this:
> file sge_qmaster*
> sge_qmaster:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
> sge_qmaster.debug: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
> ldd sge_qmaster.debug
>         statically linked
> ./sge_qmaster.debug
> -bash: ./sge_qmaster.debug: bad ELF interpreter: No such file or directory
> What's going on here? 'file' says it uses shared libs, ldd says it
> doesn't, and it won't run?

You shouldn't be trying to run that -- it's just debugging info, per the
package description.  Just install -debuginfo packages with rpm/yum and
then gdb will find the symbols for the corresponding executables.  I
assume Red Hat document it somewhere, but the site currently seems to be
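
A minimal sketch of that step, assuming the debuginfo package follows the usual <name>-debuginfo RPM convention. The name gridengine-debuginfo used below is hypothetical, so check the actual RPM list on the site. The helper name ensure_debuginfo is also made up for illustration. On Red Hat systems, debuginfo packages normally install their files under /usr/lib/debug, which is where gdb looks by default. The package should be the same version and release as the binaries it describes.

```shell
#!/bin/sh
# Sketch: make sure the debuginfo package matching the installed binaries is present.
# Assumption (hypothetical): the package is called something like gridengine-debuginfo;
# pass whatever name the downloaded/repository RPM actually has.
ensure_debuginfo() {
    pkg="$1"
    if rpm -q "$pkg" >/dev/null 2>&1; then
        # Already installed: gdb should pick up its symbols automatically.
        echo "installed: $pkg"
    else
        # Not installed yet: let yum fetch/install it (rpm -i on the .rpm file also works).
        yum install -y "$pkg"
    fi
}
```

Nothing is executed directly; you only install the package and then run gdb on sge_qmaster itself (not on the .debug file).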

>> There must be something different between them if it affects the
>> qmaster, but I don't have any good ideas what to try.  There have been
>> changes made for problems with NSS-style lookup, but I'd be surprised
>> if any of that has caused trouble, and I don't remember any being
>> specifically host-related.
> This doesn't have to be NSS related though, does it?

Well if that's all you change and it consistently causes the problem,
I'd say yes.

> I'm guessing here, but if I configure it to use only files, wouldn't
> this affect what/how much it does? What I'm getting at is that perhaps
> we'll still run into the problem, but it will be triggered much less
> often.

Maybe, but it clearly requires debugging the problem to find out what's
different about your environment.  A backtrace from the optimized binary
may not be so useful, but I could build an alternative rpm which may be
more useful if necessary.
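
A minimal sketch of getting such a backtrace non-interactively. It assumes qmaster left a core file when it crashed, which may require core dumps to be enabled, for example with `ulimit -c unlimited`, in the environment that starts the daemon. qmaster_bt is a hypothetical helper name, and the paths below are only illustrative. With the matching -debuginfo package installed, gdb should load the symbols without extra options.

```shell
#!/bin/sh
# Sketch: dump a backtrace of every sge_qmaster thread from a core file.
# qmaster_bt is a hypothetical helper: arguments are the qmaster binary and the core file.
qmaster_bt() {
    if [ "$#" -ne 2 ]; then
        echo "usage: qmaster_bt /path/to/sge_qmaster /path/to/core" >&2
        return 2
    fi
    binary="$1"
    core="$2"
    # -batch: run the -ex command and exit rather than starting an interactive session.
    # "thread apply all bt full": backtrace with local variables for every thread,
    # since sge_qmaster is multi-threaded and the fault may not be in the main thread.
    gdb -batch -ex "thread apply all bt full" "$binary" "$core"
}
```

For example: `qmaster_bt "$SGE_ROOT/bin/lx-amd64/sge_qmaster" /path/to/core > qmaster-bt.txt`, then send the text file to the list.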

Community Grid Engine:  http://arc.liv.ac.uk/SGE/
