[gridengine users] SoGE 8.1.2 segfault problem

Loong, Andreas Andreas.Loong at astrazeneca.com
Wed Nov 14 14:40:32 UTC 2012


> > My fault for not being more clear. It starts up just fine and everything
> > I'd expect works just as our old qmaster did. After approx 2 minutes it
> > segfaults without anything new or even odd added to the messages file.
> 
> It sounds most likely to be due to a load report arriving, but I've no
> idea what might be wrong.  It's rather odd with a clean install on Red
> Hat.  Is the installation from the RPMs I made, or a separate build?

They're from the RPMs provided on the SoGE site (arc.liv.. ). Speaking
of RPMs, have you tested the debug package of the qmaster? I cannot
get it to run at all. Look at this:

file sge_qmaster*
sge_qmaster:       ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
sge_qmaster.debug: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

ldd sge_qmaster.debug
        statically linked
./sge_qmaster.debug
-bash: ./sge_qmaster.debug: bad ELF interpreter: No such file or directory

What's going on here? 'file' says it uses shared libs, ldd says it
doesn't, and it won't run?
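
For reference, "bad ELF interpreter" is what bash reports when the dynamic
loader path recorded in the binary's PT_INTERP program header cannot be
found. A minimal Python sketch for reading that path is below. The helper
name elf_interpreter is hypothetical, and it only handles 64-bit
little-endian ELF, which matches the 'file' output above. One possible
explanation, not confirmed in this thread, is that the .debug file is a
detached debug-info file that is not meant to be executed, in which case its
recorded interpreter need not be valid.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, then p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, off)
        (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode("ascii", errors="replace")
    return None  # no PT_INTERP entry: nothing asks for a dynamic loader
```

Running it on both files, for example
elf_interpreter(open("sge_qmaster.debug", "rb").read()), and comparing the
results would show whether the two binaries name the same loader.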
 
> >> > As soon as I change back from pure files to "files dns" it takes
> >> > 2-3 minutes and the qmaster segfaults again.
> >>
> >> Do you mean qmaster runs for that long, or the init script waits that
> >> long for it?  What do you get with and without dns in NSS and flushing
> >> the nscd cache from
> >>
> >>   utilbin/lx-amd64/gethostbyname -all srvname
> >
> > I tried as many options as I could think of to get differing results, but
> > the output never changed.
> 
> There must be something different between them if it affects the
> qmaster, but I don't have any good ideas what to try.  There have been
> changes made for problems with NSS-style lookup, but I'm surprised if
> any of that has caused trouble, and I don't remember any being
> specifically host-related.

This doesn't have to be NSS-related though, does it? I'm guessing here,
but if I configure it to use only files, wouldn't this affect what/how
much it does? What I'm getting at is that perhaps we'll still run into the
problem, but it's triggered much less often.
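
To compare lookups under "files" and "files dns" outside the qmaster, a
small Python sketch along these lines could be run once under each
nsswitch.conf setting (after flushing nscd) and the two outputs diffed. The
function name resolve_summary is hypothetical, and I'm assuming Python's
socket calls go through the same libc resolver, and so the same NSS
configuration, as the qmaster does. It is not a replacement for the
utilbin gethostbyname tool.

```python
import socket
from typing import Dict, List, Optional


def resolve_summary(host: str) -> Dict[str, object]:
    """Collect what the system resolver returns for host, forward and reverse."""
    canonical: Optional[str] = None
    aliases: List[str] = []
    addresses: List[str] = []
    reverse: Dict[str, Optional[str]] = {}
    error: Optional[str] = None
    try:
        # Forward lookup (IPv4 only): canonical name, aliases, addresses.
        canonical, aliases, addresses = socket.gethostbyname_ex(host)
    except OSError as exc:  # gaierror and herror are OSError subclasses
        error = str(exc)
    for addr in addresses:
        try:
            # Reverse lookup: the primary name the resolver gives for the address.
            reverse[addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            reverse[addr] = None
    return {
        "query": host,
        "canonical": canonical,
        "aliases": aliases,
        "addresses": addresses,
        "reverse": reverse,
        "error": error,
    }


if __name__ == "__main__":
    import pprint
    import sys

    for name in sys.argv[1:]:
        pprint.pprint(resolve_summary(name))
```

Any difference in the canonical name or in the reverse entries between the
two runs would be the first thing to look at.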

> > Next, I'll try the GDB approach.
> 
> If you can send me a backtrace from that, I hope it will indicate
> what's wrong.

I got a stack trace from another user running SuSE. I've forwarded it, and
it is now ticket #1441 in trac.

Wbr
Andreas
