[gridengine users] adding a network to gridengine master

Simon Hood simon.hood at manchester.ac.uk
Wed Nov 21 22:17:39 UTC 2012

On Wed, Nov 21, 2012 at 05:01:19PM +0000, Reuti wrote:
> Am 21.11.2012 um 15:43 schrieb Simon Hood:
> 
> > Hi All,
> > 
> > I need to add a network to my Gridengine master host, which already has three networks.
> > But when I stop sge_master, add the network and try to start, it won't restart.  
> 
> Any error message - any file in /tmp showing the error? Which one do you get?

Hi Reuti, Dave,

Thanks for the questions/replies.  /tmp/sge_messages and SGE_ROOT/default/spool/qmaster/messages
contained only what is appended below, nothing that gave me a clue.

But having left the problem at 16:00, gone out for several beers and returned now, 22:00,
the solution suddenly shouts at me.  Firewall!!!  The new network is on eth0 which previously
had a different network on it.  The new network IP was blocked at the "lo" interface.  Apparently
SGE likes traffic to/from all IPs on a host to be enabled through "lo".  
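
For anyone wanting to check the same thing on their own master: as far as I
know, Linux delivers traffic from a host to any of its own addresses via "lo",
so a rule on "lo" can block it even when the physical interface is open.  Below
is a minimal sketch of a check, in Python rather than anything SGE ships.  It
binds a listener on one address and connects to it from the same host.  The
name local_roundtrip is made up for illustration, and the addresses in the
usage part are the ones from the /etc/hosts quoted further down.

```python
import socket


def local_roundtrip(ip, timeout=2.0):
    """Return True if this host can connect to a listener bound on its own address `ip`."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((ip, 0))      # port 0: let the kernel pick a free port
            srv.listen(1)
            port = srv.getsockname()[1]
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
                cli.settimeout(timeout)
                cli.connect((ip, port))   # handshake completes in the kernel backlog
        return True
    except OSError:
        # Covers a failed bind (address not configured), a refused
        # connection and a timeout (packets silently dropped).
        return False


if __name__ == "__main__":
    # Addresses taken from the /etc/hosts in this thread; adjust for your host.
    for ip in ["127.0.0.1", "10.2.2.250", "10.3.3.250", "10.2.49.100", "10.99.203.190"]:
        print(ip, "ok" if local_roundtrip(ip) else "blocked or not configured")
```

A False result only says the round trip failed; it does not say whether the
firewall or a missing address is to blame, so check "ip addr" as well.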

Forgive my total paranoia re the firewall.  But many years' experience suggests allowing
exactly what is required and no more or less is a good way to catch our naughty postgrads
out at attempted mischief.

All working now.

Simon


================================

 -- /tmp/sge_messages contained only, for example

    11/21/2012 21:45:51|  main|login|E|communication error for "login.redqueen.rcs.manchester.ac.uk/qmaster/1" running on port 6444: "can't bind socket"
    11/21/2012 21:45:52|  main|login|E|commlib error: can't bind socket (no additional information available)
    11/21/2012 21:46:20|  main|login|C|abort qmaster startup due to communication errors

 -- SGE_ROOT/default/spool/qmaster/messages contained only
  
    11/21/2012 22:02:31|  main|login|I|starting up GE 6.2 (lx24-amd64)
    11/21/2012 22:03:12|  main|login|E|jvm thread is not running
    11/21/2012 22:03:18|  main|login|I|controlled shutdown 6.2
    11/21/2012 22:03:45|  main|login|I|read job database with 18 entries in 0 seconds
    11/21/2012 22:03:45|  main|login|I|qmaster hard descriptor limit is set to 8192
    11/21/2012 22:03:45|  main|login|I|qmaster soft descriptor limit is set to 8192
    11/21/2012 22:03:45|  main|login|I|qmaster will use max. 8172 file descriptors for communication
    11/21/2012 22:03:45|  main|login|I|qmaster will accept max. 99 dynamic event clients

================================
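
The lines in both files look pipe-separated: timestamp, thread, host, a
one-letter level and the message text.  That is my reading of the excerpts
above, not something taken from the SGE documentation, and the field names in
this rough Python sketch are my own labels.  It just pulls out the E and C
entries so they are not lost among the I lines.

```python
from datetime import datetime


def parse_line(line):
    """Split one messages line into its five fields, or return None if it does not fit."""
    parts = line.strip().split("|", 4)   # at most 5 fields; the message keeps any later "|"
    if len(parts) != 5:
        return None
    stamp, thread, host, level, message = parts
    try:
        when = datetime.strptime(stamp.strip(), "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {
        "time": when,
        "thread": thread.strip(),
        "host": host.strip(),
        "level": level.strip(),
        "message": message,
    }


def problems(lines):
    """Return the parsed entries whose level is E or C."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in ("E", "C")]
```

Usage would be along the lines of problems(open("/tmp/sge_messages")), with
the path changed to whichever messages file you are looking at.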

> -- Reuti
> 
> 
> > In /etc/hosts:
> > 
> > 127.0.0.1       localhost.localdomain localhost
> > #
> > ::1             localhost6.localdomain6 localhost6
> > #
> > 10.99.203.190   test.manchester.ac.uk  test
> > #
> > 10.2.49.100     login-stg.test.manchester.ac.uk  login-stg
> > #
> > 10.2.2.250      login.test.manchester.ac.uk login
> > 10.3.3.250      login-3.test.manchester.ac.uk login-3
> > 
> > 
> > In host_aliases:
> > 
> > login.test.manchester.ac.uk login login-3.test.manchester.ac.uk login-3 login-stg.test.manchester.ac.uk login-stg test.manchester.ac.uk test
> > 
> > 
> > Some tests:
> > 
> > hostname 
> > login.test.manchester.ac.uk
> > 
> > hostname -f
> > login.test.manchester.ac.uk
> > 
> > 
> > root at test>  ./gethostbyname -aname login-3
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname login-3.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname login.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname login
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname test
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname login-stg.test.manchester.ac.uk
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyname -aname login-stg
> > login.test.manchester.ac.uk
> > 
> > 
> > root at test>  ./gethostbyaddr -aname 10.99.203.190
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyaddr -aname 10.2.2.250
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyaddr -aname 10.3.3.250
> > login.test.manchester.ac.uk
> > root at test>  ./gethostbyaddr -aname 10.2.49.100
> > login.test.manchester.ac.uk
> > 
> > 
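
Each line of host_aliases maps every name on it to the first name on that
line, which is why all the lookups above come back as
login.test.manchester.ac.uk.  A small Python sketch for checking that no name
from /etc/hosts has been left out of host_aliases follows.  The function names
are invented for illustration, and the two text blocks are copied from the
quoted message.

```python
HOSTS = """\
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.99.203.190   test.manchester.ac.uk  test
10.2.49.100     login-stg.test.manchester.ac.uk  login-stg
10.2.2.250      login.test.manchester.ac.uk login
10.3.3.250      login-3.test.manchester.ac.uk login-3
"""

ALIASES = (
    "login.test.manchester.ac.uk login login-3.test.manchester.ac.uk login-3 "
    "login-stg.test.manchester.ac.uk login-stg test.manchester.ac.uk test\n"
)


def hosts_names(hosts_text):
    """Collect every non-loopback name from /etc/hosts style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        addr, *hostnames = line.split()
        if addr.startswith("127.") or addr == "::1":
            continue
        names.update(hostnames)
    return names


def alias_map(aliases_text):
    """Map each name on a host_aliases line to the first name on that line."""
    mapping = {}
    for line in aliases_text.splitlines():
        fields = line.split()
        for name in fields:
            mapping[name] = fields[0]
    return mapping


def unmapped(hosts_text, aliases_text):
    """Return the /etc/hosts names that host_aliases does not mention, sorted."""
    mapping = alias_map(aliases_text)
    return sorted(name for name in hosts_names(hosts_text) if name not in mapping)


if __name__ == "__main__":
    print(unmapped(HOSTS, ALIASES))
```

An empty list means every name is covered, which matches the gethostbyname
results above and suggests name resolution was not the problem.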
> > What am I missing?
> > 
> > Cheers
> > 
> > Simon
