[gridengine users] commlib

Coleman, Marcus [JRDUS Non-J&J] mcolem19 at its.jnj.com
Mon Nov 28 20:10:22 UTC 2016


Master identifying node
[root at rndusljpp2 lx-amd64]# ./gethostbyname -all padme
Hostname: padme
SGE name: padme
Aliases: padme
Host Address(es): 192.168.1.159
[root at rndusljpp2 lx-amd64]# ./gethostbyaddr -all 192.168.1.159
Hostname: padme
SGE name: padme
Aliases: padme
Host Address(es): 192.168.1.159
[root at rndusljpp2 lx-amd64]#

Node identifying self
[root at padme lx-amd64]# ./gethostbyname -all padme
Hostname: padme
SGE name: padme
Aliases: padme
Host Address(es): 192.168.1.159
[root at padme lx-amd64]# ./gethostbyaddr -all 192.168.1.159
Hostname: padme
SGE name: padme
Aliases: padme
Host Address(es): 192.168.1.159

Node identifying master
[root at padme lx-amd64]# ./gethostbyname -all s1
Hostname: rndusljpp2.na.jnj.com
SGE name: rndusljpp2.na.jnj.com
Aliases: s1 rndusljpp2
Host Address(es): 192.168.1.8
[root at padme lx-amd64]# ./gethostbyname -all 192.168.1.8
Hostname: rndusljpp2.na.jnj.com
SGE name: rndusljpp2.na.jnj.com
Aliases: s1 rndusljpp2
Host Address(es): 192.168.1.8


Master identifying self
[root at rndusljpp2 lx-amd64]# ./gethostbyaddr -all 192.168.1.8
Hostname: rndusljpp2.na.jnj.com
SGE name: rndusljpp2.na.jnj.com
Aliases: s1 rndusljpp2
Host Address(es): 192.168.1.8
[root at rndusljpp2 lx-amd64]# ./gethostbyname -all s1
Hostname: rndusljpp2.na.jnj.com
SGE name: rndusljpp2.na.jnj.com
Aliases: s1 rndusljpp2
Host Address(es): 192.168.1.8
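
For comparing these outputs without reading them line by line, here is a minimal sketch in plain sh. The helper names field and same_sge_name are made up for illustration and are not part of Grid Engine. It assumes only the "Label: value" layout seen in the outputs above, and it checks nothing beyond whether two outputs report the same "SGE name".

```shell
#!/bin/sh
# Hypothetical helpers, not part of Grid Engine.

# Print the value of one "Label: value" line read from stdin,
# e.g. the "SGE name" line of gethostbyname/gethostbyaddr -all output.
field() {
    awk -F': ' -v label="$1" '$1 == label { print $2; exit }'
}

# Compare the "SGE name" in two captured outputs (passed as strings).
# Prints "match <name>" and returns 0, or prints a MISMATCH line and returns 1.
same_sge_name() {
    a=$(printf '%s\n' "$1" | field 'SGE name')
    b=$(printf '%s\n' "$2" | field 'SGE name')
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "match $a"
    else
        echo "MISMATCH '$a' '$b'"
        return 1
    fi
}
```

On a host with the utilities in the current directory it could be called as
same_sge_name "$(./gethostbyname -all padme)" "$(./gethostbyaddr -all 192.168.1.159)"
and repeated on both the node and the master.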


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Monday, November 28, 2016 11:55 AM
To: Coleman, Marcus [JRDUS Non-J&J]
Cc: users at gridengine.org
Subject: [EXTERNAL] Re: [gridengine users] commlib


Am 28.11.2016 um 20:36 schrieb Coleman, Marcus [JRDUS Non-J&J]:

> Thanks Reuti! 
> 
> I was hoping it was something there... Any ideas on where to go from here?

What do:

$ ./gethostbyname -all padme
$ ./gethostbyaddr -all 192.168.1.159

show on the node and headnode?

-- Reuti


> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Sunday, November 27, 2016 4:37 AM
> To: Coleman, Marcus [JRDUS Non-J&J]
> Cc: users at gridengine.org
> Subject: [EXTERNAL] Re: [gridengine users] commlib
> 
> 
> Am 27.11.2016 um 03:23 schrieb Coleman, Marcus [JRDUS Non-J&J]:
> 
>> Hi Reuti
>> 
>> I am not sure what I am looking for, but here are the contents of
>> /tmp on the rebooting node. Does anything stand out to you?
>> 
>> [root at padme tmp]# ls -l
>> total 20
>> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:09 jmonitor.mcolem19.37995
>> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:35 jmonitor.mcolem19.38497
>> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:45 jmonitor.mcolem19.38615
>> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:45 jmonitor.mcolem19.38624
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.28331
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.28377
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:40 jmonitor.schrogpu.31781
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:41 jmonitor.schrogpu.31829
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  9 12:17 jmonitor.schrogpu.5042
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  9 12:17 jmonitor.schrogpu.5043
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:08 jmonitor.schrogpu.8041
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:39 jmonitor.schrogpu.8220
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:26 jmonitor.schrogpu.8346
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:39 jmonitor.schrogpu.8557
>> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.8740
>> drwx------  2 root     root     4096 Nov  4 16:09 keyring-6CWKlB
>> drwxrwxrwx  2 mcolem19 mcolem19 4096 Nov 23 11:03 mmjob.lock
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28352
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28400
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28480
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28487
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.31802
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.31850
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:40 mmjob.schrogpu.31876
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:41 mmjob.schrogpu.31891
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:08 mmjob.schrogpu.8087
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.8266
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:26 mmjob.schrogpu.8392
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.8603
>> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.8787
>> drwx------  2 gdm      gdm      4096 Nov 25 07:42 orbit-gdm
>> drwx------. 2 gdm      gdm      4096 Nov 25 07:42 pulse-5mlDwNemaGym
>> drwx------  2 root     root     4096 Nov  4 16:09 pulse-GAI9xhuCTgeg
> 
> Thanks. I was looking for a file that the execd creates when it runs into problems during startup. Such files are written to /tmp as a last resort when the regular logfiles cannot be used. Unfortunately there are none, so the startup itself was successful.
> 
> 
>> [root at padme tmp]#
>> 
>> 
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Saturday, November 26, 2016 6:31 AM
>> To: Coleman, Marcus [JRDUS Non-J&J]
>> Cc: users at gridengine.org
>> Subject: [EXTERNAL] Re: [gridengine users] commlib
>> 
>> Hi,
>> 
>> Am 26.11.2016 um 06:10 schrieb Coleman, Marcus [JRDUS Non-J&J]:
>> 
>>> I am having an issue with a node rebooting. I am running Desmond fep 
>>> jobs...
>>> 
>>> Thanks for any help in advance!
>>> 
>>> /etc/resolv.conf is the same on all nodes
>>> /etc/hosts is the same on all nodes
>>> All nodes are connected to the same switch in a server rack.
>>> ################### from NODE
>>> [root at padme lx-amd64]# ./gethostbyaddr -name 192.168.1.8
>>> rndusljpp2.na.jnj.com
>>> [root at padme lx-amd64]# ./gethostbyname -name s1
>>> rndusljpp2.na.jnj.com
>>> ################### from QMASTER
>>> [root at rndusljpp2 lx-amd64]# ./gethostbyaddr -name 192.168.1.159
>>> padme
>>> [root at rndusljpp2 lx-amd64]# ./gethostbyname -name padme
>>> padme
> 
> What do:
> 
> $ ./gethostbyname -all padme
> $ ./gethostbyaddr -all 192.168.1.159
> 
> show?
> 
> -- Reuti
> 




