[gridengine users] commlib

Reuti reuti at staff.uni-marburg.de
Sun Nov 27 12:36:48 UTC 2016


On 27.11.2016 at 03:23, Coleman, Marcus [JRDUS Non-J&J] wrote:

> Hi Reuti
> 
> I am not sure what I am looking for, but here are the contents of /tmp on the rebooting node.
> Does anything stand out to you?
> 
> [root at padme tmp]# ls -l
> total 20
> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:09 jmonitor.mcolem19.37995
> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:35 jmonitor.mcolem19.38497
> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:45 jmonitor.mcolem19.38615
> prw-rw-r--  1 mcolem19 mcolem19    0 Nov 23 22:45 jmonitor.mcolem19.38624
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.28331
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.28377
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:40 jmonitor.schrogpu.31781
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:41 jmonitor.schrogpu.31829
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  9 12:17 jmonitor.schrogpu.5042
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  9 12:17 jmonitor.schrogpu.5043
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:08 jmonitor.schrogpu.8041
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:39 jmonitor.schrogpu.8220
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:26 jmonitor.schrogpu.8346
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:39 jmonitor.schrogpu.8557
> prw-rw-r--  1 schrogpu schrogpu    0 Sep  5 00:27 jmonitor.schrogpu.8740
> drwx------  2 root     root     4096 Nov  4 16:09 keyring-6CWKlB
> drwxrwxrwx  2 mcolem19 mcolem19 4096 Nov 23 11:03 mmjob.lock
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28352
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28400
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28480
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.28487
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.31802
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.31850
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:40 mmjob.schrogpu.31876
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:41 mmjob.schrogpu.31891
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:08 mmjob.schrogpu.8087
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.8266
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:26 mmjob.schrogpu.8392
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:39 mmjob.schrogpu.8603
> prw-------  1 schrogpu schrogpu    0 Sep  5 00:27 mmjob.schrogpu.8787
> drwx------  2 gdm      gdm      4096 Nov 25 07:42 orbit-gdm
> drwx------. 2 gdm      gdm      4096 Nov 25 07:42 pulse-5mlDwNemaGym
> drwx------  2 root     root     4096 Nov  4 16:09 pulse-GAI9xhuCTgeg

Thanks. I was looking for a file that the execd creates when it runs into problems during startup. Such files are written to /tmp as a last resort when the regular logfiles cannot be used. Unfortunately there are none, which means the startup itself was successful.
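
A minimal sketch of that check, in case it is useful on other nodes. The name pattern `execd_messages*` is an assumption (the exact file name is not given in this thread), and the function name is only illustrative; adjust both to your installation.

```shell
# List candidate execd fallback logs in a directory (default: /tmp).
# ASSUMPTION: the fallback files match "execd_messages*"; adjust if your
# installation uses a different name.
find_execd_fallback_logs() {
    dir="${1:-/tmp}"
    # Only look at regular files directly inside the directory.
    find "$dir" -maxdepth 1 -type f -name 'execd_messages*' 2>/dev/null
}

logs=$(find_execd_fallback_logs /tmp)
if [ -n "$logs" ]; then
    printf '%s\n' "$logs"
else
    echo "no execd fallback logs found in /tmp"
fi
```

An empty result only says that the execd did not fall back to /tmp; it does not rule out problems that occur after startup.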


> [root at padme tmp]#
> 
> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Saturday, November 26, 2016 6:31 AM
> To: Coleman, Marcus [JRDUS Non-J&J]
> Cc: users at gridengine.org
> Subject: [EXTERNAL] Re: [gridengine users] commlib
> 
> Hi,
> 
> On 26.11.2016 at 06:10, Coleman, Marcus [JRDUS Non-J&J] wrote:
> 
>> I am having an issue with a node rebooting. I am running Desmond fep 
>> jobs...
>> 
>> Thanks for any help in advance!
>> 
>> /etc/resolv.conf is the same on all nodes
>> /etc/hosts is the same on all nodes
>> All nodes are connected to the same switch in a server rack.
>>
>> ################### from NODE
>> [root at padme lx-amd64]# ./gethostbyaddr -name 192.168.1.8
>> rndusljpp2.na.jnj.com
>> [root at padme lx-amd64]# ./gethostbyname -name s1
>> rndusljpp2.na.jnj.com
>>
>> ################### from QMASTER
>> [root at rndusljpp2 lx-amd64]# ./gethostbyaddr -name 192.168.1.159
>> padme
>> [root at rndusljpp2 lx-amd64]# ./gethostbyname -name padme
>> padme

What do:

$ ./gethostbyname -all padme
$ ./gethostbyaddr -all 192.168.1.159

show?
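
For comparison, the same forward and reverse lookup can be sketched with the system resolver. This assumes `getent` is available on the node; `lookup` and `check_host` are hypothetical helper names. The result may differ from what the Grid Engine utilities above report, since this sketch only asks the operating system's resolver.

```shell
# Resolve a name or address through the system resolver.
# ASSUMPTION: getent is installed; this is not a Grid Engine tool.
lookup() {
    getent hosts "$1"
}

# Print the chain name -> address -> name found by reverse lookup.
check_host() {
    name="$1"
    # First field of the first result line is the address.
    ip=$(lookup "$name" | awk 'NR==1 {print $1}')
    [ -n "$ip" ] || { echo "$name: no forward lookup"; return 1; }
    # Second field of the reverse result is the primary name.
    back=$(lookup "$ip" | awk 'NR==1 {print $2}')
    echo "$name -> $ip -> ${back:-<none>}"
}

# Run the check only when host names are passed as arguments.
if [ "$#" -gt 0 ]; then
    for h in "$@"; do
        check_host "$h"
    done
fi
```

Running `check_host padme` on both the node and the qmaster and comparing the two lines shows whether both machines agree on the name and the address.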

-- Reuti


