[gridengine users] Command failed: ./utilbin/sol-amd64/spooldefaults configuration /tmp/configuration_2014-09-09_16:13:27.8242

Kraus, Niki Niki.Kraus at 3beg.at
Wed Sep 10 10:12:53 UTC 2014


Hi!

>Was there any other file created in /tmp which might point to the reason (containing execd or master in its name)?

No, there wasn't.
This is the content of /tmp:

! dbelkf@dbegkgrt2:/ # ls -l /tmp
total 76384
-rw-rw-r--   1 rknet    other       9745 Jul 31 14:08 KreditNetRepo.serverDefault.log
-rw-rw-r--   1 root     root      366080 Aug  6  2009 SPROsslnk
-rw-r--r--   1 root     root      698820 Sep 10 11:33 compliance_check.txt
-rw-r--r--   1 sgeadmin sgeadmin    1616 Sep  9 14:07 configuration_2014-09-09_14:07:42.28154
-rw-r--r--   1 sgeadmin sgeadmin    1616 Sep  9 14:12 configuration_2014-09-09_14:11:39.28777
-rw-r--r--   1 sgeadmin sgeadmin    1616 Sep  9 14:24 configuration_2014-09-09_14:24:58.29971
-rw-r--r--   1 sgeadmin sgeadmin    1616 Sep  9 16:13 configuration_2014-09-09_16:13:27.8242
-rw-r--r--   1 sgeadmin sgeadmin    1612 Sep 10 09:01 configuration_2014-09-10_09:01:21.14660
-rw-r--r--   1 root     sys           28 Jul 31 14:07 csn.7154
-rw-rw-r--   1 rknet    other    37928391 Aug 28 15:36 dbegkgrt2-rknet_home.gz
drwxr--r-x   2 rknet    other        179 Sep  2 11:03 hsperfdata_rknet
drwxr-xr-x   2 root     root         117 Sep 10 11:04 hsperfdata_root
-rw-r--r--   1 rknet    other         68 Aug 13 01:15 kgr_restart_0uhr10.log
-rw-r--r--   1 rknet    other         68 Aug 12 20:00 kgr_restart_20uhr.log
drwxr-xr-x   2 root     root        1686 Sep  8 17:24 nbi-4545297854873370914.tmp
-rw-r--r--   1 root     root         141 Sep  8 17:15 nbi-admin-static2071452369998445520
-rw-r--r--   1 root     root         159 Sep  8 17:18 nbi-admin1968690861715875120
-rw-r--r--   1 root     root         159 Sep  8 17:19 nbi-admin272029653310052276
-rw-r--r--   1 root     root         159 Sep  8 17:16 nbi-admin2899519607099997437
-rw-r--r--   1 root     root         159 Sep  8 17:19 nbi-admin3294315890471701827
-rw-r--r--   1 root     root         159 Sep  8 17:19 nbi-admin7236158509035343515
-rw-rw-r--   1 rknet    other      10731 Aug  4 14:42 report1407156139330.pdf
drwx------   2 root     root         185 Sep 10 11:34 ssh-zy7wcvCOGG
drwx------   2 root     root         117 Jul 31 14:06 tmp.EHaWYn
-rw-r--r--   1 root     root           7 Sep 10 11:34 user-dev-pts-1
-rw-r--r--   1 root     root           7 Aug 13 10:35 user-dev-pts-9
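
Leftovers like these can be reviewed before anything is deleted. Below is a minimal sketch, assuming a POSIX shell (on Solaris that may mean /usr/xpg4/bin/sh rather than the legacy /bin/sh). The helper name list_install_leftovers is hypothetical, and the name patterns (configuration_*, nbi*, and anything containing execd or master) are only the ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch: print files in DIR (default /tmp) that look like leftovers from
# an earlier install attempt. list_install_leftovers is a hypothetical
# helper; the patterns are the ones discussed in this thread.
list_install_leftovers() {
    dir=${1:-/tmp}
    for f in "$dir"/configuration_* "$dir"/nbi* "$dir"/*execd* "$dir"/*master*; do
        # A glob that matches nothing is passed through literally; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -u   # a name can match more than one pattern
}
```

Running list_install_leftovers /tmp only prints paths and removes nothing, so the list can be checked by hand before deleting.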

I deleted all the config* and nbi* files, grepped /tmp for "exec" and "master" (no results), and ran install_qmaster again.
But I still run into this message:

...
Making directories
------------------

creating directory: /reuters/sge/default/spool/qmaster
creating directory: /reuters/sge/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >>

Setup spooling
--------------
Dumping bootstrapping information
Initializing spooling database
error: unknown object type for list attribute "SC_job_load_adjustments" in function ""

etc.

This is a non-global zone, and /reuters/sge is shared via NFS. That shouldn't make any difference, or does it?

I don't think it's a permission problem: all files are owned by sgeadmin, the install script is run as root, and sgeadmin can create files under this path.
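
The permission point can be confirmed with a quick write test. Here is a minimal sketch, again assuming a POSIX shell; can_write_as and the probe file name are hypothetical. For any user other than the caller it goes through su, so in that case it must be started as root. Testing root as well as sgeadmin may be worthwhile, because many NFS servers map root to an unprivileged user by default (root squashing), depending on the export options.

```shell
#!/bin/sh
# Sketch: succeed if USER can create (and then remove) a file in DIR.
# Usage: can_write_as USER DIR   (hypothetical helper)
can_write_as() {
    user=$1
    dir=$2
    probe="$dir/.write_test.$$"            # hypothetical probe file name
    cmd="touch '$probe' && rm -f '$probe'"
    if [ "$user" = "$(id -un)" ]; then
        sh -c "$cmd"                        # already running as USER
    else
        su "$user" -c "$cmd"                # switching users needs root
    fi
}
```

For this setup that would be, for example, can_write_as sgeadmin /reuters/sge/default/spool/qmaster && echo writable, run once for sgeadmin and once for root.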

-- Nick



