[gridengine users] file permission change on $TMP directory

Rayson Ho rayrayson at gmail.com
Wed Jun 27 17:44:24 UTC 2012


On Wed, Jun 27, 2012 at 1:26 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> The job scripts need to be readable by each individual user who is running the job at execution time. What permissions did you put on the execd's spool directory? For binary jobs it will work (unless you want to use $PE_HOSTFILE or the like).

Yes, good point. I did not test it with a full job; I only ran a quick
and dirty "qrsh -b y sleep 10" to see whether it complains.

Actually, binary jobs do not work either: no error message is
displayed, but the job does not run properly. I believe that some of
the job setup in the shepherd runs as the user (I have no time right
now to sit down and verify that).

Rayson



>
> -- Reuti
>
>
>> Rayson
>>
>> On Wed, Jun 27, 2012 at 12:51 PM, CB <cbalways at gmail.com> wrote:
>>> I was able to figure out how to fix the errors shown below. By
>>> implementing Rayson's and Dave's recommendations, I was able to tighten
>>> the file permissions on the job owner's spooled files as well as on $TMP.
>>> The one remaining to-do is the trace file, which is owned by the job
>>> owner and is still world-readable.
>>>
>>> For those who are interested in how I fixed the errors below, here is the
>>> diff of reaper_execd.c:
>>>
>>> Fixed issue: abnormal termination of shepherd for job 6.1: no "exit_status" file
>>>
>>> Fixed issue: can't open file active_jobs/6.1/error: Permission denied
>>>
>>> Index: daemons/execd/reaper_execd.c
>>> ===================================================================
>>> --- daemons/execd/reaper_execd.c      (revision 5)
>>> +++ daemons/execd/reaper_execd.c      (working copy)
>>> @@ -498,6 +498,7 @@
>>>      */
>>>     sge_get_active_job_file_path(&fname,
>>>                                  job_id, ja_task_id, pe_task_id, "exit_status");
>>> +   sge_switch2start_user();
>>>     if (!(fp = fopen(sge_dstring_get_string(&fname), "r"))) {
>>>        /*
>>>         * we trust the exit status of the shepherd if it exited regularly
>>> @@ -509,6 +510,7 @@
>>>        else
>>>           failed = SSTATE_BEFORE_PROLOG;
>>>
>>> +      sge_switch2admin_user();
>>>        sprintf(error, MSG_STATUS_ABNORMALTERMINATIONOFSHEPHERDFORJOBXY_S,
>>>                job_get_id_string(job_id, ja_task_id, pe_task_id, &id_dstring));
>>>        ERROR((SGE_EVENT, error));
>>> @@ -521,6 +523,7 @@
>>>        int fscanf_count, shepherd_exit_status_file;
>>>
>>>        fscanf_count = fscanf(fp, "%d", &shepherd_exit_status_file);
>>> +      sge_switch2admin_user();
>>>        FCLOSE_IGNORE_ERROR(fp);
>>>        if (fscanf_count != 1) {
>>>           sprintf(error, MSG_STATUS_ABNORMALTERMINATIONFOSHEPHERDFORJOBXYEXITSTATEFILEISEMPTY_S,
>>> @@ -564,6 +567,7 @@
>>>     /* look for error file this overrules errors found yet */
>>>     sge_get_active_job_file_path(&fname,
>>>                                  job_id, ja_task_id, pe_task_id, "error");
>>> +   sge_switch2start_user();
>>>     if ((fp = fopen(sge_dstring_get_string(&fname), "r"))) {
>>>        int n;
>>>        char *new_line;
>>> @@ -575,17 +579,21 @@
>>>           /* ensure only first line of error file is in 'error' */
>>>           if ((new_line=strchr(error, '\n')))
>>>              *new_line = '\0';
>>> +         sge_switch2admin_user();
>>>           DPRINTF(("ERRORFILE: %256s\n", error));
>>>        }
>>>        else if (feof(fp)) {
>>> +         sge_switch2admin_user();
>>>           DPRINTF(("empty error file\n"));
>>>        } else {
>>> +         sge_switch2admin_user();
>>>           ERROR((SGE_EVENT, MSG_JOB_CANTREADERRORFILEFORJOBXY_S,
>>>              job_get_id_string(job_id, ja_task_id, pe_task_id, &id_dstring)));
>>>        }
>>>        FCLOSE_IGNORE_ERROR(fp);
>>>     }
>>>     else {
>>> +      sge_switch2admin_user();
>>>        ERROR((SGE_EVENT, MSG_FILE_NOOPEN_SS, sge_dstring_get_string(&fname), strerror(errno)));
>>>        /* There is no error file. */
>>>     }
>>>
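The pattern in the patch above (switch identity, touch the user-owned file, switch back on every path) can be sketched with plain POSIX calls. This is only an illustration under assumptions: it uses seteuid() directly rather than Grid Engine's sge_switch2start_user()/sge_switch2admin_user(), whose internals are not shown in this thread, and the function name read_int_as_user is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Grid Engine.
 * Reads one integer from `path` while running with effective uid `as_uid`,
 * then restores the previous effective uid.
 * Returns 0 on success, -1 on any failure. */
int read_int_as_user(const char *path, uid_t as_uid, int *value)
{
    uid_t saved = geteuid();
    FILE *fp;
    int n = 0;

    assert(path != NULL && value != NULL);

    if (seteuid(as_uid) != 0)
        return -1;

    fp = fopen(path, "r");
    if (fp != NULL) {
        n = fscanf(fp, "%d", value);
        fclose(fp);
    }

    /* Switch back whether or not the read succeeded. */
    if (seteuid(saved) != 0)
        return -1;

    return n == 1 ? 0 : -1;
}
```

Restoring the identity in one place after the file access, rather than in each branch, makes it harder to leave a code path that keeps running with the wrong uid.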
>>>
>>> Regards,
>>> - Chansup
>>>
>>> On Fri, Jun 22, 2012 at 10:59 AM, CB <cbalways at gmail.com> wrote:
>>>>
>>>> I tried the workaround suggested in the ticket, but it failed when a job
>>>> exited: the error state file in the spool directory could not be updated.
>>>> Using umask(027) instead of umask(022) changes the permissions on
>>>> some of the files in the execd spool directory that are owned by the job
>>>> owner.
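
For reference, the effect of the two umask values on a newly created file can be checked in isolation. fopen() and similar calls request mode 0666 for new files, and the bits set in the umask are cleared from that. The following is a minimal sketch, not taken from the Grid Engine sources; the helper name mode_under_umask is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` requesting mode 0666 under the given
 * umask and return the permission bits the file actually received,
 * or -1 on error. The previous umask is restored before returning and
 * the file is removed again. */
int mode_under_umask(const char *path, mode_t mask)
{
    struct stat st;
    mode_t old;
    int fd, rc;

    assert(path != NULL);

    unlink(path);               /* the mode only applies to newly created files */
    old = umask(mask);
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0666);
    umask(old);
    if (fd < 0)
        return -1;

    rc = fstat(fd, &st);
    close(fd);
    unlink(path);
    return rc == 0 ? (int)(st.st_mode & 0777) : -1;
}
```

With a mask of 022 this yields 0644 (-rw-r--r--), and with 027 it yields 0640 (-rw-r-----), provided the directory has no default ACL that overrides the umask.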
>>>>
>>>> Interestingly not all of them are affected by umask(027) as shown below:
>>>>
>>>> [CH21778 at d-7-55 d-7-55]$ pwd
>>>> /opt/llogs/default/spool/d-7-55
>>>> [CH21778 at d-7-55 d-7-55]$ find -ls
>>>> 1627040    4 drwxr-xr-x   5 sge      sge          4096 Jun 22 09:34 .
>>>> 1627042    4 drwxr-xr-x   2 sge      sge          4096 Jun 22 10:49 ./jobs
>>>> 1627043    4 drwxr-xr-x   7 sge      sge          4096 Jun 22 10:48 ./active_jobs
>>>> 1627050    4 drwxr-xr-x   2 sge      sge          4096 Jun 22 10:49 ./active_jobs/6.1
>>>> 1627056    4 -rw-r--r--   1 sge      sge          2063 Jun 22 10:48 ./active_jobs/6.1/environment
>>>> 1627074    4 -rw-r--r--   1 sge      sge             6 Jun 22 10:48 ./active_jobs/6.1/pid
>>>> 1627053    4 -rw-r--r--   1 CH21778  CH21778      3498 Jun 22 10:49 ./active_jobs/6.1/trace
>>>> 1627078    4 -rw-r--r--   1 sge      sge             6 Jun 22 10:48 ./active_jobs/6.1/job_pid
>>>> 1627086    4 -rw-r-----   1 sge      sge             6 Jun 22 10:48 ./active_jobs/6.1/addgrpid
>>>> 1627105    0 -rw-r-----   1 CH21778  CH21778         0 Jun 22 10:48 ./active_jobs/6.1/error
>>>> 1627048    4 -rw-r--r--   1 sge      sge           305 Jun 22 10:49 ./active_jobs/6.1/usage
>>>> 1627055    4 -rw-r--r--   1 sge      sge            32 Jun 22 10:48 ./active_jobs/6.1/pe_hostfile
>>>> 1627061    4 -rw-r--r--   1 sge      sge          1902 Jun 22 10:48 ./active_jobs/6.1/config
>>>> 1627106    4 -rw-r-----   1 CH21778  CH21778         2 Jun 22 10:49 ./active_jobs/6.1/exit_status
>>>>
>>>> And then, at the end of job execution, it tried to update the error file
>>>> but failed due to file permission as recorded in the execd messages file:
>>>>
>>>> 06/22/2012 10:49:26|  main|d-7-55|E|abnormal termination of shepherd for job 6.1: no "exit_status" file
>>>> 06/22/2012 10:49:26|  main|d-7-55|E|can't open file active_jobs/6.1/error: Permission denied
>>>>
>>>> So it appears that the error and exit_status files are accessed later by
>>>> the GE admin user (sge), and that this fails because of the file
>>>> permissions. Any suggestions?
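
The "Permission denied" message is consistent with ordinary POSIX permission checking: a 0640 file owned by the job owner cannot be read by a different, unprivileged user who is not in the file's group. A simplified sketch of that check follows. It deliberately ignores root, supplementary groups and ACLs, and the function name and the numeric IDs used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified read-permission check for an unprivileged process.
 * Exactly one permission class applies: owner, else group, else other. */
bool can_read(mode_t mode, uid_t owner, gid_t group, uid_t uid, gid_t gid)
{
    if (uid == owner)
        return (mode & S_IRUSR) != 0;
    if (gid == group)
        return (mode & S_IRGRP) != 0;
    return (mode & S_IROTH) != 0;
}
```

For example, can_read(0640, 1001, 1001, 500, 500) is false, while the same call with mode 0644 is true, which matches the difference between the umask(027) and umask(022) cases.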
>>>>
>>>> Regards,
>>>> - Chansup
>>>>
>>>> On Thu, Jun 21, 2012 at 6:15 AM, Dave Love <d.love at liverpool.ac.uk> wrote:
>>>>>
>>>>> CB <cbalways at gmail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using the GE2011.11 release.
>>>>>>
>>>>>> When a job is dispatched to a node, it creates the $TMP directory, which
>>>>>> is usually located under /tmp on the execution host. The current
>>>>>> permission on $TMP is 755, and I would like to change it to 750.  Can
>>>>>> anyone point me to the file I should modify?   I thought asking might be
>>>>>> quicker than searching through the source code myself.
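
As a generic illustration of the permission being asked for (not the Grid Engine code path, which is discussed in the reply below): the mode passed to mkdir() is filtered by the umask, whereas chmod() sets the bits exactly. A minimal sketch, with made-up helper names make_private_dir and dir_mode:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and force its permissions to exactly
 * 0750, independent of the process umask. Returns 0 on success, -1 on error. */
int make_private_dir(const char *path)
{
    assert(path != NULL);

    /* Start restrictive so the directory is never briefly wider than 0750. */
    if (mkdir(path, 0700) != 0)
        return -1;

    /* chmod() is not affected by the umask. */
    return chmod(path, 0750);
}

/* Return the permission bits of `path`, or -1 on error. */
int dir_mode(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (int)(st.st_mode & 0777);
}
```

Setting the mode explicitly avoids depending on whatever umask the creating process happens to have at that point.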
>>>>>
>>>>> https://arc.liv.ac.uk/trac/SGE/ticket/109
>>>>>
>>>>> The relevant code is actually in sge_exec_job (in recent versions?).  I
>>>>> haven't got round to seeing if configuring the various umasks will break
>>>>> anything, particularly if it's controlled by a single parameter.  (The
>>>>> permission on the job spool is actually the most interesting.)
>>>>>
>>>>> --
>>>>> Community Grid Engine:  http://arc.liv.ac.uk/SGE/
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
>>>
>>
>



