[gridengine users] Start jobs on exec host in sequential order

Derrick Lin klin938 at gmail.com
Wed Aug 1 01:06:19 UTC 2018


Hi Reuti,

The prolog script is indeed set to run as root, since setting the xfs quota
requires root privileges.

I also tried the second approach, but it seems the addgrpid file has not yet
been created when the prolog script executes:

/opt/gridengine/default/common/prolog_exec.sh: line 21:
/opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid:
No such file or directory

Maybe part of my scheduler configuration is incorrect?
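
A defensive way to read the file in the prolog is to test for it first and fail with a clear message, rather than let the redirection error out. The following is only a sketch: the function name `read_addgrpid` is made up for illustration, it assumes that SGE_JOB_SPOOL_DIR points at the job's active_jobs directory and that addgrpid holds a single number, and it does not explain why the file is missing at prolog time.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): print the additional group ID
# stored in <spool dir>/addgrpid, or return non-zero if the file is
# missing or empty.
read_addgrpid() {
    spool_dir=$1
    file="$spool_dir/addgrpid"
    if [ ! -s "$file" ]; then
        echo "addgrpid not found or empty: $file" >&2
        return 1
    fi
    # Assumption: the file holds a single number; keep only the first line.
    head -n 1 "$file"
}

# Usage in a prolog (assumes SGE_JOB_SPOOL_DIR is set in its environment):
# XFS_PROJID=$(read_addgrpid "$SGE_JOB_SPOOL_DIR") || exit 1
```

Exiting non-zero here makes the failure visible instead of silently continuing with an empty project ID.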

Regards,
Derrick

On Mon, Jul 30, 2018 at 7:35 PM, Reuti <reuti at staff.uni-marburg.de> wrote:

>
> > On 30.07.2018 at 02:31, Derrick Lin <klin938 at gmail.com> wrote:
> >
> > Hi Reuti,
> >
> > The approach sounds great.
> >
> > But the prolog script seems to be run by root, so this is what I got:
> >
> > XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)
>
> This is quite unusual. Do you run the prolog as root intentionally? I
> assume so, in order to set the limits:
>
> $ qconf -sq my.q
> …prolog                /some/script
>
> Do you have "root:" here to change the user (in the global `qconf -sconf`)
> under which it is run? Please note that this may open some root doors,
> depending on the environment variable settings. I have "sgeadmin:" here for
> some special handling and use:
>
> sgeadmin@/usr/sge/cluster/busybox env -u LD_LIBRARY_PATH -u LD_PRELOAD -u IFS /usr/sge/cluster/context.sh
>
> Nevertheless: the second approach to get the additional group ID from the
> job's spool area should work.
>
> -- Reuti
>
>
> >
> > Maybe I am still missing something, or the prolog script is the wrong
> > place to get the group ID generated by SGE?
> >
> > Cheers,
> > D
> >
> > On Sat, Jul 28, 2018 at 11:53 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> >
> > > On 28.07.2018 at 03:00, Derrick Lin <klin938 at gmail.com> wrote:
> > >
> > > Thanks Reuti,
> > >
> > > I know little about the group ID created by SGE, and I am also rather
> > > confused about the Linux group ID.
> >
> > Yes, SGE assigns a conventional group ID to each job to track the CPU
> > and memory consumption. This group ID is in the range you defined in:
> >
> > $ qconf -sconf
> > …
> > gid_range                    20000-20100
> >
> > and this will be unique per node. A first approach could be `sed`:
> >
> > $ id
> > uid=25000(reuti) gid=25000(ourgroup) groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> > $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> > 20052
> >
> > or:
> >
> > ADD_GRP_ID=$(< "$SGE_JOB_SPOOL_DIR/addgrpid")
> > echo "$ADD_GRP_ID"
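
The `sed` expression depends on where the unnamed group appears in the list. A possible alternative, sketched here under assumptions, is to filter the numeric group list from `id -G` against the configured gid_range. The function name `pick_gid_in_range` is hypothetical, and the bounds 20000-20100 are simply the ones from the example above. Like the `sed` variant, this only helps when the command runs as the job user, not as root.

```shell
#!/bin/sh
# Hypothetical helper: print the first numeric group ID that lies within
# [lo, hi]. The IDs are passed as arguments, e.g. from `id -G`.
pick_gid_in_range() {
    lo=$1
    hi=$2
    shift 2
    for gid in "$@"; do
        if [ "$gid" -ge "$lo" ] && [ "$gid" -le "$hi" ]; then
            echo "$gid"
            return 0
        fi
    done
    # No group ID fell inside the range.
    return 1
}

# Usage with the gid_range from the example (word splitting is intended):
# ADD_GRP_ID=$(pick_gid_in_range 20000 20100 $(id -G))
```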
> >
> > -- Reuti
> >
> >
> > > I assume that `id` is called inside the prolog script. What does the
> > > output typically look like?
> > >
> > > Cheers,
> > >
> > > On Fri, Jul 27, 2018 at 4:12 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > >
> > > On 27.07.2018 at 03:14, Derrick Lin wrote:
> > >
> > > > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > > > causes problems for array jobs, whose tasks share the same $JOB_ID
> > > > (with different $TASK_ID).
> > > >
> > > > Also, tasks from two different array jobs running on the same node
> > > > can have the same $TASK_ID, so $TASK_ID is not unique on a host
> > > > either.
> > >
> > > So the number you are looking for needs to be unique per node only?
> > >
> > > What about using the additional group ID which SGE creates, then? It
> > > will be unique per node.
> > >
> > > This can be found in the `id` command's output, or in the spool
> > > directory (execd_spool_dir) under
> > > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> > >
> > > -- Reuti
> > >
> > >
> > > > That's why I am trying to make the xfs_projid independent of SGE.
> > > >
> > > >
> > > >
> > > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > > > Hi,
> > > >
> > > > > On 26.07.2018 at 06:01, Derrick Lin <klin938 at gmail.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I am working on a prolog script which sets up an xfs quota on disk
> > > > > space on a per-job basis.
> > > > >
> > > > > To set up an xfs quota on a subdirectory, I need to provide a
> > > > > project ID.
> > > > >
> > > > > Here is how I generate the project ID:
> > > > >
> > > > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > > >
> > > > > echo $JOB_ID >> $XFS_PROJID_CF
> > > > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > > >
> > > > The xfs_projid is then the number of lines in the file? Why not use
> > > > $JOB_ID directly? Is there a limit on the maximum project ID, which
> > > > $JOB_ID might exceed?
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > My tests show that when multiple jobs start on the same exec host
> > > > > at the same time, the prolog scripts run almost simultaneously, so
> > > > > multiple jobs end up sharing the same xfs_projid, which is no good.
> > > > >
> > > > > I am wondering if I can configure the scheduler to start the jobs
> > > > > sequentially (perhaps with an interval in between).
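
Independently of the scheduler, the race in the counter file can be avoided by serializing the append and the count under a file lock. The following is a minimal sketch, assuming the `flock` utility from util-linux is installed on the exec hosts. The function name `next_projid` and the `.lock` suffix are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append one line to a counter file and print the new
# line count, holding an exclusive lock so that concurrent callers never
# see the same value.
next_projid() {
    counter_file=$1
    tag=$2
    (
        # File descriptor 9 is opened on the lock file by the redirection
        # below; flock blocks until the exclusive lock is granted.
        flock -x 9 || exit 1
        echo "$tag" >> "$counter_file"
        # Strip any padding that some wc implementations add.
        wc -l < "$counter_file" | tr -d ' '
    ) 9> "$counter_file.lock"
}

# Usage in a prolog, mirroring the original snippet:
# xfs_projid=$(next_projid /tmp/xfs_projid_counter "$JOB_ID")
```

The lock is released when the subshell exits, so prolog scripts started at the same moment each get a different count.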
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Derrick
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > users at gridengine.org
> > > > > https://gridengine.org/mailman/listinfo/users
> > > >
> > > >
> > >
> > >
> >
> >
>
>