[gridengine users] Start jobs on exec host in sequential order

Derrick Lin klin938 at gmail.com
Mon Jul 30 00:31:10 UTC 2018


Hi Reuti,

The approach sounds great.

But the prolog script seems to be run as root, so this is what I got:

XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)

Maybe I am still missing something, or is the prolog script the wrong
place to get the group ID generated by SGE?
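For reference, a minimal, self-contained sketch of reading the spool file instead of calling `id`. The spool file is simulated here so the snippet runs standalone; in a real prolog, SGE sets $SGE_JOB_SPOOL_DIR itself, and the addgrpid file is readable even though the prolog runs as root:

```shell
# Sketch: take the additional group ID from the spool file that
# sge_execd writes, instead of parsing `id` (which reports root's
# groups when the prolog runs as root).

# Simulated spool directory, just so this snippet runs standalone;
# in a real prolog SGE exports SGE_JOB_SPOOL_DIR itself:
SGE_JOB_SPOOL_DIR=$(mktemp -d)
echo 20052 > "$SGE_JOB_SPOOL_DIR/addgrpid"

ADD_GRP_ID=$(cat "$SGE_JOB_SPOOL_DIR/addgrpid")
echo "xfs_projid=$ADD_GRP_ID"   # -> xfs_projid=20052
```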

Cheers,
D

On Sat, Jul 28, 2018 at 11:53 AM, Reuti <reuti at staff.uni-marburg.de> wrote:

>
> > Am 28.07.2018 um 03:00 schrieb Derrick Lin <klin938 at gmail.com>:
> >
> > Thanks Reuti,
> >
> > I know little about the group ID created by SGE, and I am also
> > pretty confused about Linux group IDs.
>
> Yes, SGE assigns a conventional group ID to each job to track the CPU and
> memory consumption. This group ID is in the range you defined in:
>
> $ qconf -sconf
>> gid_range                    20000-20100
>
> and this will be unique per node. A first approach could be to use `sed`:
>
> $ id
> uid=25000(reuti) gid=25000(ourgroup) groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> 20052
>
> or:
>
> ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
> echo $ADD_GRP_ID
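For completeness, a standalone illustration of the `sed` extraction above, applied to a sample `id` line (the one quoted earlier in this mail) rather than the live command:

```shell
# Sample output as `id` would print it for a job process; 20052 is the
# SGE-added group ID (the bare number without a name in parentheses).
sample='uid=25000(reuti) gid=25000(ourgroup) groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)'

# The regex keeps the digit run that sits between two commas with no
# "(name)" suffix -- exactly the group SGE appended for the job.
add_grp_id=$(printf '%s\n' "$sample" | sed -e "s/.*),\([0-9]*\),.*/\1/")
echo "$add_grp_id"   # -> 20052
```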
>
> -- Reuti
>
>
> > I assume that `id` is called inside the prolog script; what does
> > the output typically look like?
> >
> > Cheers,
> >
> > On Fri, Jul 27, 2018 at 4:12 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> >
> > Am 27.07.2018 um 03:14 schrieb Derrick Lin:
> >
> > > We are using $JOB_ID as xfs_projid at the moment, but this
> > > approach introduces a problem for array jobs, whose tasks share
> > > the same $JOB_ID (with different $TASK_ID).
> > >
> > > Also, tasks from two different array jobs running on the same
> > > node may share the same $TASK_ID, so the uniqueness of $TASK_ID
> > > on a single host cannot be guaranteed.
> >
> > So the number you are looking for needs to be unique per node only?
> >
> > What about using the additional group ID which SGE creates, then?
> > This will be unique per node.
> >
> > This can be found in the `id` command's output, or in the execd
> > spool directory (execd_spool_dir) under
> > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> >
> > -- Reuti
> >
> >
> > > That's why I am trying to make the xfs_projid independent of SGE.
> > >
> > >
> > >
> > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti <reuti at staff.uni-marburg.de>
> wrote:
> > > Hi,
> > >
> > > > Am 26.07.2018 um 06:01 schrieb Derrick Lin <klin938 at gmail.com>:
> > > >
> > > > Hi all,
> > > >
> > > > I am working on a prolog script which sets up an xfs quota on
> > > > disk space on a per-job basis.
> > > >
> > > > To set up an xfs quota on a subdirectory, I need to provide a
> > > > project ID.
> > > >
> > > > Here is how I generate the project ID:
> > > >
> > > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > >
> > > > echo $JOB_ID >> $XFS_PROJID_CF
> > > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
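As the thread notes, this append-and-count scheme races when several prologs run at once. A sketch of a serialized variant, assuming flock(1) from util-linux is available (the counter path is a temp file here so the snippet is self-contained; a real prolog would use a fixed path like /tmp/xfs_projid_counter):

```shell
# In the real prolog this would be a fixed per-node path such as
# /tmp/xfs_projid_counter; a temp file keeps the sketch repeatable.
XFS_PROJID_CF=$(mktemp)
JOB_ID=${JOB_ID:-12345}          # SGE sets JOB_ID in a real prolog

next_projid() {
    # flock serializes the append + count, so two prologs can never
    # observe the same line count (assumes util-linux flock(1)).
    flock "$XFS_PROJID_CF" sh -c \
        "echo $JOB_ID >> '$XFS_PROJID_CF'; wc -l < '$XFS_PROJID_CF'"
}

xfs_projid=$(next_projid)
```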
> > >
> > > The xfs_projid is then the number of lines in the file? Why not
> > > use $JOB_ID directly? Is there a limit on the maximum project ID
> > > that $JOB_ID might exceed?
> > >
> > > -- Reuti
> > >
> > >
> > > > My test shows that when multiple jobs start on the same exec
> > > > host at the same time, the prolog script runs almost
> > > > simultaneously for each, so multiple jobs end up sharing the
> > > > same xfs_projid, which is no good.
> > > >
> > > > I am wondering if I can configure the scheduler to start jobs
> > > > sequentially (perhaps with an interval in between).
> > > >
> > > >
> > > > Cheers,
> > > > Derrick
> > > > _______________________________________________
> > > > users mailing list
> > > > users at gridengine.org
> > > > https://gridengine.org/mailman/listinfo/users
> > >
> > >
> >
> >
>
>