[gridengine users] Start jobs on exec host in sequential order

Reuti reuti at staff.uni-marburg.de
Sat Jul 28 01:53:47 UTC 2018


> On 28.07.2018 at 03:00, Derrick Lin <klin938 at gmail.com> wrote:
> 
> Thanks Reuti,
> 
> I know little about the group ID created by SGE, and I am also rather confused about how it relates to the Linux group ID.

Yes, SGE assigns an additional, otherwise conventional Linux group ID to each job to track its CPU and memory consumption. This group ID is taken from the range you defined in:

$ qconf -sconf
…
gid_range                    20000-20100

and it will be unique per node. One way to get it is to filter the output of `id` with `sed` (the SGE group ID is the entry without a group name):

$ id
uid=25000(reuti) gid=25000(ourgroup) groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
$ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
20052
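
The `sed` pattern above relies on the SGE group ID sitting between two other entries in the list. A minimal sketch of a position-independent alternative, assuming the bounds passed in mirror the gid_range from `qconf -sconf` (the function name sge_gid_from_list is made up for illustration):

```shell
# Hypothetical helper: print the first GID that falls within [lo, hi].
# lo and hi are assumed to match gid_range in the cluster configuration.
sge_gid_from_list() {
    lo=$1
    hi=$2
    shift 2
    for gid in "$@"; do
        if [ "$gid" -ge "$lo" ] && [ "$gid" -le "$hi" ]; then
            echo "$gid"
            return 0
        fi
    done
    # No GID in the range: probably not running inside an SGE job.
    return 1
}

# Inside a job this might be called as:
#   sge_gid_from_list 20000 20100 $(id -G)
```

`id -G` prints only the numeric group IDs, so no parsing of group names is needed.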

Alternatively, read it directly from the job's spool directory:

ADD_GRP_ID=$(< "$SGE_JOB_SPOOL_DIR/addgrpid")
echo "$ADD_GRP_ID"
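
For the prolog itself, a rough sketch of how this number could be used as the XFS project ID. The function names, the XFS_DRY_RUN switch, the /scratch paths and the 10g limit are all assumptions for illustration, not something taken from your setup:

```shell
# Sketch only: names, paths and limits below are hypothetical.

# Print the additional group ID stored in a job's spool directory.
read_addgrpid() {
    spool_dir=$1
    if [ ! -r "$spool_dir/addgrpid" ]; then
        echo "addgrpid not found in $spool_dir" >&2
        return 1
    fi
    # The file holds a single number; strip any surrounding whitespace.
    tr -d '[:space:]' < "$spool_dir/addgrpid"
    echo
}

# Run the xfs_quota calls for one job, or only print them
# when XFS_DRY_RUN=1 is set.
setup_quota() {
    projid=$1
    job_dir=$2
    mount_point=$3
    limit=$4
    run=""
    if [ "${XFS_DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Tag the directory tree with the project ID, then set a hard block limit.
    $run xfs_quota -x -c "project -s -p $job_dir $projid" "$mount_point" &&
    $run xfs_quota -x -c "limit -p bhard=$limit $projid" "$mount_point"
}

# In a prolog this might be called as:
#   projid=$(read_addgrpid "$SGE_JOB_SPOOL_DIR") &&
#   setup_quota "$projid" /scratch/some_job_dir /scratch 10g
```

Since SGE hands out each additional group ID to only one job per node at a time, concurrent prolog runs on the same host would not end up with the same project ID.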

-- Reuti


> I assume that `id` is called inside the prolog script. What does the output typically look like?
> 
> Cheers,
> 
> On Fri, Jul 27, 2018 at 4:12 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> 
> On 27.07.2018 at 03:14, Derrick Lin wrote:
> 
> > We are using $JOB_ID as the xfs_projid at the moment, but this approach causes a problem for array jobs, whose tasks all share the same $JOB_ID (with different $TASK_ID).
> > 
> > Also, tasks from two different array jobs running on the same node can have the same $TASK_ID, so $TASK_ID is not unique per host either.
> 
> So the number you are looking for needs to be unique per node only?
> 
> What about using the additional group ID that SGE creates? It will be unique per node.
> 
> It can be found in the output of the `id` command, or in the exec host's spool directory (execd_spool_dir) under ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> 
> -- Reuti
> 
> 
> > That's why I am trying to generate the xfs_projid independently of SGE.
> > 
> > 
> > 
> > On Thu, Jul 26, 2018 at 9:27 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> > Hi,
> > 
> > > On 26.07.2018 at 06:01, Derrick Lin <klin938 at gmail.com> wrote:
> > > 
> > > Hi all,
> > > 
> > > I am working on a prolog script which sets up an XFS disk-space quota on a per-job basis.
> > > 
> > > To set up an XFS quota on a subdirectory, I need to provide a project ID.
> > > 
> > > Here is how I generate the project ID:
> > > 
> > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > 
> > > echo $JOB_ID >> $XFS_PROJID_CF
> > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > 
> > So the xfs_projid is the number of lines in the file? Why not use $JOB_ID directly? Is there a limit on the maximum project ID which $JOB_ID might exceed?
> > 
> > -- Reuti
> > 
> > 
> > > My tests show that when multiple jobs start on the same exec host at the same time, their prolog scripts run almost simultaneously, so several jobs end up sharing the same xfs_projid, which is no good.
> > > 
> > > I am wondering if I can configure the scheduler to start the jobs sequentially (probably with an interval in between).
> > > 
> > > 
> > > Cheers,
> > > Derrick