[gridengine users] TMPDIR naming (was: Fwd: preventing mix between 2 hosts group)

Dave Love d.love at liverpool.ac.uk
Fri Nov 2 15:32:37 UTC 2012

Reuti <reuti at staff.uni-marburg.de> writes:

> Note: there is also issue https://arc.liv.ac.uk/trac/SGE/ticket/813 where two `qrsh -inherit` to the same exechost end up in wrong queues. This would also be solved then, as the desired queue can't be selected right now.

Looking at the code, I don't actually understand how you get
inconsistent TMPDIRs, as the name seems to be derived from the master
queue name in the calls of sge_make_tmpdir.

> (Only if you would like to get exactly one unique $TMPDIR per `qrsh -inherit` with a slot count of 1 in each queue you would be out of luck. But for now this can't be guaranteed anyway. OTOH: it could be a feature to limit some kind of disk quota inside $TMPDIR and you want to get a correct one for each `qrsh -inherit` call and the -q option should be implemented.)

Maybe, though that seems quite obscure and less important than problems
caused by the current implementation, even if I'm now confused how they

> Before changing this: I wonder what was the intention >12 years ago to
> include the name of the queue, as the job/task-id is already unique?

Yes, that's what I mean.  I'm inclined to change it anyway if there's no
obvious reason.  (The id is only unique in a given cell, and you could
currently have trouble from multiple cells with job ids of similar
sizes, though I doubt that's at all common.)

> I'm not sure, whether it was already in DQS. In SGE 5.3 there were no
> cluster queues (i.e. one queue definition per exechost...) and often
> the number of the exechost was included in the name of the queue
> because of this, like 1234.1.serial01.q for a serial queue on node01.

I'm not sure it helps, but dqs_make_tmpdir:

  /* Note could have multiple instantiations of same job, */
  /* on same machine, under same queue */

cf. sge_make_tmpdir:

   /* Note could have multiple instantiations of same job, */
   /* on same machine, under same queue */
   snprintf(tmpdir, ltmpdir, "%s/"sge_u32"."sge_u32".%s", t, jobid,
            jataskid, lGetString(qep, QU_qname));
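
To make the naming question concrete, here is a minimal standalone sketch of
the two schemes under discussion: the current one, which appends the queue
name, and the one without it.  This is not the SGE source.  The helper names
tmpdir_with_queue and tmpdir_without_queue are hypothetical, and the job and
task ids are assumed to be plain 32-bit unsigned values in place of whatever
the sge_u32 format macro expands to on a given platform.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the current layout:
 *   <base>/<jobid>.<taskid>.<queue name>
 * Returns snprintf's result: the length the full name needs (excluding the
 * terminating NUL), or a negative value on an encoding error. */
static int tmpdir_with_queue(char *buf, size_t len, const char *base,
                             uint32_t jobid, uint32_t taskid,
                             const char *qname)
{
    assert(buf != NULL && base != NULL && qname != NULL);
    return snprintf(buf, len, "%s/%" PRIu32 ".%" PRIu32 ".%s",
                    base, jobid, taskid, qname);
}

/* Hypothetical helper for the alternative being considered:
 *   <base>/<jobid>.<taskid>
 * The name no longer depends on which queue the task landed in, but it is
 * only unique within one cell, since job ids are per-cell. */
static int tmpdir_without_queue(char *buf, size_t len, const char *base,
                                uint32_t jobid, uint32_t taskid)
{
    assert(buf != NULL && base != NULL);
    return snprintf(buf, len, "%s/%" PRIu32 ".%" PRIu32,
                    base, jobid, taskid);
}
```

With a base of /tmp, job 1234, task 1 and queue serial01.q, the first helper
gives /tmp/1234.1.serial01.q (the form in Reuti's example) and the second
gives /tmp/1234.1.  Two cells sharing the same base directory and handing out
the same job id would collide under either scheme whenever the queue names
also match.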

