[gridengine users] Filling up nodes when using gepetools

Winkler, Ursula (ursula.winkler@uni-graz.at) ursula.winkler at uni-graz.at
Thu Jul 30 16:49:23 UTC 2015





On 30.07.2015 at 18:29, "Reuti" <reuti at staff.uni-marburg.de> wrote:

> 
>> On 30.07.2015 at 18:14, Winkler, Ursula (ursula.winkler at uni-graz.at) <ursula.winkler at uni-graz.at> wrote:
>> 
>> 
>> 
>>> On Thu, 30 Jul 2015 12:57:13 +0000
>>> "Winkler, Ursula (ursula.winkler at uni-graz.at)"
>>> <ursula.winkler at uni-graz.at> wrote:
>>> 
>>>>> My suggestion was to modify your jsv/gepetools to force single node
>>>>> parallel jobs into PEs with $pe_slots allocation rules (which gives
>>>>> you control over where they are scheduled via queue_sort_method and
>>>>> load_formula) while sending the others to PEs with other
>>>>> (appropriate) allocation rules that won't cause (ii).
>>>> 
>>>> Well, I created an additional PE with allocation_rule "$pe_slots",
>>>> and built in an if condition in "pe.jsv" for all jobs which request
>>>> just a single node to be assigned to this new PE. But the annoying
>>>> situation didn't change. The scheduler configuration is set to
>>>> "queue_sort_method    load" and "load_formula  slots". So what am I
>>>> still missing?
>>> Ignore my previous message.  I got it back to front, I think.  That
>>> looks correct (I think).  Have you checked that the jobs show the right
>>> granted PE with qstat -j?
>> 
>> Yes, of course.
> 
> Sorry to step into the discussion: `qstat -j ...` shows the requested PE; the granted one is in `qstat -r`.
> 
> $ qsub -pe "*" 2 test.sh
> Your job 44329 ("test.sh") has been submitted
> $ qstat -j 44329
> ...
> parallel environment:  * range: 2
> ...
> $ qstat -r
> ...
>       Requested PE:     * 2
>       Granted PE:       make 2
> 
> -- Reuti

At the moment I don't know whether I checked it with "qstat -j", but I did check it. When I'm back in the office I probably still have the output in some screen window, so I can tell you exactly. I also ran a test: I temporarily removed the PE from the queue, with the result that the jobs could no longer start (as expected).
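
For reference, the check Reuti describes could be scripted roughly as follows. This is only a sketch: the helper name granted_pe is made up for illustration, and it assumes that each job line in the `qstat -r` output starts with the numeric job ID and that the indented "Granted PE:" line follows it, as in the excerpt quoted above.

```sh
# Hypothetical helper: print the granted PE and slot count for one job.
# Reads `qstat -r`-style text on stdin; $1 is the job ID to look for.
granted_pe() {
    awk -v id="$1" '
        # Assumption: a job line begins with the numeric job ID.
        $1 ~ /^[0-9]+$/ { cur = $1 }
        # Detail lines belong to the most recently seen job line.
        cur == id && /Granted PE:/ { print $3, $4 }
    '
}
```

It would be used as `qstat -r | granted_pe 44329`; for the job in Reuti's example, the "Granted PE:" line shown there would yield "make 2".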
