[gridengine users] Have qhost -xml slot reporting bug?

Reuti reuti at Staff.Uni-Marburg.DE
Wed May 22 14:35:53 UTC 2013


On 22.05.2013 at 00:55, Orion Poplawski wrote:

> On 05/16/2013 02:07 AM, Reuti wrote:
>> On 16.05.2013 at 01:00, Orion Poplawski wrote:
>> 
>>> On 05/15/2013 04:20 PM, Dave Love wrote:
>>>> Orion Poplawski <orion at cora.nwra.com> writes:
>>>> 
>>>>> On 05/13/2013 09:40 AM, Orion Poplawski wrote:
>>>>>> Would it be possible for qhost -xml output to include the number of slots used
>>>>>> by a job on that host?
>>>>>> 
>>>>> 
>>>>> Okay, I see how it is done - there are multiple job entries for each
>>>>> slot.
>>>> 
>>>> Indeed, like the non-XML version.  The SGE support
>>>> in <https://oss.trac.surfsara.nl/jobmonarch> should provide an example of
>>>> similar processing from qstat output.
>>>> 
>>>>> Job 31911 is a 4 slot pe job - why are there 5 entries (one of which
>>>>> is a "MASTER")?
>>>> 
>>>> Because the PE has job_is_first_task false?
>>>> 
>>> 
>>> It is, but something doesn't jibe:
>>> 
>>>   job_is_first_task
>>>       The job_is_first_task parameter can be set to TRUE or FALSE. A value of TRUE indicates
>>>       that the Sun Grid Engine job script already contains one of the tasks of the  parallel
>>>       application (the number of slots reserved for the job is the number of slots requested
>>>       with the -pe switch), while a value of FALSE indicates that the job script  (and  its
>>>       child processes) is not part of the parallel program (the number of slots reserved for
>>>       the job is the number of slots requested with the -pe switch + 1).
>>> 
>>> My -pe mpi 4 job:
>>> 
>>> qstat:
>>> job-ID  prior   name       user         state submit/start at     queue                    slots ja-task-ID
>>> -----------------------------------------------------------------------------------------------------------------
>>>  31923 0.60188 osu_bw-int orion        r     05/15/2013 16:56:58 mpi at andrew.cora.nwra.com           4
>> 
>> With `qstat -g t` you get output similar to the one below. Whether it shows 3 or 4 slaves depends, as noted, on the setting of job_is_first_task, and it tries to reflect what's granted to the job. That the master process is supposed to be idle (according to your setup) can't be seen from the plain output, of course.
>> 
>> -- Reuti
> 
> That isn't getting to my question though.  The man page says "the number of slots reserved for the job is the number of slots requested with the -pe switch + 1".  But I don't see any evidence of that, unless I don't understand what "slots reserved" means.  If I submit a -pe 8 job, it will run on a host with 8 slots.  As near as I can tell, the parenthetical remarks should just be removed.

Aha, I see. Yes indeed, the wording is confusing. Maybe it would be better to say:

job_is_first_task
The job_is_first_task...
...
The output of `qstat -g t` and `qhost -q` will describe the allocation for a job running in this particular PE according to this setting. That is, with job_is_first_task set to FALSE, one more slot entry is listed to reflect the idling main job script of this job. In this case one additional `qrsh -inherit ...` call is also allowed, so that the necessary number of slave tasks can be spawned. Nevertheless, the overall consumed slot count is the same regardless of this setting.
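
To make the bookkeeping concrete, here is a minimal Python sketch of the rule as described above. It only models the wording in this message, not Grid Engine's actual code; the names `PeAllocationView` and `describe_allocation` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeAllocationView:
    """Hypothetical model of what is listed for one parallel job."""
    master_entries: int   # the MASTER line (the job script itself)
    slave_entries: int    # SLAVE lines listed for the job
    inherit_calls: int    # `qrsh -inherit` starts permitted
    consumed_slots: int   # slots actually taken from the queue(s)


def describe_allocation(requested_slots: int,
                        job_is_first_task: bool) -> PeAllocationView:
    """Apply the rule from the proposed man page wording.

    TRUE:  the job script counts as one task, so one slave fewer is listed.
    FALSE: the job script idles, so all requested slots appear as slaves.
    """
    if requested_slots < 1:
        raise ValueError("a PE job needs at least one slot")
    slaves = requested_slots - 1 if job_is_first_task else requested_slots
    return PeAllocationView(
        master_entries=1,
        slave_entries=slaves,
        inherit_calls=slaves,
        consumed_slots=requested_slots,  # independent of the setting
    )


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, describe_allocation(4, flag))
```

For a `-pe mpi 4` job with job_is_first_task FALSE this gives one MASTER plus four SLAVE entries, i.e. the five entries seen earlier in the thread, while the consumed slot count stays at 4.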

-- Reuti


> 
> -- 
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       orion at nwra.com
> Boulder, CO 80301                   http://www.nwra.com




