[gridengine users] SGE 8.1.3 and cpusets/cgroups

Mikael Brandström Durling mikael.durling at slu.se
Tue Mar 12 11:24:24 UTC 2013


On 12 Mar 2013, at 12:12, Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
> 
> On 12.03.2013 at 10:41, Mikael Brandström Durling wrote:
> 
>> I noticed in the man page for sge_conf that there is an experimental option for enabling cpusets in SGE. I searched the mailing list archives and Google for documentation on how to enable it, which led me to util/resources/scripts/setup-cgroups-etc. Calling it from the sgeexecd init script and enabling cgroups in qconf -mconf results in jobs being launched into cpusets. However, I have a question about this. I suppose this feature can be used to replace core binding, ensuring that a task, whether serial or within a PE, is only allowed to use the CPUs assigned to it?
> 
> IMO these are two different things: a cgroups allocation limits the job to the assigned cores, but within that set the processes of a parallel job are not bound to particular cores. Inside the assigned cores it is the kernel's duty to place each process on the most suitable core in each time slice.

I see. For our needs it would then suffice if we can ensure that a parallel job is confined to the number of cores allotted to the PE (even when given a PE with a range). Is this what happens with the current implementation of cgroups? The problem we would like to solve is users submitting jobs with software that by default starts as many worker threads as there are cores in the node, and thus parasitises on other jobs' allocations within the node.
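
To illustrate the thread-count problem from the application side, here is a minimal Python sketch of how a well-behaved program could size its worker pool. It assumes Linux (os.sched_getaffinity reflects a cpuset restriction there) and that the job runs under SGE with the NSLOTS variable set in its environment. The function name allowed_worker_count is hypothetical, and this is not something SGE does for you.

```python
import os


def allowed_worker_count(env=None, affinity=None):
    """Return a worker count that respects both NSLOTS and the CPU affinity mask.

    env and affinity can be passed in for testing; by default the real
    environment and the affinity mask of the current process are used.
    """
    env = os.environ if env is None else env
    if affinity is None:
        if hasattr(os, "sched_getaffinity"):
            # On Linux this shrinks when the process is confined to a cpuset.
            affinity = os.sched_getaffinity(0)
        else:
            # Fallback for platforms without sched_getaffinity.
            affinity = range(os.cpu_count() or 1)
    allowed = len(affinity)

    # NSLOTS is set by SGE in the job environment; treat a missing or
    # malformed value as "no extra limit".
    try:
        slots = int(env.get("NSLOTS", ""))
    except ValueError:
        slots = allowed
    if slots < 1:
        slots = allowed

    return max(1, min(slots, allowed))


if __name__ == "__main__":
    print("cores in node:", os.cpu_count())
    print("workers to start:", allowed_worker_count())
```

Software that instead uses os.cpu_count() (or the equivalent in its own language) sees every core in the node, which is exactly the case where confinement by a cpuset is needed.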

> 
> 
>> However, I can't figure how this is done. Has anyone documented it?
>> 
>> Otherwise I'll have to resort to the traditional core binding in SGE, but then I need to set up a JSV to set the binding properly for parallel tasks. Is there a JSV snippet available to do this? I frequently see references to this approach on the list, but I have not found a working example.
> 
> Using complete nodes and then binding in Open MPI or other parallel libraries is also an option. Especially as all processes are then bound to a specific dedicated core for each of them. How many cores do you have in the machines and how many cores are usually used by a parallel job?

We have 8 or 16 cores per machine. However, as the workload is a mixture of serial and parallel jobs, it is quite common for parallel jobs to get stuck in the queue because no complete node is free. To get around this problem we quite often spread MPI jobs across nodes to pick up free slots. The software we use does not pass much information over MPI, but mostly uses it for synchronisation.

Mikael

> 
> -- Reuti
> 
> 
>> Thanks in advance,
>> Mikael
>> 
>> 
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
> 




