[gridengine users] limit CPU/slot resource to the number of reserved slots

Skylar Thompson skylar2 at uw.edu
Thu Aug 29 15:37:49 UTC 2019


We actually run CentOS 6 as well, and haven't seen this problem, though
maybe our users haven't done anything as untoward as yours. We do have a
bunch of bioinformatics code (including Java) so I thought we would have
seen the worst cases.

On Thu, Aug 29, 2019 at 10:50:27AM -0400, Mike Serkov wrote:
> Load average indeed. The thing is that if we have a parallel process bound to one core, the kernel scheduler has to constantly switch those threads between the running and sleeping states and perform context switches, which creates overhead on the system itself. Imagine you have a 64-CPU box, each core runs such a job, and every job spawns 64 threads (a common case, as many tools by default just make a system call to determine how many CPUs they can use). Both with and without affinity enforced, it is not a good situation. With affinity enforced, in extreme cases we had nodes simply freeze, especially when heavy I/O was also involved (probably because of the overhead on the kernel scheduler). That was on RHEL6; maybe it is much better on modern kernels. All I want to say is that unlike memory limits with cgroups, where you can actually be sure a process can't allocate more, cpusets are a bit different: users can still run as many parallel processes as they want. They are limited to a number of physical CPUs, but it may still affect the node and other jobs.
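The default CPU-count detection Mike describes can be seen directly on any Linux box. This is an illustrative sketch, not SGE-specific; `getconf` and `nproc` stand in for what tools do internally:

```shell
# Tools that call sysconf(_SC_NPROCESSORS_ONLN) see every online CPU even
# when pinned to a single core, so they still size their thread pool to
# the whole box:
taskset -c 0 getconf _NPROCESSORS_ONLN   # full CPU count of the machine
# Affinity-aware detection (sched_getaffinity, as GNU nproc uses) sees
# only the granted core:
taskset -c 0 nproc                       # prints 1
```

This is why a cpuset alone does not stop oversubscription: the binding limits where the threads may run, not how many of them a tool decides to start.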
> 
> Best regards,
> Mikhail Serkov 
> 
> > On Aug 29, 2019, at 10:20 AM, Skylar Thompson <skylar2 at uw.edu> wrote:
> > 
> > Load average gets high if the job spawns more processes/threads than
> > allocated CPUs, but we haven't seen any problem with node instability. We
> > did have to remove np_load_avg from load_thresholds, though, to keep our
> > users from DoS'ing the cluster...
> > 
> >> On Thu, Aug 29, 2019 at 05:27:36AM -0400, Mike Serkov wrote:
> >> Also, something to keep in mind: cgroups will not solve this issue completely. It is just affinity enforcement. If the job spawns multiple threads and they are all active, it will cause the load average to grow, along with other side effects, regardless of the affinity setting. On big SMP boxes it may actually cause more instability. In any case, jobs should be configured to use exactly the number of threads they request, and this should be monitored.
> >> 
> >> Best regards,
> >> Mikhail Serkov 
> >> 
> >>> On Aug 29, 2019, at 4:16 AM, Ondrej Valousek <ondrej.valousek at adestotech.com> wrote:
> >>> 
> >>> Also a quick note: cgroups is the way to _enforce_ CPU affinity.
> >>> For the vast majority of jobs, I would say a simple taskset configuration (i.e. something like "-l binding linear") would do as well.
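As a sketch of the submission side, a linear-binding request might look like the line below. The slot count and script name are placeholders, and the exact binding syntax varies by SGE version, so check qsub(1) for the installation in use:

```shell
# Request 4 slots and ask the shepherd to bind the job linearly to 4
# consecutive cores (the same effect as a manual taskset):
qsub -pe smp 4 -binding linear:4 myjob.sh
```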
> >>> 
> >>> 
> >>> From: Dietmar Rieder <dietmar.rieder at i-med.ac.at> 
> >>> Sent: Thursday, August 29, 2019 9:37 AM
> >>> To: users at gridengine.org; Ondrej Valousek <ondrej.valousek at adestotech.com>; users <users at gridengine.org>
> >>> Subject: Re: [gridengine users] limit CPU/slot resource to the number of reserved slots
> >>> 
> >>> Great, thanks so much!
> >>> 
> >>> Dietmar
> >>> 
> >>> Am 29. August 2019 09:05:35 MESZ schrieb Ondrej Valousek <ondrej.valousek at adestotech.com>:
> >>> Nope,
> >>> SoGE (as of 8.1.9) supports cgroups without any code changes; just add "USE_CGROUPS=yes" to the execd parameter list to make the shepherd use the cgroup cpuset controller.
> >>> My patch only extends it to support systemd, and hence the possibility to hard-enforce memory/CPU limits, etc.
> >>> Hth,
> >>> Ondrej
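A sketch of the corresponding configuration change; whether USE_CGROUPS belongs under execd_params is my reading of the description above, so verify against the SoGE 8.1.9 documentation:

```
# edit the global configuration with: qconf -mconf
execd_params   USE_CGROUPS=yes
```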
> >>> 
> >>> From: Daniel Povey <dpovey at gmail.com> 
> >>> Sent: Monday, August 26, 2019 10:12 PM
> >>> To: Dietmar Rieder <dietmar.rieder at i-med.ac.at>; Ondrej Valousek <ondrej.valousek at adestotech.com>; users <users at gridengine.org>
> >>> Subject: Re: [gridengine users] limit CPU/slot resource to the number of reserved slots
> >>> 
> >>> I don't think it's supported in Son of GridEngine.  Ondrej Valousek (cc'd) described in the first thread here
> >>> http://arc.liv.ac.uk/pipermail/sge-discuss/2019-August/thread.html
> >>> how he was able to implement it, but it required code changes, i.e. you would need to figure out how to build and install SGE from source, which is a task in itself.
> >>> 
> >>> Dan
> >>> 
> >>> 
> >>> On Mon, Aug 26, 2019 at 12:46 PM Dietmar Rieder <dietmar.rieder at i-med.ac.at> wrote:
> >>> Hi,
> >>> 
> >>> thanks for your reply. This sounds promising.
> >>> We are using Son of Grid Engine, though. Can you point me to the right
> >>> docs for getting cgroups enabled on the exec host (CentOS 7)? I must admit I
> >>> have no experience with cgroups.
> >>> 
> >>> Thanks again
> >>>  Dietmar
> >>> 
> >>>> On 8/26/19 4:03 PM, Skylar Thompson wrote:
> >>>> At least for UGE, you will want to use the CPU set integration, which will
> >>>> assign the job to a cgroup that has one CPU per requested slot. Once you
> >>>> have cgroups enabled in the exec host OS, you can then set these options in
> >>>> sge_conf:
> >>>> 
> >>>> cgroup_path=/cgroup
> >>>> cpuset=1
> >>>> 
> >>>> You can use this mechanism to have the m_mem_free request enforced as well.
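To confirm on a node that a job really received a restricted CPU set, one portable check (plain Linux, nothing UGE-specific assumed) is the process's affinity list:

```shell
# List the CPUs this process may run on; under the cpuset integration,
# a shell started inside a job should show exactly one CPU per
# requested slot.
grep Cpus_allowed_list /proc/self/status
```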
> >>>> 
> >>>>> On Mon, Aug 26, 2019 at 02:15:22PM +0200, Dietmar Rieder wrote:
> >>>>> Hi,
> >>>>> 
> >>>>> Maybe this is a stupid question, but I'd like to limit the used/usable
> >>>>> number of cores to the number of slots that were reserved for a job.
> >>>>> 
> >>>>> We often see that people reserve 1 slot, e.g. "qsub -pe smp 1 [...]",
> >>>>> but their program then runs in parallel on multiple cores. How can
> >>>>> this be prevented? Is it possible to ensure that, with only one slot
> >>>>> reserved, a process cannot utilize more than that?
> >>>>> 
> >>>>> I was told that this should be possible in Slurm (which we don't have,
> >>>>> and don't want to switch to currently).
> >>>>> 
> >>>>> Thanks
> >>>>>  Dietmar
> >>>> 
> >>> 
> >>> 
> >>> -- 
> >>> _________________________________________
> >>> D i e t m a r  R i e d e r, Mag.Dr.
> >>> Innsbruck Medical University
> >>> Biocenter - Institute of Bioinformatics
> >>> Email: dietmar.rieder at i-med.ac.at
> >>> Web:   http://www.icbi.at
> >>> 
> >>> 
> >>> _______________________________________________
> >>> users mailing list
> >>> users at gridengine.org
> >>> https://gridengine.org/mailman/listinfo/users
> >>> 
> >>> --
> >>> D i e t m a r R i e d e r, Mag.Dr.
> >>> Innsbruck Medical University
> >>> Biocenter - Institute of Bioinformatics
> >>> Innrain 80, 6020 Innsbruck
> >>> Phone: +43 512 9003 71402
> >>> Fax: +43 512 9003 73100
> >>> Email: dietmar.rieder at i-med.ac.at
> >>> Web: http://www.icbi.at
> >>> _______________________________________________
> >>> users mailing list
> >>> users at gridengine.org
> >>> https://gridengine.org/mailman/listinfo/users
> > 
> > 
> > 
> > -- 
> > -- Skylar Thompson (skylar2 at u.washington.edu)
> > -- Genome Sciences Department, System Administrator
> > -- Foege Building S046, (206)-685-7354
> > -- University of Washington School of Medicine



