[gridengine users] Throttling job starts (thundering herd)

JuanEsteban.Jimenez at mdc-berlin.de
Thu Mar 23 13:04:23 UTC 2017


This is a known bug in GPFS; I have an open ticket for it with them through DDN, and I can reproduce it at will.

Workarounds are:

1) Space out the job submissions with sleep()

2) Put the temp log directory outside GPFS.
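Workaround 1 could be driven from a small script on the submit host. The following is a minimal Python sketch, not a tested tool. It assumes `qsub` is on the PATH and that the cluster is otherwise idle, so each job is dispatched soon after it is submitted. The function names, the 2-second default delay and the job script name are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_qsub(cmd: Sequence[str]) -> None:
    """Run one submission command (e.g. a qsub call), raising if it fails."""
    subprocess.run(list(cmd), check=True)


def submit_staggered(
    commands: Sequence[Sequence[str]],
    delay_s: float = 2.0,
    run: Callable[[Sequence[str]], None] = run_qsub,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Submit commands one at a time, sleeping delay_s seconds between them.

    Assumption: on an idle cluster, spacing the submissions spreads the
    resulting job starts (and their log-file creation) over several
    scheduling cycles instead of one burst.
    Returns the number of commands submitted.
    """
    for i, cmd in enumerate(commands):
        if i > 0:
            sleep(delay_s)  # pause before every submission except the first
        run(cmd)
    return len(commands)
```

For example, `submit_staggered([["qsub", "-t", f"{i}-{i+99}", "job.sh"] for i in range(1, 2001, 100)])` would split a 2000-task array into twenty 100-task array jobs of a hypothetical `job.sh` and submit them two seconds apart. The `run` and `sleep` parameters are injectable so the pacing logic can be checked without a cluster.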

Regards,
Juan

On Thu, 16 Feb 2017 at 13:43 -0000, Stuart Barkley wrote:

> Is there a way to throttle job starts on Grid Engine (we are using Son
> of Grid Engine)?
>
> i.e. I would like to limit the number of tasks started during each
> scheduling cycle and spread the startup of large array jobs over a
> longer (still short) period of time.  I'm aware that this would be a
> tradeoff against task throughput for very short tasks.
>
> We appear to be having some filesystem (GPFS) problems when 2000+
> tasks on 350+ nodes all start creating grid engine log files in the
> same directory at the same time.  These tasks are often for a single
> user hitting an idle system so I can't use maxujobs.
>
> Ideally we fix the filesystem and/or network communications.  I'm
> looking for a workaround.
>
> These jobs tend to have the same runtime so I'm seeing periodic floods
> of simultaneous file creation.  I can get the user to add some random
> sleep time in the jobs to spread later jobs out, but the idle->full
> spike will still exist.
>
> Thanks,
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never lost!
>                                         --  Daniel Boone
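Stuart's random sleep inside each job could be sketched as follows, called at the top of every array task. This sketch rests on several assumptions. It reads the task index from the `SGE_TASK_ID` environment variable that Grid Engine sets for array tasks. It treats any non-numeric value (such as `undefined`) as a non-array job. It seeds the delay from the task index so a rerun of the same task repeats its delay. The function names and the 60-second window are hypothetical. As the quoted message notes, this only staggers when tasks finish, and therefore when the next tasks start. It does not remove the initial idle-to-full spike.

```python
import random
import time
from typing import Callable, Mapping, Optional


def task_id_from_env(env: Mapping[str, str]) -> Optional[int]:
    """Return the array task index from SGE_TASK_ID, or None if it is not numeric."""
    value = env.get("SGE_TASK_ID", "")
    return int(value) if value.isdigit() else None


def startup_delay(task_id: Optional[int], window_s: float) -> float:
    """Pick a pseudo-random delay between 0 and window_s seconds.

    Seeding with the task index makes each task's delay differ but stay
    repeatable; with task_id=None the generator is seeded nondeterministically.
    """
    rng = random.Random(task_id)
    return rng.uniform(0.0, window_s)


def jittered_start(
    env: Mapping[str, str],
    window_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for this task's delay before the real work begins; return the delay."""
    delay = startup_delay(task_id_from_env(env), window_s)
    sleep(delay)
    return delay
```

In a Python task script this would be called as `jittered_start(os.environ)` before the real work. Seeding per task rather than sharing one seed keeps tasks from picking identical delays, which would recreate the burst.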

Best regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
