[gridengine users] schedd dies and error messages

Reuti reuti at staff.uni-marburg.de
Tue May 13 19:19:35 UTC 2014


On 13.05.2014 at 14:32, Arnau Bria wrote:

> Hello,
> 
>> Our schedd is dying unexpectedly since a user has sent 750000 jobs in
>> 3 job arrays.
> [...]
>> Am I reaching some internal SGE limit?
> 
> After holding 2 of those big job arrays the schedd is not dying anymore.
> So, could it be that 500000 jobs is too much even if they are submitted
> as job arrays?

Then you must have removed the "max_aj_instances" limit in SGE's configuration.

You can try "-tc" in `qsub` to limit the number of instances of an array job that can be executed at a time, or set "max_pending_tasks_per_job" in the scheduler configuration to a lower limit. Is it still set to 50?
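
As a rough sketch of the "-tc" suggestion: the helper below only prints the `qsub` command line rather than submitting anything, so it is safe to try. The function name `build_qsub_cmd`, the script name `job.sh`, the array size of 250000 and the cap of 100 running tasks are all made up for illustration; pick values that fit your cluster.

```sh
#!/bin/sh
# Dry-run sketch: prints the qsub command instead of submitting it.
# build_qsub_cmd, job.sh and the numbers used below are hypothetical.
build_qsub_cmd() {
    tasks=$1        # total number of array tasks (-t 1-N)
    max_running=$2  # cap on concurrently running tasks (-tc)
    script=$3       # job script to submit
    printf 'qsub -t 1-%s -tc %s %s\n' "$tasks" "$max_running" "$script"
}

# Example: a 250000-task array with at most 100 tasks running at once.
build_qsub_cmd 250000 100 job.sh

# To inspect the scheduler setting mentioned above:
#   qconf -ssconf | grep max_pending_tasks_per_job
# To change it (opens the scheduler configuration in an editor):
#   qconf -msconf
```

Once the printed command looks right, it can be run directly. The `qconf` lines are left as comments because they need a working SGE installation and, for `-msconf`, manager privileges.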


> Does anyone have experience running big job arrays?

No.

-- Reuti


> 
> TIA,
> Arnau