[gridengine users] Fwd: dispatching sge task from an sge task - is that a reasonable practice?

Mikhail Serkov serkov.m at gmail.com
Thu Feb 25 22:11:43 UTC 2016


It's definitely not good practice to run qsub from within a job: it can cause deadlocks, and it is very hard to support when something fails.

I would say you should change the workflow and submit the jobs in a different way, using qsub's -hold_jid option.

It should look like this:
submit all 'day' jobs -> submit all 'month' jobs, each holding on its 'day' jobs -> submit the 'year' job, holding on the 'month' jobs

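In this case you will be able to submit all of them to the same queue simultaneously without worrying about deadlock or separate queues. A held job just waits in the pending list and takes no slot until the jobs it depends on have finished.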
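
A rough sketch of that submission order, in Python, in case it helps. It only builds the qsub argument lists and does not run anything. The script names (analyze_day.sh, analyze_month.sh, analyze_year.sh), their arguments, and the job-name scheme are made up for illustration, and it assumes the month and year scripts only summarize existing output instead of dispatching jobs themselves. The only qsub options used are -N (job name) and -hold_jid (comma-separated list of jobs to wait for, given here by name).

```python
import calendar


def build_submissions(year, tag="run1"):
    """Build qsub argument lists: all day jobs, then month jobs, then the year job.

    The tag should be unique per run, because -hold_jid is given job
    names here and a name-based hold matches your jobs with that name.
    """
    day_cmds, month_cmds, month_names = [], [], []
    for month in range(1, 13):
        day_names = []
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            name = f"{tag}_day_{year}{month:02d}{day:02d}"
            day_names.append(name)
            # Day jobs have no dependencies.
            day_cmds.append(["qsub", "-N", name,
                             "analyze_day.sh", f"{year}-{month:02d}-{day:02d}"])
        month_name = f"{tag}_month_{year}{month:02d}"
        month_names.append(month_name)
        # A month job is held until every day job of that month has finished.
        month_cmds.append(["qsub", "-N", month_name,
                           "-hold_jid", ",".join(day_names),
                           "analyze_month.sh", f"{year}-{month:02d}"])
    # The year job is held until all twelve month jobs have finished.
    year_cmd = ["qsub", "-N", f"{tag}_year_{year}",
                "-hold_jid", ",".join(month_names),
                "analyze_year.sh", str(year)]
    return day_cmds + month_cmds + [year_cmd]
```

Passing each list to subprocess.run(cmd, check=True) from a submit host would do the actual submission. Everything is submitted up front from outside the cluster jobs, so no job ever sits in a slot waiting for jobs it dispatched.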

> On Feb 25, 2016, at 4:16 PM, Ben Daniel Pere <ben.pere at gmail.com> wrote:
> 
> Where I work, we have jobs that submit jobs that submit jobs.. this could potentially cause a deadlock but we somehow (probably by luck) manage to live with it.. I'm wondering if that's a reasonable practice and, if not, whether you can suggest a better way to do what we do..
> 
> Example:
> 
> we have these 3 tasks:
> 
> - "analyze.day" job analyzes a day of data and returns some output
> - "analyze.month" job sends "analyze.day" jobs for a whole month and outputs a summary
> - "analyze.year" job sends "analyze.month" jobs for a whole year and outputs a summary
> 
> Usually people run analyze.day every day on the previous day, but sometimes they test a new algorithm on a whole year, so they dispatch analyze.year, which dispatches analyze.month, which dispatches analyze.day..
> We created a "dispatching" queue, which is the only queue we allow submitting jobs from, but since both analyze.year and analyze.month need to run there (both dispatch tasks) we could end up with a deadlock: in theory, lots of analyze.year jobs running together could take all the dispatching queue slots and leave no room for the analyze.month tasks, which they would wait for forever. Besides dispatching, these jobs also do some logic, so it's a strange animal, this "dispatching" queue..
> 
> What's the "correct" practice here?
> 



