[gridengine users] SoGE 8.1.8 - qsub issue when using a requestable variable and a parallel environment - need your help.

Yuri Burmachenko yuribu at mellanox.com
Fri Oct 30 09:40:39 UTC 2015


Hello to the distinguished forum members,

Recently we have needed to submit jobs in such a way that qsub requests both the requestable variable "hostname" and a parallel environment.

For example, if we submit an 'xterm' job:

    $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm

This kind of request results in strange scheduler behavior: the submission ends up in one of the following states:

1.      The xterm job opens as expected.

2.      There is a very long delay, and then the xterm opens.

3.      The job enters the 'qw' state with errors similar to the following:

cannot run because it exceeds limit "/////" in rule "some_rule/1"

cannot run in PE "somePe" because it only offers 0 slots

In all of the above cases, "host_in_grid" has enough free slots, and the quota rule "some_rule" is not related in any way to the consumable/requestable variable in the job submission request.
If we remove the "some_rule" quota from the SGE resource quotas, the error simply names another rule and again states that its limit was exceeded.
NOTE: The somePe parallel environment has enough free slots; it is defined with 999 slots.

In short, these "cannot run" messages do not reflect the real reason why the job cannot run, since all conditions are actually met. This is very confusing. Why does this happen?
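
The state described above can be inspected with standard Grid Engine commands. The following is only a sketch under stated assumptions: the job ID 12345 is a hypothetical placeholder, the helper names run and diagnose_pending_job are made up for illustration, and by default the script only prints the commands instead of running them (set DRY_RUN=0 on a submit host to execute them). The "scheduling info" part of qstat -j is only populated when the scheduler is configured to record it.

```shell
#!/bin/sh
# Sketch: collect what the scheduler and configuration report for a pending job.
# All argument values used below are placeholders, not real cluster data.

# DRY_RUN=1 (default): print each command. DRY_RUN=0: execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute one command line.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: queries relevant to the three symptoms described above.
# Arguments: job id, execution host, PE name, resource quota set name.
diagnose_pending_job() {
    job_id="$1"; host="$2"; pe="$3"; rqs="$4"
    run qstat -j "$job_id"     # job details, including the "cannot run" reasons
    run qhost -q -h "$host"    # queue instances and slot usage on the host
    run qconf -sp "$pe"        # parallel environment definition (slots, rules)
    run qconf -srqs "$rqs"     # definition of the named resource quota set
    run qquota                 # resource quota usage for the calling user
}

diagnose_pending_job 12345 host_in_grid somePe some_rule
```

Comparing the qconf -srqs output with the rule named in the "exceeds limit" message would show whether that rule can apply to a hostname request at all.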

We also found a workaround that avoids the requestable variable "hostname"; submitted as below, it ALWAYS works:
$SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm

Any ideas why this strange behavior occurs? Is this some kind of bug? How can it be resolved?

Appreciate your help.
Thanks.


Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell: +972 54 7542188 | Fax: +972 4 959 3245
