[gridengine users] batch array jobs are executed on interactive queue

Kandalaft, Iyad (AAFC/AAC) iyad.kandalaft at canada.ca
Wed Jan 9 13:25:12 UTC 2019


Thanks for the suggestion, Reuti.  Since we don't implement consumable resources for cores/CPUs, we rely on slots to limit the number of jobs per node.  Can you recommend an accepted method or best practice that allows symmetric multi-processing for both batch and interactive jobs while limiting the interactive queue strictly to interactive jobs?  Based on your suggestion, I can create a second PE named smp_interactive, but I am not sure that is the optimal configuration to achieve the end result.  This change would also impact the users, which I am trying to avoid if possible.
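
To make the second-PE idea concrete, here is a rough sketch of what it might involve. It is only an illustration of that variant, not a recommendation. The helper name `clone_pe` and the temporary file path are made up for this example, and the `qconf` calls in the comments are an untested outline that would need checking against the local installation.

```shell
# Sketch only: derive a second PE definition from an existing one.
# clone_pe NEW_NAME reads `qconf -sp`-style output on stdin and writes the
# same definition with pe_name replaced (clone_pe is a hypothetical helper).
clone_pe() {
    awk -v name="$1" '
        $1 == "pe_name" { printf "%-18s %s\n", "pe_name", name; next }
        { print }
    '
}

# Untested outline of how the result might be applied by an SGE admin:
#   qconf -sp smp | clone_pe smp_interactive > /tmp/smp_interactive.pe
#   qconf -Ap /tmp/smp_interactive.pe
#   qconf -aattr queue pe_list smp_interactive interactive.q
#   qconf -dattr queue pe_list smp interactive.q
```

With a layout like this, interactive sessions would have to request `-pe smp_interactive` instead of `-pe smp`, which is the user-visible change mentioned above.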

Cheers,
Iyad

________________________________________
From: Reuti <reuti at staff.uni-marburg.de>
Sent: January 8, 2019 4:30 PM
To: Kandalaft, Iyad (AAFC/AAC)
Cc: users at gridengine.org
Subject: Re: [gridengine users] batch array jobs are executed on interactive queue

Hi,

> On 08.01.2019 at 20:54, Kandalaft, Iyad (AAFC/AAC) <iyad.kandalaft at canada.ca> wrote:
>
> Hi all,
>
> A problem popped up on a Rocks 7 HPC deployment where batch array jobs are being executed on our interactive queue (interactive.q) as well as our batch queue (all.q).
> This is odd behaviour since the configuration for the interactive queue is set to “qtype                 INTERACTIVE” and the batch queue is “qtype                 BATCH”.  Generally, a qlogin session only gets assigned to interactive.q slots and qsub jobs get assigned to all.q.  Where should I start looking for information on this?

qtype INTERACTIVE

is more about the "immediate" behavior (the `-now` option) than about which command submitted the job. Hence:

`qsub -now y …` will go to the interactive.q

`qlogin -now n …` will go to the batch queue (all.q in your setup)

Also, the assigned parallel environment will allow a batch job to run in interactive.q; maybe you can remove the PE smp there, unless you want to use it interactively too.
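
As a rough illustration of the two points above, here is a small shell model of the matching. It is not Grid Engine code; the function `queue_accepts` and its arguments are invented for this sketch. It assumes that a job requesting a PE is matched on the queue's `pe_list` alone, and that a job without a PE is matched on `qtype` according to its `-now` setting.

```shell
# Toy model of queue eligibility as described above; not scheduler code.
# queue_accepts QTYPE PE_LIST NOW [PE]   (hypothetical helper)
#   QTYPE    queue type(s), space separated, e.g. "BATCH" or "INTERACTIVE"
#   PE_LIST  the queue's pe_list, space separated
#   NOW      "y" or "n", the job's -now value
#   PE       requested parallel environment, omitted for serial jobs
queue_accepts() {
    local qtype="$1" pe_list="$2" now="$3" pe="${4:-}"
    if [ -n "$pe" ]; then
        # Assumption: a parallel job only needs its PE attached to the queue.
        case " $pe_list " in
            *" $pe "*) return 0 ;;
            *) return 1 ;;
        esac
    fi
    # Serial job: -now y needs INTERACTIVE, -now n needs BATCH.
    local wanted=BATCH
    if [ "$now" = y ]; then
        wanted=INTERACTIVE
    fi
    case " $qtype " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

# interactive.q as posted: qtype INTERACTIVE, pe_list "make smp"
if queue_accepts "INTERACTIVE" "make smp" n smp; then
    echo "a qsub job with -pe smp is eligible for interactive.q"
fi
# the same queue with smp removed from pe_list
if ! queue_accepts "INTERACTIVE" "make" n smp; then
    echo "without smp in pe_list the same job is not eligible"
fi
```

Under these assumptions, the array jobs would only reach interactive.q if they request a PE that is listed there or are submitted with `-now y`.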

-- Reuti


> $ qconf -sp smp
> pe_name            smp
> slots              999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $pe_slots
> control_slaves     TRUE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
> qsort_args         NONE
>
> $ qconf -sq all.q
> qname                 all.q
> hostlist              @mnbat @lnbat
> seq_no                0
> load_thresholds       np_load_avg=1
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH
> ckpt_list             NONE
> pe_list               make smp mpi orte
> rerun                 TRUE
> slots                 0,[@mnbat=80],[@lnbat=128]
> tmpdir                /scratch
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  5256800
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> $ qconf -sq interactive.q
> qname                 interactive.q
> hostlist              @mnint @lnint
> seq_no                0
> load_thresholds       np_load_avg=1
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 INTERACTIVE
> ckpt_list             NONE
> pe_list               make smp
> rerun                 FALSE
> slots                 0,[@mnint=80],[@lnint=128]
> tmpdir                /scratch
> shell                 /bin/bash
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  604800
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> Thank you for your assistance,
>
> Iyad K
> _______________________________________________
> users mailing list
> users at gridengine.org
> https://gridengine.org/mailman/listinfo/users



