[gridengine users] Transitioning from Torque/Maui to Open Grid Scheduler
Joseph A. Farran
jfarran at uci.edu
Sat Apr 21 18:53:16 UTC 2012
Hi Rayson & Ron.
Thank you both for responding.
We do a lot of parallel runs on our cluster. Here is more information on what we currently have; I will keep this example down to 3 queues and 6 nodes for simplicity.
With our current Torque setup, I have 6 64-core nodes. 3 nodes belong to the math group, 3 nodes to the bio group. We set up our queues as follows: 1 queue is a Preemptee, 2 are Preemptors.
When I create an account, the account is set up to belong to the 'math' group or to the 'bio' group.
Our current nodes and Queues are as follows:
3 nodes have the properties "math", "free" and "64" cores.
3 nodes have the properties "bio", "free" and "64" cores.
The "math" Queue looks for nodes with "math" properties and runs jobs only on "math" nodes. Math Q is Preemptor.
The "bio" Queue looks for nodes with "bio" properties and runs jobs only on the "bio" nodes. Bio Q is Preemptor.
The "free" Queue looks for nodes with "free" properties and runs jobs on any node BUT only as a Preemptee job.
The idea here is that the free Q allows everyone to use the "free" nodes as long as the owners (math or bio) are not using them. The free Q is set up as a Preemptee Q; the math & bio Q's are set up as Preemptor Q's.
When the math users submit a job to the math Q, any free job running on the math nodes gets suspended.
When the bio users submit a job to the bio Q, any free job on the bio nodes also gets suspended.
Suspended jobs automatically resume when the node owners are done using their nodes (no jobs on node).
With Torque, math users can request from 1 to 3 math nodes and from 1 to 64 cores on each node. For example, a math user can request 2 math nodes at 32 cores each in interactive mode with:
qsub -I -q math -l nodes=2:ppn=32
If the user does not belong to the 'math' group, they are prevented from running on the math Q. Same for the bio users.
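To make the rules above concrete, here is a rough toy model of the policy in Python. It is only a sketch of the behavior I described, not Torque, Maui, or Grid Engine configuration; the names (Node, can_submit, eligible_nodes, start_job, finish_job) are made up for illustration, and core counts and placement of new free jobs on busy nodes are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the policy described in this message.
# It is not scheduler configuration and uses no scheduler API.

OWNER_QUEUES = {"math", "bio"}   # Preemptor queues; "free" is the Preemptee


@dataclass
class Node:
    name: str
    properties: set                                  # e.g. {"math", "free"}
    cores: int = 64
    running: list = field(default_factory=list)      # (queue, user) tuples
    suspended: list = field(default_factory=list)    # suspended free jobs


def can_submit(user_group, queue):
    # Owner queues are restricted to their own group; "free" is open to all.
    return queue == "free" or user_group == queue


def eligible_nodes(queue, nodes):
    # A queue runs jobs only on nodes that carry the matching property.
    return [n for n in nodes if queue in n.properties]


def start_job(node, queue, user):
    # An owner job suspends every free job on the node before it starts.
    if queue in OWNER_QUEUES:
        node.suspended.extend(j for j in node.running if j[0] == "free")
        node.running = [j for j in node.running if j[0] != "free"]
    node.running.append((queue, user))


def finish_job(node, job):
    node.running.remove(job)
    # Suspended free jobs resume once no owner job is left on the node.
    if not any(q in OWNER_QUEUES for q, _ in node.running):
        node.running.extend(node.suspended)
        node.suspended = []
```

For example, with 3 math nodes and 3 bio nodes built this way, eligible_nodes("free", nodes) returns all 6 nodes, while eligible_nodes("math", nodes) returns only the 3 math nodes.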
I will stop here as I have more requirements, but this is the main set of functions I am looking for in OGE.
Thank you again for your generous efforts in helping.
Joseph
On 4/20/2012 9:01 PM, Rayson Ho wrote:
> Hi Joseph,
>
> "Queues" in Grid Engine (and Open Grid Scheduler/Grid Engine) and the
> ones in Torque/Maui have slightly different meanings.
>
> In Grid Engine, jobs are not submitted to "queues", but rather jobs
> are submitted to the global waiting area. Then the scheduler picks
> "queue instances" (queue instances roughly = hosts, yet each host can
> have more than 1 queue instance) that satisfy the resource
> requirements of each job, and at that point they are bound to the
> queues.
>
> We also have global queues called "cluster queues", but they are
> an abstraction of the queue instances.
>
> So what does that all mean??
>
> In LSF or Torque, some clusters have debug queues, short queues, long
> queues, etc. Those can be migrated to Grid Engine cluster queues with
> some work (i.e., relatively easy).
>
> If you want queue-level user-based fairshare or queue-based fairshare
> as in LSF (e.g., users in each queue get a different priority), then
> it can be harder to implement or model in Grid Engine. I have not
> looked at Maui for a while, so I am not sure if it has this feature.
>
> If you let us know a bit more about your setup, then we can provide
> further help.
>
> Rayson
>
>
>
> On Fri, Apr 20, 2012 at 11:42 PM, Joseph A. Farran<jfarran at uci.edu> wrote:
>> Hi All.
>>
>> I am a long-time Torque/Maui admin running an HPC cluster, looking to
>> transition to Open Grid Engine. I am a newbie with OGE, however.
>>
>> Are there any links and/or helpful tips on moving to OGE from an admin point
>> of view? How do I convert Torque qmgr queues, nodes, and resource limits to
>> the equivalent in OGE?
>>
>> Thanks,
>> Joseph