[gridengine users] Two clusters, one gridengine to rule them all?

Johan Finstadsveen jfinstadsveen at gmail.com
Fri Nov 4 09:10:28 UTC 2011


Thanks for a quick reply.

You have summarized correctly. So you would recommend a setup where the
gpu-frontend-node would manage all machines, cpu and gpu? That is good to
know, but requires some rethinking of our current setup.

The users want to be able to send different workloads to different queues,
depending on the type of task, i.e. some tasks are better suited to cpu,
others to gpu. Additionally they wish to use the cpus on the gpu-nodes to
maximize overall utilization. I am not sure whether this is possible, or
whether it is a different debate altogether.

As a sysadmin, my wish is to use the all-function as a way of temporarily
removing machines to perform tests or upgrades, without users adding more
jobs to them in the meantime.

Johan


On Fri, Nov 4, 2011 at 9:54 AM, William Hay <w.hay at ucl.ac.uk> wrote:

> On 4 November 2011 07:24, Johan Finstadsveen <jfinstadsveen at gmail.com>
> wrote:
> > Hi,
> > Unsure whether this is the correct forum for this debate.
> >
> > We are currently in the process of acquiring a gpu-cluster. From before we
> > have a cpu-based cluster running Rocks 5.3 and SGE. The desire from the
> > users is to have three different queues from a single frontend (types of
> > queues: all, cpu, gpu).
> > What is the optimal/best practice setup in this case? Should one frontend
> > administrate all machines (old and new cluster)? Or should a dedicated
> > server use SGE to send queues to the two cluster-frontends (i.e., have three
> > SGE)? Or are there other setups or solutions that are more optimal?
> > Best regards
> > Johan Finstadsveen
> If I understand correctly you are debating whether to use a single
> cluster or a front end cluster that uses transfer queues to feed two
> backend clusters? Presumably the all queue would be for CPU jobs that
> can run on the machines to which the GPUs are attached as well as the
> dedicated CPU resource. I'd suggest a single cluster, as I believe that
> with the transfer queue setup it would be hard to avoid making an early
> commitment as to which cluster a given job should run on, which could
> lead to unnecessarily wasted resources.
>
> What, if anything, would be the advantage to the user of selecting the
> cpu queue? Are the machines in the cpu cluster better in some way?
>
> William
>