[gridengine users] Alternatives to Son of GridEngine
Dr. Mark Asbach
mark.asbach at pixolus.de
Mon Nov 12 18:34:26 UTC 2018
Hi Daniel, hi everyone,
I can’t add a lot to Tina’s summary. Just as a note from someone who recently found himself in a situation similar to yours: We reviewed SGE/UGE, Slurm, Torque, and HTCondor this spring for an HPC project and found that developer and user activity on Slurm seems to leave all other alternatives far behind. Although I used SGE a lot at uni (and liked it), and although others on the project had similarly positive memories of Torque and Condor from their Ph.D. days, we’ve settled on Slurm now (and even considered fitting everything to Mesos, just because it’s hard to find people who ‘still’ know HPC job scheduling systems in a time of AWS and MapReduce).
So far I can say that, yes, Slurm is a bit less flexible and you’ll have to re-provision the whole cluster each time you change something general. However, this is not a real concern for us, as our deployment is fully scripted and we’re running a moderately large cluster (< 500 CPUs) for a single, specific project (as opposed to running cluster infrastructure shared by many different individuals).
P.S.: Anyone interested in a permanent position outside of, but in touch with, academia that’s about Python/Django, machine learning, clusters and clouds? :-)
Dr. Mark Asbach
Große Brinkgasse 2b, 50672 Köln
https://pixolus.de, Tel +49 221 949992-20
Court of registration: Amtsgericht Köln (Cologne District Court), HRB 80243
Managing directors: Dr. Mark Asbach, Dr. Stefan Krausz
> On 12.11.2018 at 19:06, Tina Friedrich <tina.friedrich at it.ox.ac.uk> wrote:
> Most of my experience is with either SLURM or Grid Engine.
> UGE is pretty much a drop in replacement for Son of GridEngine; it's fairly
> expensive, but it will be least impact. Also, the support is very good (well,
> I found it so anyway).
> Moving to SLURM would be a much bigger issue; how well that will work for you
> is also very dependent on your environment. Personally I find SLURM inflexible
> (and a bit flaky) - can't add or remove anything (nodes, resources) without
> shipping an update to the main config file & restarting the daemons, for
> example. Also, if you make a lot of use of complexes in Grid Engine, SLURM
> doesn't really have them much, at least not to that extent. SLURM is very
> focused on 'proper' HPC in its design, really; large parallel jobs, static
> environments, rather than bunches of batch jobs.
> I'd add HTCondor to that list. If you're doing more high throughput & resource
> scheduling, I'd probably look into that.
> On Monday, 12 November 2018 20:41:34 GMT Taras Shapovalov wrote:
>> Hi Daniel,
>> There are 4 alternatives remaining: Slurm, UGE, PBS Pro and LSF. They are all
>> pretty similar and do their job pretty well.
>> Best regards,
>> On Mon, Nov 12, 2018 at 8:05 PM Daniel Povey <dpovey at gmail.com> wrote:
>>> I'm trying to understand the landscape of alternatives to Son of
>>> GridEngine, since the maintenance situation isn't great right now and I'm
>>> not sure that it has a long term future.
>>> If you guys were to switch to something in the same universe of products,
>>> what would it be? Univa GridEngine? Slurm? Which of these, as far as
>>> you know, is better maintained and has a better future?
>>> I'm not interested in fancy new things like Mesos that have a different
>>> programming model or are too new.
>>> users mailing list
>>> users at gridengine.org
> Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
> Research Computing and Support Services, Academic IT
> IT Services, University of Oxford