[gridengine users] How to use condor checkpointing with SGE

Lane Schwartz dowobeha at gmail.com
Sat Mar 12 13:18:31 UTC 2011


On Fri, Mar 11, 2011 at 6:09 PM, Dave Love <d.love at liverpool.ac.uk> wrote:
> What's the attraction of the Condor stuff over BLCR or DMTCP?

I don't have any prior experience with checkpointing. I simply
searched around, and Condor seemed to be the easiest and most
straightforward to set up.

If you have any experience with or recommendations for other
checkpointing methods, I would be very interested in your perspective.

My requirements are pretty straightforward. I have a set of jobs to
run. I want to be able to assign priorities within the set of my jobs
(no need to prioritize my jobs with respect to other users' jobs).
Under certain conditions, I need to re-prioritize my jobs. When that
happens, if a lower priority job is running, I would like to
checkpoint it, stop it, and put it back in the queue.

My approach so far has been to try using Condor checkpointing and to
assign priorities by changing the job share (-js). If there is a
better way, I'd love to hear about it.
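
To make the workflow concrete, here is a rough sketch of the
re-prioritization step, not a tested recipe. It assumes the job was
submitted as rerunnable or with a checkpointing environment, so that
"qmod -rj" will requeue it. Whether a checkpoint is actually written
at that point depends on how the checkpointing environment is
configured, which this sketch does not cover. The function names and
the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: lower a job's share, then ask SGE to reschedule it.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper. Usage: deprioritize <job id> <new job share>
deprioritize() {
    job="$1"
    share="$2"
    run qalter -js "$share" "$job"   # redefine the job share
    run qmod -rj "$job"              # reschedule (requeue) the job
}
```

With DRY_RUN left at 1, "deprioritize 1234 10" only prints the two
commands, which makes it easy to check them before touching real jobs.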

Thanks,
Lane

