[gridengine users] How to use condor checkpointing with SGE
reuti at staff.uni-marburg.de
Wed Mar 9 20:22:51 UTC 2011
On 09.03.2011 at 17:38, Hung-Sheng Tsao (laotsao 老曹 ) Ph.D wrote:
> it seems that you need to add all.q to
> queue_list in your checkpoint object
This was indeed the necessary setup in version 5.3, but the setting has since been moved to the queue configuration: the queue now lists the checkpoint objects it supports in its ckpt_list attribute.
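
As a rough sketch of what that means in practice, assuming a 6.x installation, the checkpoint object "condor" from your mail, and a cluster queue named all.q (the queue name comes from the suggestion quoted above and may differ on your cluster), something along these lines should attach the checkpoint object to the queue. It needs to be run as an SGE manager on an admin host:

```shell
#!/bin/sh
# Sketch only. Assumptions: SGE 6.x, a checkpoint object named "condor",
# and a cluster queue named "all.q". Adjust both names to your site.
CKPT=condor
QUEUE=all.q

if command -v qconf >/dev/null 2>&1; then
    # Show the checkpoint object as SGE currently sees it.
    qconf -sckpt "$CKPT"
    # Add the checkpoint object to the queue's ckpt_list attribute.
    qconf -aattr queue ckpt_list "$CKPT" "$QUEUE"
    # Confirm that the queue now lists it.
    qconf -sq "$QUEUE" | grep ckpt_list
else
    echo "qconf not found; run this on an SGE admin host" >&2
fi
```

If the job still stays in "qw" afterwards, `qstat -j <job_id>` may list the scheduler's reasons, provided scheduling information is enabled in the scheduler configuration.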
> On 03/ 9/11 11:25 AM, Lane Schwartz wrote:
>> I would like to use condor's standalone checkpointing to enable
>> checkpointing jobs that are run via Sun Grid Engine (SGE). I've
>> successfully compiled a toy C program using condor_compile, and I can
>> successfully run, stop, and resume the job with its checkpoint file.
>> When I attempt to run my toy using qsub as an SGE job with
>> checkpointing enabled, the job gets queued up but never runs. The job
>> runs fine if submitted without checkpointing. Has anyone here
>> successfully run SGE jobs using condor checkpointing?
>> For reference, here's my configuration. Within SGE's qmon utility, I
>> defined a checkpoint object called "condor" with the following settings:
>> Name: condor
>> Interface: TRANSPARENT
>> Checkpoint command: NONE
>> Migrate command: NONE
>> Clean command: NONE
>> Checkpoint directory: /tmp
>> Checkpoint When: xsr
>> Checkpoint Signal: NONE
>> To submit the job with checkpointing, I ran this:
>> qsub -ckpt condor /home/lane/toy.sh -_condor
>> Where toy.sh is:
>> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>> The job as submitted above gets a "qw" status, but never runs. If I
>> submit the job without "-ckpt condor" then it runs.
>> Any pointers to tips would be appreciated. I've done quite a bit of
>> research online; it appears that this should be possible, but I just
>> haven't had any success figuring out how.
>> users mailing list
>> users at gridengine.org