[gridengine users] How to use condor checkpointing with SGE
"Hung-Sheng Tsao (laotsao 老曹 ) Ph.D"
laotsao at gmail.com
Wed Mar 9 16:38:07 UTC 2011
It seems that you need to add all.q to the
queue_list in your checkpoint object.
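
A rough sketch of how one might verify this from the command line, not a definitive recipe. It assumes that `qconf -sckpt condor` prints one "attribute value" pair per line and that the checkpoint object on your release has a queue_list attribute. The helper name ckpt_lists_queue is made up for illustration.

```shell
#!/bin/sh
# Sketch only: succeed if the checkpoint object text on stdin names the
# given queue on its queue_list line. Assumes "attribute value" lines as
# printed by `qconf -sckpt`; ckpt_lists_queue is a hypothetical helper.
ckpt_lists_queue() {
    awk -v q="$1" '
        $1 == "queue_list" {
            # Split on blanks and commas so "a.q,b.q" and "a.q b.q" both work.
            n = split($0, f, /[ \t,]+/)
            for (i = 2; i <= n; i++) if (f[i] == q) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use on an admin host (commented out; needs Grid Engine):
#   qconf -sckpt condor | ckpt_lists_queue all.q \
#       || echo "all.q missing; edit with: qconf -mckpt condor"
# On releases that keep the association in the queue's ckpt_list instead
# (an assumption about your version), the counterpart would be:
#   qconf -aattr queue ckpt_list condor all.q
```

If the check fails, `qconf -mckpt condor` opens the checkpoint object in an editor so the queue can be added.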
On 03/09/11 11:25 AM, Lane Schwartz wrote:
> Hi,
>
> I would like to use condor's standalone checkpointing to enable
> checkpointing jobs that are run via Sun Grid Engine (SGE). I've
> successfully compiled a toy C program using condor_compile, and I can
> successfully run, stop, and resume the job with its checkpoint file.
>
> When I attempt to run my toy using qsub as an SGE job with
> checkpointing enabled, the job gets queued up but never runs. The job
> runs fine if submitted without checkpointing. Has anyone here
> successfully run SGE jobs using condor checkpointing?
>
> For reference, here's my configuration. Within SGE's qmon utility, I
> defined a checkpoint object called "condor" with the following
> configuration:
>
> Name: condor
> Interface: TRANSPARENT
> Checkpoint command: NONE
> Migrate command: NONE
> Clean command: NONE
> Checkpoint directory: /tmp
> Checkpoint When: xsr
> Checkpoint Signal: NONE
>
> To submit the job with checkpointing, I ran this:
> qsub -ckpt condor /home/lane/toy.sh -_condor
>
> Where toy.sh is:
> #!/bin/bash
>
> /usr/bin/setarch x86_64 -R -L /home/lane/toy -_condor_D_ALL
>
>
> The job as submitted above gets a "qw" status, but never runs. If I
> submit the job without "-ckpt condor" then it runs.
>
> Any pointers or tips would be appreciated. I've done quite a bit of
> research online; it appears that this should be possible, but I just
> haven't had any success figuring out how.
>
> Cheers,
> Lane