[gridengine users] Fwd: reattaching to an interactive job
d.love at liverpool.ac.uk
Sat May 26 14:21:17 UTC 2012
[In case this is still relevant.]
Reuti <reuti at staff.uni-marburg.de> writes:
>> I've tried screen a bit before, thanks. Someone else had an idea which
>> might work even if the admin doesn't increase wallclock time: to
>> qlogin, *then* start screen and start the debugging process, then
>> detach and log out. Then qlogin into the *same node* and
>> reattach. I'm going to experiment with that, see if it works.
> Well, this would violate the granted scheduling, and AFAICS the screen
> session will be terminated in a proper way due to the attached
> additional group ID.
> NB: the ownership of the generated /dev/pts/x is wrong and needs to be
> fixed to have access to it as a user (in case you want to test it on
> your own).
That's fixed in the SGE development version.
Isn't there a general solution to debugging something that crashes after
a long time? Why not checkpoint at an appropriate interval and then
restart under the debugger? A single-node job is likely to work OK
under DMTCP, which is easy to use.
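
As a rough sketch of that checkpoint-then-debug idea, the Python below
only builds the command lines and prints them; it does not run anything.
It assumes a DMTCP release that provides dmtcp_launch with --interval and
dmtcp_restart (older releases called the launcher dmtcp_checkpoint, I
believe). The program name ./solver, its arguments, the checkpoint image
name and the PID are all made up for illustration, and whether gdb can
usefully attach to the restarted process will depend on your DMTCP
version and the node's ptrace settings.

```python
import shlex


def launch_cmd(program, args=(), interval_s=3600):
    """Build argv to start `program` under DMTCP, checkpointing every interval_s seconds."""
    return ["dmtcp_launch", "--interval", str(interval_s), program, *args]


def restart_cmd(ckpt_images):
    """Build argv to restart from one or more DMTCP checkpoint image files."""
    return ["dmtcp_restart", *ckpt_images]


def gdb_attach_cmd(pid):
    """Build argv to attach gdb to an already running (restarted) process."""
    return ["gdb", "-p", str(pid)]


if __name__ == "__main__":
    # Hypothetical values: substitute your own program, image file and PID.
    steps = [
        launch_cmd("./solver", ["--steps", "100000"], interval_s=1800),
        restart_cmd(["ckpt_solver_example.dmtcp"]),
        gdb_attach_cmd(12345),
    ]
    for argv in steps:
        print(shlex.join(argv))
```

The point of restarting from the last image taken before the crash is
that only the final checkpoint interval has to be replayed under the
debugger, so the interval is a trade-off between checkpoint overhead and
how long you wait for the failure to recur.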
Community Grid Engine: http://arc.liv.ac.uk/SGE/