[gridengine users] OGS h_vmem and coredump
Reuti
reuti at staff.uni-marburg.de
Tue Jun 12 14:19:44 UTC 2012
On 12.06.2012 at 15:59, Mazouzi wrote:
> Or define "h_core 0" in the queue definition to disable it by default.
>
> Hi
>
> In this case, I noticed that when the application (VASP) exceeds the requested h_vmem, SGE kills the job and I can see this output from qacct:
>
> failed 100 : assumedly after job
>
> But nothing is returned to the user (such as "Segmentation fault" or "memory exceeded").
>
> Is there a way to show a custom message to users when they exceed the requested h_vmem?
Yes, you will need a mail wrapper that looks into the messages file of the exechost and scans it for an entry for this particular job. For a parallel job this works for the master node only (where the job script also ran).
$ cat mailer.sh
#!/bin/sh
#
# Called like a mailer: $2 is the mail subject, $3 the recipient,
# and the mail body arrives on stdin.
#
# Distinguish between normal jobs and an array job.
#
case `echo "$2" | cut -d " " -f 1` in
    Job)       JOB_ID=`echo "$2" | cut -d " " -f 2`
               CONDITION=`echo "$2" | cut -d " " -f 4` ;;
    Job-array) JOB_ID=`echo "$2" | cut -d " " -f 3`
               CONDITION=`echo "$2" | cut -d " " -f 5` ;;
    *)         ;;
esac
#
# Get the reason in case of an abortion of the job.
#
if [ "$CONDITION" = "Aborted" ]; then
    if [ -f "/var/spool/sge/$HOSTNAME/messages" ] && [ -r "/var/spool/sge/$HOSTNAME/messages" ]; then
        APPENDIX=`grep -E "[|]job $JOB_ID([.][[:digit:]]+)? exceed" "/var/spool/sge/$HOSTNAME/messages" | head -n 1`
    fi
    if [ -z "$APPENDIX" ]; then
        APPENDIX="Unknown, no entry found in messages file on the master node of the job."
    fi
fi
#
# Now construct and send the email.
#
if [ -n "$APPENDIX" ]; then
    (cat; echo; echo "Reason for job abort:"; echo "$APPENDIX") | mail -s "$2" "$3"
else
    mail -s "$2" "$3"
fi
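
To try the wrapper's parsing and lookup logic without sending mail, the two steps can be pulled out into small functions and run against a sample file. This is only a sketch: the function names `parse_subject` and `find_reason` are made up for illustration, and the subject layouts shown in the comments are inferred from the field positions the wrapper uses, not taken from SGE documentation.

```shell
#!/bin/sh
# Hypothetical helper: print "JOB_ID CONDITION" for a mail subject.
# Assumed subject layouts (inferred from the wrapper's field numbers):
#   Job 12345 (name) Aborted
#   Job-array task 12345.7 (name) Aborted
parse_subject() {
    case `echo "$1" | cut -d " " -f 1` in
        Job)       echo "$1" | cut -d " " -f 2,4 ;;
        Job-array) echo "$1" | cut -d " " -f 3,5 ;;
        *)         ;;
    esac
}

# Hypothetical helper: print the first "exceed" entry for job $1
# found in the messages file $2 (same pattern as in the wrapper).
find_reason() {
    grep -E "[|]job $1([.][[:digit:]]+)? exceed" "$2" | head -n 1
}
```

For example, `find_reason 12345 /var/spool/sge/$HOSTNAME/messages` should print the matching line, if there is one. The wrapper itself is meant to be entered as the `mailer` in the cluster configuration (`qconf -mconf`), and the user has to request mail on abort (`qsub -m a -M <address>`) for it to be called at all.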
-- Reuti