[gridengine users] Load sensors
Txema Heredia Genestar
txema.heredia at upf.edu
Fri Apr 20 15:40:32 UTC 2012
Hi Earl,
Have you restarted the execution daemon on that host?
If there are running jobs, you can "softstop" it (/etc/init.d/sgeexecd
softstop), which stops the daemon without killing the running jobs, and
then start it again.
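
As a rough sketch of that restart sequence: the init script path is the one given above, while the function name and the optional path argument are only there for illustration, and your install may keep the script somewhere else.

```shell
#!/bin/sh
# restart_execd [INIT_SCRIPT]
# Hypothetical helper: soft-stop the execution daemon, then start it again.
# INIT_SCRIPT defaults to the path mentioned in this message.
restart_execd() {
    init_script=${1:-/etc/init.d/sgeexecd}
    # softstop shuts down sge_execd but leaves running jobs alone
    "$init_script" softstop || return 1
    # the freshly started daemon fetches its configuration again
    "$init_script" start
}
```

Run as root on the execution host, a plain `restart_execd` should be enough; the load sensor should then show up after a couple of load report intervals.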
Txema
On 20/04/12 17:24, Earl Lazarus wrote:
> Indeed, the script would not have been able to write to my home
> directory, so I changed the destination of the echo to /tmp/LD. I
> then changed the name of the script slightly and went into qmon and
> entered the new name in both the global configuration and the one
> host I am looking at. After a few minutes there is no sign of my
> echoes in /tmp, nor of the script name in a ps -elf on that host.
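
One way to double-check what the configuration now contains, as a sketch only: qconf -sconf prints a configuration as "name value" lines, and the small filter below (a made-up helper name) picks out the load_sensor entry.

```shell
#!/bin/sh
# configured_sensor: read a configuration listing ("name value" per line, as
# printed by qconf -sconf) on stdin and print the load_sensor value, if any.
configured_sensor() {
    awk '$1 == "load_sensor" { print $2 }'
}
```

Something like `qconf -sconf global | configured_sensor`, and the same with the host name in place of global, should show the renamed script if the change was saved.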
>
> On Fri, Apr 20, 2012 at 9:42 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>
> On 20.04.2012 at 16:28, Earl Lazarus wrote:
>
> > Yes the load sensor is under my home directory which is visible
> > on all machines. Would it be a true statement that my load sensor
> > should be running as soon as I specify it in the host
> > configuration? I need not submit jobs that reference the load
> > that it is measuring.
>
> If you change the definition in a local host configuration it's
> necessary to change the global configuration to distribute it to
> the node (just remove a blank somewhere). Then after 2 cycles of
> the load_report_time the process should be visible on a node:
>
> $ ps -e f
> ...
> 5081 ? Sl 125:47 /usr/sge/bin/lx24-amd64/sge_execd
> 5147 ? S 2:39 \_ /bin/sh /usr/sge/cluster/tmpspace.sh
>
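
The same check can be scripted. A sketch with a made-up function name: it expects "pid ppid command" lines on stdin and prints the command line of every direct child of sge_execd.

```shell
#!/bin/sh
# list_execd_children: read "PID PPID COMMAND..." lines on stdin and print the
# command line of each process whose parent is an sge_execd process.
list_execd_children() {
    awk '
        {
            pid[NR] = $1; ppid[NR] = $2
            $1 = ""; $2 = ""          # drop the two numeric columns
            sub(/^ +/, "")            # and the blanks they leave behind
            cmd[NR] = $0
        }
        END {
            # first pass: remember the PID of every sge_execd
            for (i = 1; i <= NR; i++) {
                split(cmd[i], w, " ")
                if (w[1] ~ /(^|\/)sge_execd$/) execd[pid[i]] = 1
            }
            # second pass: print the children of those PIDs
            for (i = 1; i <= NR; i++)
                if (ppid[i] in execd) print cmd[i]
        }'
}
```

On the node, `ps -e -o pid= -o ppid= -o args= | list_execd_children` should list the load sensor once the execution daemon has started it.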
> -- Reuti
>
>
> > On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen <ron_chen_123 at yahoo.com> wrote:
> > I don't have access to a Unix machine now, so I assume the script works.
> >
> > However, it is always the execution daemons that run the load sensors, so
> > make sure the load sensor is available on all the machines.
> >
> > -Ron
> >
> >
> >
> > ________________________________
> > From: Earl Lazarus <earl.lazarus at gmail.com>
> > To: Rayson Ho <rayson at scalablelogic.com>
> > Cc: users at gridengine.org
> > Sent: Thursday, April 19, 2012 9:49 PM
> > Subject: Re: [gridengine users] Load sensors
> >
> >
> > Here is the load sensor...it basically checks to see if a server is
> > running on the host, returning 1 if yes and 0 if no. It currently
> > contains diagnostic prints to my home directory. It runs fine from
> > the command prompt.
> >
> > When is a user provided load monitor actually run? Every time the
> > scheduler runs?
> >
> > #!/bin/bash
> > #PURPOSE SGE load monitor
> > #
> > #
> > good(){
> >     echo "begin"
> >     echo "$hst:earl_ecs_jun:1"
> >     echo "end"
> > }
> > bad(){
> >     echo "begin"
> >     echo "$hst:earl_ecs_jun:0"
> >     echo "end"
> > }
> > echo START `date` >>/home/elazarus/LD
> > hst=$(uname -n)
> > pf="PID_FILE"
> > while [ 1 ] ; do
> >     read input
> >     result=$?
> >     echo READ `date` >>/home/elazarus/LD
> >     if [ $result != 0 ] ; then
> >         exit 1
> >     fi
> >     if [ "$input" = "quit" ] ; then
> >         echo END `date` >>/home/elazarus/LD
> >         exit 0
> >     fi
> >     # --ASSERT VALID QUERY
> >     tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
> >     if [ -d $tmpname ] ; then
> >         cd $tmpname
> >         # --EXAMINE THE PID_FILE
> >         if [ -e $pf ] ; then
> >             # --FOUND PID_FILE
> >             pid=$(cat $pf)
> >             l=$(ps h -p $pid |wc -l)
> >             if [ $l -eq 0 ] ; then
> >                 # --CANNOT FIND THE SPECIFIED PROCESS
> >                 bad
> >             else
> >                 # --IT'S RUNNING!!
> >                 good
> >             fi
> >         else
> >             # --NO PID_FILE
> >             bad
> >         fi
> >     else
> >         # --NO SERVER DIRECTORY
> >         bad
> >     fi
> > done
> >
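
For comparison, a stripped-down sketch of the same loop without the diagnostic echoes. It follows the protocol as the script above uses it: every line arriving on stdin is answered with a begin / host:name:value / end block, and "quit" ends the sensor. The complex name and the PID file location are copied from the script; the function names are made up, and kill -0 is swapped in for the ps test.

```shell
#!/bin/sh
# Sketch of a load sensor loop. The complex name and the PID file location are
# taken from the script above; the function names are made up for illustration.

host=$(uname -n)
pid_file=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h/PID_FILE

# server_state FILE: print 1 if FILE holds the PID of a live process, else 0.
# Caveat: kill -0 also fails for a live process the caller may not signal.
server_state() {
    if [ -r "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}

# sensor_loop: answer every line read on stdin with one report block;
# stop on "quit" (status 0) or on end of input (status 1).
sensor_loop() {
    while read -r input; do
        [ "$input" = "quit" ] && return 0
        echo "begin"
        echo "$host:earl_ecs_jun:$(server_state "$pid_file")"
        echo "end"
    done
    return 1
}
```

To use it as a sensor, the script would end with a call to `sensor_loop`. Nothing here writes outside stdout, so it does not depend on which user the execution daemon runs the sensor as.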
> >
> >
> >
> > On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho <rayson at scalablelogic.com> wrote:
> >
> > >Can you post your load sensor, or at least the main structure of your
> > >load sensor script?
> > >
> > >If you run the script interactively, what do you get?
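
For an interactive test it helps to feed the script what the execution daemon would send. A sketch (probe_sensor is a made-up name): one empty line asks for a report, and "quit" ends the sensor.

```shell
#!/bin/sh
# probe_sensor SCRIPT: send one report request (an empty line) followed by
# "quit" to a load sensor and show what it prints on standard output.
probe_sensor() {
    printf '\nquit\n' | "$1"
}
```

Called with the path of the load sensor script, it should print exactly one begin ... end block and return; anything else points at the script rather than at the execution daemon.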
> > >
> > >Rayson
> > >
> > >
> > >
> > >
> > >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus <earl.lazarus at gmail.com> wrote:
> > >> I followed all of those directions...it just doesn't run. Permissions are
> > >> 777.
> > >> I put an "echo START `date` >>/home/<myid>/LD" in the script.
> > >>
> > >> The file is always empty.
> > >>
> > >>
> > >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho <rayson at scalablelogic.com>
> > >> wrote:
> > >>>
> > >>> There are not a lot of actual "REQUIREMENTS" for a load sensor. As long
> > >>> as it prints the proper values to standard output, it is good
> > >>> enough in most cases.
> > >>>
> > >>> You can get more detail from Oracle's doc:
> > >>>
> > >>>
> > >>>
> > >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
> > >>>
> > >>> Rayson
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus <earl.lazarus at gmail.com>
> > >>> wrote:
> > >>> > Based upon earlier postings, it looks like a load sensor will solve my
> > >>> > problem. Others have pointed to the following link (which contains an
> > >>> > example of a load sensor script).
> > >>> >
> > >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
> > >>> >
> > >>> > The example script at this site contains a "read" statement and seems to
> > >>> > communicate with SGE via "echo". Is there someplace where I can
> > >>> > find the actual REQUIREMENTS for a load sensor script instead of
> > >>> > having to reverse engineer the requirements from an example?
> > >>> >
> > >>> > _______________________________________________
> > >>> > users mailing list
> > >>> > users at gridengine.org
> > >>> > https://gridengine.org/mailman/listinfo/users
> > >>> >
> > >>
> > >>
> > >>
> > >>
> > >
> >