[gridengine users] Load sensors

Txema Heredia Genestar txema.heredia at upf.edu
Fri Apr 20 15:40:32 UTC 2012


Hi Earl,

Have you restarted the execution daemon on that host?
If there are running jobs, you can "softstop" it
( /etc/init.d/sgeexecd softstop ), which leaves the jobs running,
and then start it again.

Txema


On 20/04/12 17:24, Earl Lazarus wrote:
> Indeed, the script would not have been able to write to my home
> directory, so I changed the destination of the echo to /tmp/LD.  I
> then changed the name of the script slightly and went into qmon and
> entered the new name for both global and the one host I am looking at.
> After a few minutes there is no sign of my echoes in /tmp, nor of the
> script name in a ps -elf on that host.
>
> On Fri, Apr 20, 2012 at 9:42 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>
>     On 20.04.2012 at 16:28, Earl Lazarus wrote:
>
>     > Yes, the load sensor is under my home directory, which is visible
>     > on all machines.  Would it be a true statement that my load sensor
>     > should be running as soon as I specify it in the host
>     > configuration?  I need not submit jobs that reference the load
>     > that it is measuring.
>
>     If you change the definition in a local host configuration, it's
>     also necessary to change the global configuration so that the
>     change gets distributed to the node (just remove a blank
>     somewhere). Then, after 2 cycles of the load_report_time, the
>     process should be visible on the node:
>
>     $ ps -e f
>     ...
>      5081 ?        Sl   125:47 /usr/sge/bin/lx24-amd64/sge_execd
>      5147 ?        S      2:39  \_ /bin/sh /usr/sge/cluster/tmpspace.sh
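
To check the parent/child relationship shown in the listing above without reading the process tree by eye, a small filter along these lines may help. It is only a sketch: sensor_children is a hypothetical helper name, and it assumes the execution daemon's command is called sge_execd, as in the listing.

```shell
# Print the processes whose parent is an sge_execd process and whose
# command line mentions the given sensor script name.
# Input on stdin: one "PID PPID ARGS..." line per process.
# sensor_children is a hypothetical helper, not part of Grid Engine.
sensor_children() {
    awk -v sensor="$1" '
        { ppid[NR] = $2; line[NR] = $0 }
        # Remember the PID of every process whose command is sge_execd.
        $3 == "sge_execd" || $3 ~ /\/sge_execd$/ { execd[$1] = 1 }
        END {
            for (i = 1; i <= NR; i++)
                if ((ppid[i] in execd) && index(line[i], sensor) > 0)
                    print line[i]
        }'
}
```

On a node it could be fed with "ps -e -o pid=,ppid=,args= | sensor_children tmpspace.sh". No output would suggest that sge_execd has not (yet) started the sensor.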
>
>     -- Reuti
>
>
>     > On Thu, Apr 19, 2012 at 9:30 PM, Ron Chen
>     > <ron_chen_123 at yahoo.com> wrote:
>     > I don't have access to a Unix machine now, so I assume the
>     > script works.
>     >
>     > However, it is always the execution daemons that run the load
>     > sensors, so make sure the load sensor is available on all the
>     > machines.
>     >
>     >  -Ron
>     >
>     >
>     >
>     > ________________________________
>     > From: Earl Lazarus <earl.lazarus at gmail.com>
>     > To: Rayson Ho <rayson at scalablelogic.com>
>     > Cc: users at gridengine.org
>     > Sent: Thursday, April 19, 2012 9:49 PM
>     > Subject: Re: [gridengine users] Load sensors
>     >
>     >
>     > Here is the load sensor. It basically checks to see if a
>     > server is running on the host, returning 1 if yes
>     > and 0 if no.  It currently contains diagnostic prints to my home
>     > directory.  It runs fine from the command prompt.
>     >
>     > When is a user-provided load monitor actually run?  Every time
>     > the scheduler runs?
>     >
>     > #!/bin/bash
>     > #PURPOSE  SGE load monitor
>     > #
>     > #
>     > good(){
>     >    echo "begin"
>     >    echo "$hst:earl_ecs_jun:1"
>     >    echo "end"
>     > }
>     > bad(){
>     >    echo "begin"
>     >    echo "$hst:earl_ecs_jun:0"
>     >    echo "end"
>     > }
>     >    echo START `date` >>/home/elazarus/LD
>     >    hst=$(uname -n)
>     >    pf="PID_FILE"
>     >    while true ; do
>     >       read -r input
>     >       result=$?
>     >       echo READ `date` >>/home/elazarus/LD
>     >       if [ $result -ne 0 ] ; then
>     >          exit 1
>     >       fi
>     >       if [ "$input" = "quit" ] ; then
>     >          echo END `date` >>/home/elazarus/LD
>     >          exit 0
>     >       fi
>     > #     --ASSERT VALID QUERY
>     >       tmpname=/tmp/jaeger/0p1/EDB/ECS_JUN_SS3_SL4h
>     >       if [ -d "$tmpname" ] ; then
>     >          cd "$tmpname"
>     > #        --EXAMINE THE PID_FILE
>     >          if [ -e "$pf" ] ; then
>     > #           --FOUND PID_FILE
>     >             pid=$(cat "$pf")
>     >             l=$(ps h -p "$pid" | wc -l)
>     >             if [ "$l" -eq 0 ] ; then
>     > #              --CANNOT FIND THE SPECIFIED PROCESS
>     >                bad
>     >             else
>     > #              --IT'S RUNNING!!
>     >                good
>     >             fi
>     >          else
>     > #           --NO PID_FILE
>     >             bad
>     >          fi
>     >       else
>     > #        --NO SERVER DIRECTORY
>     >          bad
>     >       fi
>     >    done
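
For comparison, the same request/response loop can be written more compactly. The execution daemon starts the sensor once and then writes a newline to its standard input at each load report interval, so the sensor only has to answer each line it reads. The following is a sketch under assumptions, not a replacement for the script above: PID_FILE_PATH and its default value are hypothetical, and the liveness test uses kill -0, which only works if the sensor is allowed to signal the server process (for example when it runs as root or as the same user).

```shell
#!/bin/bash
# Minimal load sensor skeleton (a sketch, not the original script).
# PID_FILE_PATH is a hypothetical setting; earl_ecs_jun is the complex
# name used in the script above.
PID_FILE_PATH="${PID_FILE_PATH:-/tmp/PID_FILE}"

# Print 1 if the PID file names a process we can signal, 0 otherwise.
server_state() {
    local pid
    pid=$(cat "$PID_FILE_PATH" 2>/dev/null) || pid=""
    case "$pid" in
        ''|*[!0-9]*) echo 0; return ;;   # missing, empty or non-numeric
    esac
    if kill -0 "$pid" 2>/dev/null; then echo 1; else echo 0; fi
}

# Emit one report in the begin / host:complex:value / end framing.
report() {
    echo "begin"
    echo "$(uname -n):earl_ecs_jun:$(server_state)"
    echo "end"
}

# Answer every line read from stdin with one report.
# Returns 0 on "quit" and 1 when stdin is closed.
sensor_loop() {
    local input
    while read -r input; do
        [ "$input" = "quit" ] && return 0
        report
    done
    return 1
}
```

To use it as a real sensor, call sensor_loop as the last line of the script. From a shell, "printf '\nquit\n' | sensor_loop" should print a single begin/end block and then return.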
>     >
>     >
>     >
>     >
>     > On Thu, Apr 19, 2012 at 7:18 PM, Rayson Ho
>     > <rayson at scalablelogic.com> wrote:
>     >
>     > >Can you post your load sensor, or at least the main structure of
>     > >your load sensor script?
>     > >
>     > >If you run the script interactively, what do you get??
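
When running a sensor by hand, it can help to check the output mechanically rather than by eye. The filter below is a sketch only: check_report is a hypothetical helper name, it assumes values contain no colon, and it is deliberately strict in requiring at least one host:name:value line between "begin" and "end".

```shell
# Succeed (exit status 0) if stdin holds exactly one well-formed report:
# "begin", one or more host:name:value lines, then "end".
# check_report is a hypothetical helper, not part of Grid Engine.
check_report() {
    awk -F: '
        NR == 1     { if ($0 != "begin") bad = 1; next }
        ended       { bad = 1; next }        # nothing may follow "end"
        $0 == "end" { ended = 1; next }
        NF != 3 || $1 == "" || $2 == "" || $3 == "" { bad = 1; next }
        { values++ }
        END { exit (bad || !ended || values == 0) ? 1 : 0 }'
}
```

A possible use, with /path/to/sensor.sh standing in for the real script: "printf '\nquit\n' | /path/to/sensor.sh | check_report && echo OK".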
>     > >
>     > >Rayson
>     > >
>     > >
>     > >
>     > >
>     > >On Thu, Apr 19, 2012 at 8:14 PM, Earl Lazarus
>     > ><earl.lazarus at gmail.com> wrote:
>     > >> I followed all of those directions...it just doesn't run.
>     > >> Permissions are 777.
>     > >> I put an "echo START `date` >>/home/<myid>/LD" in the script.
>     > >>
>     > >> The file is always empty.
>     > >>
>     > >>
>     > >> On Thu, Apr 19, 2012 at 12:37 PM, Rayson Ho
>     > >> <rayson at scalablelogic.com> wrote:
>     > >>>
>     > >>> There are not a lot of actual "REQUIREMENTS" for a load
>     > >>> sensor. As long as it prints the proper values to standard
>     > >>> output, it is good enough in most cases.
>     > >>>
>     > >>> You can get more detail from Oracle's doc:
>     > >>>
>     > >>> http://docs.oracle.com/cd/E24901_01/doc.62/e21978/configuration.htm#sthref182
>     > >>>
>     > >>> Rayson
>     > >>>
>     > >>>
>     > >>>
>     > >>> On Thu, Apr 19, 2012 at 1:31 PM, Earl Lazarus
>     > >>> <earl.lazarus at gmail.com> wrote:
>     > >>> > Based upon earlier postings, it looks like a load sensor
>     > >>> > will solve my problem.  Others have pointed to the following
>     > >>> > link (which contains an example of a load sensor script).
>     > >>> >
>     > >>> > http://gridscheduler.sourceforge.net/howto/loadsensor.html
>     > >>> >
>     > >>> > The example script at this site contains a "read" statement
>     > >>> > and seems to communicate with SGE via "echo".  Is there
>     > >>> > someplace where I can find the actual REQUIREMENTS for a load
>     > >>> > sensor script instead of having to reverse engineer the
>     > >>> > requirements from an example?
>     > >>> >
>     > >>> > _______________________________________________
>     > >>> > users mailing list
>     > >>> > users at gridengine.org
>     > >>> > https://gridengine.org/mailman/listinfo/users
>     > >>> >
>     > >
