[gridengine users] execution daemon on host * didn't accept task

William Hay w.hay at ucl.ac.uk
Wed Nov 16 16:25:07 UTC 2011


On 16 November 2011 13:52, Vang Le <lqvang79 at gmail.com> wrote:
> Hi William and Reuti,
> Thank you for your suggestions and your time. They are really helpful. I
> have solved almost all of my problems.
>
> I installed rsh-redone-client and rsh-redone-server, and I also modified my
> PE so that "control_slaves TRUE" is set. I can now run this part:
>
> mpirun -np $NSLOTS hostname
> mpirun -np $NSLOTS ~/hello
>
> However, I still cannot start an interactive PE job with qsh or qrsh. Both
> report:
> ---------
> $ qrsh -pe test_pe 5
> Your "qrsh" request could not be scheduled, try again later.
> ---------
> qsh -pe test_pe 5
> Your job 50 ("INTERACTIVE") has been submitted
> waiting for interactive job to be scheduled ...
>
> Your "qsh" request could not be scheduled, try again later.
> ---------
>
> I searched and found advice about editing the /etc/hosts.equiv file to
> permit rsh and rlogin without a password. However, when I ran "qconf
> -mconf" on the management host, I saw this:
> ----
> rlogin_daemon                /usr/sbin/sshd -i
> rlogin_command               /usr/bin/ssh
> ----
>
> Do I need to change something in the queue or PE to run interactive PE jobs?
Check that qtype in the queue configuration (queue_conf) is either
INTERACTIVE or "BATCH INTERACTIVE" if you want to run without -now n.
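
A minimal sketch of that qtype check, under some assumptions: the queue
name all.q is a guess (the thread does not name the queue), and
has_interactive is a hypothetical helper written for this example, not an
SGE command. It only reads the output of "qconf -sq" and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: reads `qconf -sq <queue>` output on stdin and
# succeeds only if the qtype line lists INTERACTIVE.
has_interactive() {
    awk '$1 == "qtype" {
             for (i = 2; i <= NF; i++) if ($i == "INTERACTIVE") found = 1
         }
         END { exit !found }'
}

# Assumed queue name; substitute your own (list queues with `qconf -sql`).
QUEUE=all.q

# Only query the cluster when the SGE tools are on PATH.
if command -v qconf >/dev/null 2>&1; then
    if qconf -sq "$QUEUE" | has_interactive; then
        echo "$QUEUE accepts interactive jobs"
    else
        echo "$QUEUE lacks INTERACTIVE; edit qtype with: qconf -mq $QUEUE"
    fi
fi
```

If the queue has to stay BATCH only, adding -now n to the request (for
example "qrsh -now n -pe test_pe 5") is the alternative mentioned above.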

William

>
> Regards
> Vang.
>
> On 11/16/11 11:03 AM, Reuti wrote:
>
> Hi,
>
> Am 16.11.2011 um 04:29 schrieb Vang Le:
>
> Hello GridUsers,
> My grid is running and can dispatch jobs, but they only run on one node at
> a time.
> When I run mpirun in a batch script, I get errors like
> "execution daemon on host  <hostname> didn't accept task", as shown at the
> bottom of this email.
>
> Can you please check whether your Open MPI was built with proper support
> for SGE:
>
> $ ompi_info | grep grid
>                  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3)
>
> A simple `hostname` should work. Did you install this version of Open MPI
> on all machines? What does your PE definition look like: is
> "control_slaves TRUE" set?
>
> -- Reuti
>
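
The two checks above can be sketched as a small script. This is an
illustration only: the PE name test_pe is taken from the job script later
in this thread, and pe_has_control_slaves is a hypothetical helper, not
part of SGE or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: reads `qconf -sp <pe>` output on stdin and succeeds
# only if control_slaves is set to TRUE.
pe_has_control_slaves() {
    awk '$1 == "control_slaves" && toupper($2) == "TRUE" { ok = 1 }
         END { exit !ok }'
}

# PE name as used in the job script in this thread.
PE=test_pe

# Check 1: Open MPI should list a gridengine component.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -q gridengine ||
        echo "ompi_info shows no gridengine component"
fi

# Check 2: the PE must allow slave tasks to be started under SGE control.
if command -v qconf >/dev/null 2>&1; then
    qconf -sp "$PE" | pe_has_control_slaves ||
        echo "$PE: control_slaves is not TRUE"
fi
```

Run the first check on every node, since each machine needs an Open MPI
build with gridengine support.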
>
> I can run mpirun outside of sge without any problems.
> I suspect that when mpirun runs inside the SGE batch script, it cannot
> communicate with the exec nodes.
>
>
> My system information:
> 3 servers running Ubuntu Lucid Lynx, with openmpi recompiled to support
> gridengine. SGE was installed from the Ubuntu repository, with the correct
> environment variables set.
> I also set up passwordless ssh access for the openmpi user account, which
> is the same account that I use to submit SGE batch jobs.
>
>
> Any help is very much appreciated.
>
> Vang.
>
>
>
>
> ============ERROR================
> error: executing task of job 63 failed: execution daemon on host "node1"
> didn't accept task
> error: executing task of job 63 failed: execution daemon on host
> "submithost" didn't accept task
> --------------------------------------------------------------------------
> A daemon (pid 13317) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
>
> ============CONTENT OF SGE BATCH SUBMIT==============
>
> #!/bin/bash
>
> # run at current working directory
> #$ -cwd
> #$ -V
> # Specify the shell for this job
> #$ -S /bin/bash
> #$ -pe test_pe 5
> #$ -P test1
>
> # Merge the standard output and standard error
> #$ -j y
>
> # Specify the location of the output messages
> #$ -o messages.txt
>
> #---------Customization part starts below -------------
> # Customization
> # Which address should the start and end notifications for this job be sent to
> #
> #$ -M <my_gmail_id>@gmail.com
> #$ -m be
>
> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
>
> mpirun -np $NSLOTS hostname
> mpirun -np $NSLOTS ~/hello
>
>
>
>
>
>


