[gridengine users] (no subject)
zszhong5 at gmail.com
Sun Feb 15 16:49:21 UTC 2015
We are using Sun Grid Engine (SGE) to manage a small cluster of about
fifty compute nodes. Users submit their jobs from a separate manager node
using the qsub command.
Recently we have run into a problem. A user submits a job and SGE
dispatches it to a free compute node. Suppose the returned job id is 1000.
Running qstat -j 1000 shows the job state as r, which means SGE considers
the job to be running, and it also shows which machine the job is running
on, say machine1. But when we ssh to machine1 and run top to see the
resources used by the processes belonging to the job, nothing shows up. We
expected to see the CPU usage of the submitted job, but there was none. We
also tried `qrsh` to log in to that node, and again found no information
about any processes belonging to the job.
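
For reference, the checks described above can be sketched roughly as
follows. This is only a minimal sketch: the job id 1000 and the host name
machine1 are the placeholders used in this message, the helper names
build_checks and run_checks are made up for illustration, and top is
assumed to accept -b -n 1 (batch mode, one snapshot) as it does on common
Linux systems. With dry_run=True nothing is executed and the command lines
are only printed.

```python
import shlex
import subprocess


def build_checks(job_id, host):
    """Return the command lines for the two checks described in this message."""
    return [
        # What the scheduler reports about the job.
        ["qstat", "-j", str(job_id)],
        # One batch-mode snapshot of top on the node the job was sent to.
        ["ssh", host, "top", "-b", "-n", "1"],
    ]


def run_checks(job_id, host, dry_run=True):
    """Run each check, or just return the command strings when dry_run is set."""
    outputs = []
    for cmd in build_checks(job_id, host):
        if dry_run:
            outputs.append(shlex.join(cmd))
        else:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Placeholder job id and host name taken from the description above.
    for line in run_checks(1000, "machine1"):
        print(line)
```

Calling run_checks(1000, "machine1", dry_run=False) on the manager node
would run both commands for real and return their output for comparison.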
Another problem: when a user submits multiple jobs at the same time, SGE
dispatches them to only one or a few compute nodes, even though many other
compute nodes are free or lightly loaded. What could be causing this? We
would expect SGE to dispatch newly submitted jobs to the free compute
nodes first.
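
One place we could look is the scheduler configuration printed by
qconf -ssconf, in particular settings such as queue_sort_method and
load_formula, which as far as we understand influence the order in which
queues are considered. Below is a minimal sketch for pulling those values
out, assuming the output is one "name value" pair per line. The helper
names are made up for illustration, and the sample text is illustrative
only, not output from our cluster.

```python
import subprocess


def parse_sched_conf(text):
    """Parse 'name value' lines into a dict, skipping blank or malformed lines."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def read_sched_conf():
    """Run `qconf -ssconf` (needs an SGE installation) and parse its output."""
    result = subprocess.run(
        ["qconf", "-ssconf"], capture_output=True, text=True, check=True
    )
    return parse_sched_conf(result.stdout)


# Illustrative sample only, not taken from a real cluster.
SAMPLE = """\
queue_sort_method                 load
load_formula                      np_load_avg
"""

if __name__ == "__main__":
    conf = parse_sched_conf(SAMPLE)
    for key in ("queue_sort_method", "load_formula"):
        print(key, "=", conf.get(key))
```

On the manager node, read_sched_conf() would return the real settings in
place of the sample.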
Could anyone offer some advice or references, please? Thanks!