[gridengine users] sge_shepherd using 100% CPU

John_Tai John_Tai at smics.com
Tue May 16 01:44:09 UTC 2017


>> And the opened shell is idling?

No, it's working normally.

>> How do you log in by this method – the default "builtin" method or something self-defined?

The default. I didn't define anything, and I don't even know how to.

>> Any global or queue prolog in place, which is supposed to run under the sge account?

There is no prolog or epilog defined.

>> From the root account you can `strace -p 22443` and check what is going on therein.

It keeps looping through these system calls. Is that normal?

alarm(0)                                = 0
poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
alarm(0)                                = 0
alarm(0)                                = 0
poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
alarm(0)                                = 0
alarm(0)                                = 0
poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
alarm(0)                                = 0
alarm(0)                                = 0
poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
alarm(0)                                = 0
alarm(0)                                = 0
poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
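
For reference, the trace reads as if poll() comes back at once with POLLHUP on fd 6 instead of waiting out its 1000 ms timeout, which would explain the spinning. Below is a minimal sketch of that effect outside of SGE, assuming Linux. It uses a plain pipe whose write end is closed as a stand-in for fd 6 (what fd 6 really is inside sge_shepherd is not known here), and the function name poll_hangup_demo is made up for illustration.

```python
import os
import select
import time


def poll_hangup_demo(iterations=5, timeout_ms=1000):
    """Poll a pipe whose writer is gone and record what poll() reports.

    Hypothetical stand-in for the loop in the trace: the pipe plays the
    role of fd 6, and timeout_ms matches the 1000 ms seen in strace.
    """
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # the peer goes away, so the read end is hung up

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)  # only POLLIN is requested

    results = []
    start = time.monotonic()
    for _ in range(iterations):
        # POLLHUP is reported even though it was not requested, so this
        # call returns immediately instead of sleeping for timeout_ms.
        results.append(poller.poll(timeout_ms))
    elapsed = time.monotonic() - start

    os.close(read_fd)
    return results, elapsed


if __name__ == "__main__":
    events, seconds = poll_hangup_demo()
    for ev in events:
        fd, flags = ev[0]
        print("POLLHUP set:", bool(flags & select.POLLHUP))
    print("elapsed seconds: %.3f" % seconds)
```

If a loop like this never closes or unregisters the hung-up descriptor, every pass returns immediately, which is what a process pinned at 100% CPU would look like from the outside.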



-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Monday, May 15, 2017 5:46
To: John_Tai
Cc: users at gridengine.org
Subject: Re: [gridengine users] sge_shepherd using 100% CPU

Hi,

> On 15.05.2017 at 05:28, John_Tai <John_Tai at smics.com> wrote:
>
> I recently found a weird problem with qrsh.
>
> If I just use it to log in to an exec host, the sge_shepherd uses 100% of CPU.
>
> # qrsh -q lc.q@ibm105
> # top
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 22443 sge       25   0 20396 1604 1272 R 99.5  0.0   0:08.80 sge_shepherd
> 19927 sge       16   0  114m 3096 1836 S  0.0  0.0   0:00.26 sge_execd

And the opened shell is idling?

How do you log in by this method – the default "builtin" method or something self-defined?

In my clusters I can't observe this behavior.

Even if something were running from one of the shell's profile files, it should show up for the opened shell, not for the sge_shepherd, which runs under the sge admin account.

Any global or queue prolog in place, which is supposed to run under the sge account?

==

From the root account you can `strace -p 22443` and check what is going on therein.

-- Reuti


>  But if I submit an actual command with qrsh this doesn’t happen.
>
> # qrsh -q lc.q@ibm105 xclock
> # top
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 19927 sge       16   0  114m 3100 1836 S  0.0  0.0   0:00.38 sge_execd
> 22671 sge       18   0 20392 1584 1256 S  0.0  0.0   0:00.00 sge_shepherd
>
> Not sure why that is. How do I troubleshoot this?
>
> Thanks
> Johnt
> This email (including its attachments, if any) may be confidential and proprietary information of SMIC, and intended only for the use of the named recipient(s) above. Any unauthorized use or disclosure of this email is strictly prohibited. If you are not the intended recipient(s), please notify the sender immediately and delete this email from your computer.
