[gridengine users] Grid Engine Sluggish

Daniel Povey dpovey at gmail.com
Sat Jan 26 18:16:58 UTC 2019


Check whether there are any huge jobs in the queue. Very large task
ranges, or large numbers of jobs, can sometimes make it slow.
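A minimal Python sketch for spotting such jobs. It assumes the plain `qstat -u '*'` listing, where the job ID is the first column and the ja-task-ID range of an array job (e.g. `1-100000:1`, i.e. start-end:step) is the last column. `count_tasks` and `largest_arrays` are hypothetical helper names for illustration, not Grid Engine tools.

```python
import re

# Assumed task-range syntax: "start-end:step", optionally comma-separated
# with single task IDs, e.g. "1-3:1,7".
RANGE_RE = re.compile(r"^(\d+)-(\d+):(\d+)$")


def count_tasks(spec):
    """Count the task IDs described by an array-task range string."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        m = RANGE_RE.match(part)
        if m:
            start, end, step = (int(g) for g in m.groups())
            total += (end - start) // step + 1
        elif part.isdigit():
            total += 1  # a single task ID
        else:
            raise ValueError(f"unrecognised task range: {part!r}")
    return total


def largest_arrays(qstat_text, top=5):
    """Return [(job_id, n_tasks), ...] for the array jobs with the most tasks."""
    sizes = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and blank lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        last = fields[-1]
        # Only a range contains both "-" and ":"; a plain number is usually
        # the slot count or a single task ID, which is ignored here.
        if "-" in last and ":" in last:
            sizes[fields[0]] = sizes.get(fields[0], 0) + count_tasks(last)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

To use it, pass in the text printed by `qstat -u '*'`, for example `subprocess.run(["qstat", "-u", "*"], capture_output=True, text=True).stdout`. Task lists without a range (such as `5,7`) are not counted, so treat the result as a rough lower bound.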

On Sat, Jan 26, 2019 at 7:05 AM Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> > On 26.01.2019 at 10:20, Joseph Farran <jfarran at uci.edu> wrote:
> >
> > Hi.
> > Our Grid Engine is suddenly running very sluggishly. sge_qmaster
> > stays at 100% CPU all the time, whereas it used to reach 100% for only a
> > few seconds every 30 seconds or so.
> > I ran the qping command but am not sure how to read it. Any helpful
> > insight is much appreciated.
>
> Did you try to stop and start the qmaster?
>
> -- Reuti
>
>
> > qping -i 5 -info hpc-s 6444 qmaster 1
> > 01/26/2019 01:12:18:
> > SIRM version:             0.1
> > SIRM message id:          1
> > start time:               01/26/2019 01:10:13 (1548493813)
> > run time [s]:             125
> > messages in read buffer:  0
> > messages in write buffer: 0
> > no. of connected clients: 296
> > status:                   0
> > info:                     MAIN: R (125.20) | signaler000: R (123.69) | event_master000: R (0.14) | timer000: R (4.52) | worker000: R (0.14) | worker001: R (3.44) | worker002: R (7.33) | worker003: R (3.43) | worker004: R (3.08) | worker005: R (1.42) | OK
> > malloc:                   arena(34410496) |ordblks(9370) | smblks(164269) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(7726000) | uordblks(24248176) | fordblks(10162320) | keepcost(119856)
> > Monitor:
> > 01/26/2019 01:10:13 | MAIN: no monitoring data available
> > 01/26/2019 01:10:14 | signaler000: no monitoring data available
> > 01/26/2019 01:12:14 | event_master000: runs: 4.82r/s (clients: 1.00 mod: 0.02/s ack: 0.02/s blocked: 0.00 busy: 0.81 | events: 5.52/s added: 5.47/s skipt: 0.05/s) out: 0.00m/s APT: 0.0002s/m idle: 99.89% wait: 0.00% time: 60.00s
> > 01/26/2019 01:12:14 | timer000: runs: 0.47r/s (pending: 12.00 executed: 0.45/s) out: 0.00m/s APT: 0.0002s/m idle: 99.99% wait: 0.00% time: 60.00s
> > 01/26/2019 01:11:19 | worker000: runs: 0.68r/s (EXECD (l:0.32,j:0.28,c:0.32,p:0.00,a:0.00)/s GDI (a:0.25,g:1.08,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.82m/s APT: 0.0036s/m idle: 99.75% wait: 0.00% time: 64.96s
> > 01/26/2019 01:12:15 | worker001: runs: 0.81r/s (EXECD (l:0.02,j:0.02,c:0.02,p:0.00,a:0.00)/s GDI (a:0.00,g:1.92,m:0.08,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.81m/s APT: 0.0008s/m idle: 99.93% wait: 0.00% time: 59.27s
> > 01/26/2019 01:11:16 | worker002: runs: 0.73r/s (EXECD (l:0.28,j:0.23,c:0.26,p:0.00,a:0.00)/s GDI (a:0.34,g:1.13,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.71m/s APT: 0.0030s/m idle: 99.78% wait: 0.17% time: 61.75s
> > 01/26/2019 01:12:15 | worker003: runs: 0.75r/s (EXECD (l:0.03,j:0.02,c:0.03,p:0.00,a:0.00)/s GDI (a:0.02,g:1.23,m:0.07,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.73m/s APT: 0.0008s/m idle: 99.94% wait: 0.02% time: 60.40s
> > 01/26/2019 01:11:26 | worker004: runs: 0.68r/s (EXECD (l:0.23,j:0.21,c:0.23,p:0.00,a:0.00)/s GDI (a:0.27,g:1.69,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.65m/s APT: 0.0012s/m idle: 99.92% wait: 0.00% time: 71.11s
> > 01/26/2019 01:11:31 | worker005: runs: 0.56r/s (EXECD (l:0.25,j:0.24,c:0.25,p:0.00,a:0.00)/s GDI (a:0.20,g:1.05,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) out: 0.55m/s APT: 0.0011s/m idle: 99.94% wait: 0.00% time: 76.48s
> >
> > Joseph
> >
> >
> > _______________________________________________
> > users mailing list
> > users at gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> >
>
>
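The Monitor section of the qping output reports an idle and a wait percentage for each thread that has monitoring data. A minimal Python sketch that extracts those figures, assuming each Monitor entry sits on a single unwrapped line (mail clients tend to wrap them). `thread_idle` is a hypothetical helper, not a Grid Engine tool.

```python
import re

# One Monitor entry, e.g.
# 01/26/2019 01:12:14 | timer000: runs: 0.47r/s (...) ... idle: 99.99% wait: 0.00% time: 60.00s
# Entries reading "no monitoring data available" contain no "runs:" and are skipped.
MONITOR_RE = re.compile(
    r"\|\s*(?P<thread>\w+):\s*runs:.*?"
    r"idle:\s*(?P<idle>[\d.]+)%\s*wait:\s*(?P<wait>[\d.]+)%"
)


def thread_idle(qping_text):
    """Map thread name -> (idle %, wait %) for each Monitor entry with data."""
    stats = {}
    for line in qping_text.splitlines():
        m = MONITOR_RE.search(line)  # search() tolerates leading "> " quote marks
        if m:
            stats[m.group("thread")] = (float(m.group("idle")), float(m.group("wait")))
    return stats


sample = (
    "01/26/2019 01:12:14 | timer000: runs: 0.47r/s (pending: 12.00 executed: 0.45/s) "
    "out: 0.00m/s APT: 0.0002s/m idle: 99.99% wait: 0.00% time: 60.00s\n"
    "01/26/2019 01:11:16 | worker002: runs: 0.73r/s (EXECD (l:0.28,j:0.23,c:0.26,p:0.00,a:0.00)/s "
    "GDI (a:0.34,g:1.13,m:0.00,d:0.00,c:0.00,t:0.00,p:0.00)/s OTHER (ql:0)) "
    "out: 0.71m/s APT: 0.0030s/m idle: 99.78% wait: 0.17% time: 61.75s\n"
    "01/26/2019 01:10:13 | MAIN: no monitoring data available\n"
)

stats = thread_idle(sample)
# Print threads from least to most idle.
for name, (idle, wait) in sorted(stats.items(), key=lambda kv: kv[1][0]):
    print(f"{name:16s} idle {idle:6.2f}%  wait {wait:5.2f}%")
```

In the posted output, every thread that reports monitoring data is above 99.7% idle. MAIN and signaler000 report no monitoring data, so this listing says nothing about them.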