[gridengine users] Strange CPU time usages being recorded for jobs

Lane, William William.Lane at cshs.org
Sat Feb 13 00:28:21 UTC 2016


We track our SGE cluster statistics through the accounting file, which we import into
a separate MySQL database.

We've had some really strange results show up for job CPU time usage recently:

mysql> SELECT owner, job_number AS job_num, CPU/3600 AS "CPU", ru_wallclock/3600 AS "RT",
              FROM_UNIXTIME(start_time) AS strt_time, FROM_UNIXTIME(end_time),
              FROM_UNIXTIME(submission_time) AS sbmt_time, (CPU/3600)
       FROM csclprd3
       WHERE (start_time >= UNIX_TIMESTAMP('2015-07-01'))
         AND (start_time < UNIX_TIMESTAMP('2015-08-01'))
         AND (start_time <> end_time)
         AND (start_time <> 0)
         AND (end_time <> 0)
         AND ((CPU/3600) > (6*(ru_wallclock/3600)))
         AND owner='pangjx';
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| owner  | job_num | CPU          | RT       | start_time          | FROM_UNIXTIME(end_time) | sbmt_time           | (CPU/3600)   |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| pangjx |  143320 | 2777777.7775 | 145.3114 | 2015-07-23 21:56:51 | 2015-07-29 23:15:32     | 2015-07-23 21:56:40 | 2777777.7775 |
| pangjx |  154178 |       7.8869 |   0.7439 | 2015-07-29 15:02:28 | 2015-07-29 15:47:06     | 2015-07-29 15:02:18 |       7.8869 |
| pangjx |  154265 |       7.4861 |   0.7106 | 2015-07-29 15:02:29 | 2015-07-29 15:45:07     | 2015-07-29 15:02:23 |       7.4861 |
| pangjx |  154244 |       5.0086 |   0.6397 | 2015-07-29 15:02:28 | 2015-07-29 15:40:51     | 2015-07-29 15:02:23 |       5.0086 |
| pangjx |  154196 |      10.0081 |   0.6386 | 2015-07-29 15:02:28 | 2015-07-29 15:40:47     | 2015-07-29 15:02:19 |      10.0081 |
| pangjx |  154136 |       5.1428 |   0.6375 | 2015-07-29 15:02:28 | 2015-07-29 15:40:43     | 2015-07-29 15:02:17 |       5.1428 |
| pangjx |  154217 |       5.2989 |   0.5658 | 2015-07-29 15:02:28 | 2015-07-29 15:36:25     | 2015-07-29 15:02:19 |       5.2989 |
| pangjx |  154233 |       4.3808 |   0.5581 | 2015-07-29 15:02:28 | 2015-07-29 15:35:57     | 2015-07-29 15:02:22 |       4.3808 |
| pangjx |  154157 |       5.4767 |   0.5517 | 2015-07-29 15:02:28 | 2015-07-29 15:35:34     | 2015-07-29 15:02:18 |       5.4767 |
| pangjx |  154152 |       3.1375 |   0.4356 | 2015-07-29 15:02:28 | 2015-07-29 15:28:36     | 2015-07-29 15:02:18 |       3.1375 |
| pangjx |  143359 | 2777777.7775 | 127.8125 | 2015-07-23 21:56:52 | 2015-07-29 05:45:37     | 2015-07-23 21:56:42 | 2777777.7775 |
| pangjx |  143334 | 2777777.7775 | 123.9389 | 2015-07-23 21:56:51 | 2015-07-29 01:53:11     | 2015-07-23 21:56:41 | 2777777.7775 |
| pangjx |  143329 |     945.1042 | 115.6944 | 2015-07-23 21:56:51 | 2015-07-28 17:38:31     | 2015-07-23 21:56:41 |     945.1042 |
| pangjx |  143355 |     766.4269 | 100.3900 | 2015-07-23 21:56:52 | 2015-07-28 02:20:16     | 2015-07-23 21:56:42 |     766.4269 |
| pangjx |  143377 | 2777777.7775 |  99.1744 | 2015-07-23 21:56:52 | 2015-07-28 01:07:20     | 2015-07-23 21:56:43 | 2777777.7775 |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+

For a job to record CPU time that is tens of thousands of times greater than its wallclock runtime (roughly 19,000 to 28,000 times for the 2777777.7775-hour rows above) is impossible, isn't it?

Our cluster has nowhere near 27,000 cores (even with hyperthreading).
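
The arithmetic behind that can be checked with a short Python sketch. The rows are copied from the query output above; the function names `implied_cores` and `cpu_seconds` are illustrative only and are not part of SGE or our import scripts. It divides CPU hours by wallclock hours to get the average number of cores a job would have needed, and converts the CPU hours back to the seconds stored in the table.

```python
# Rows copied from the query output above:
# (job_number, CPU hours, wallclock hours) for the long-running jobs.
rows = [
    (143320, 2777777.7775, 145.3114),
    (143359, 2777777.7775, 127.8125),
    (143334, 2777777.7775, 123.9389),
    (143329, 945.1042, 115.6944),
    (143355, 766.4269, 100.3900),
    (143377, 2777777.7775, 99.1744),
]


def implied_cores(cpu_hours, wallclock_hours):
    """Average number of cores needed to accumulate this CPU time."""
    return cpu_hours / wallclock_hours


def cpu_seconds(cpu_hours):
    """Convert CPU hours back to whole seconds, as stored in the CPU column."""
    return round(cpu_hours * 3600)


for job, cpu_h, rt_h in rows:
    print(job, round(implied_cores(cpu_h, rt_h)), cpu_seconds(cpu_h))
```

The four suspicious jobs all convert back to the same value, 9,999,999,999 seconds, while the other two long jobs imply roughly 8 cores each. An identical ten-digit figure across different jobs may point to a capped or placeholder value somewhere between the accounting file and the database rather than a real measurement, but that is only an observation from the numbers, not a confirmed cause.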

-Bill L.