[gridengine users] write my own accounting log parser..
stuartb at 4gh.net
Fri May 6 17:17:22 UTC 2011
On Thu, 5 May 2011 at 16:46 -0000, William Deegan wrote:
> I'm pondering writing a python based dbwriter replacement which
> would just parse the accounting file and stuff it in a db, and then
> have some python web app framework for reporting.
> Has anyone already done this?
> Any suggestions on how to "eat" the accounting log file? (consume it
> so it never gets big? Do I rotate it out and parse and discard?)
Yes, just roll the log file (see responses to your question on that).
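The question proposes parsing the accounting file and stuffing the records into a database. A minimal Python sketch of that idea follows. It makes several assumptions:

- It follows the colon-separated field order described in accounting(5) for SGE 6.x.
- It uses only the first 14 fields, so later fields such as category are ignored. Check the order against your version before relying on it.
- The table name `job` and the column name `grp` (chosen to avoid the SQL keyword GROUP) are illustrative. They are not part of any SGE tool.

```python
import sqlite3

# Assumed leading field order from accounting(5) (SGE 6.x); verify for your
# version. "group" is stored as "grp" because GROUP is an SQL keyword.
FIELDS = (
    "qname", "hostname", "grp", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status", "ru_wallclock",
)
INT_FIELDS = ("job_number", "priority", "submission_time", "start_time",
              "end_time", "failed", "exit_status")


def parse_line(line):
    """Return a dict for one accounting record, or None for comments/blanks."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    # Assumes no colon appears inside the first 14 fields.
    parts = line.split(":")
    if len(parts) < len(FIELDS):
        raise ValueError("short accounting record: %r" % line)
    rec = dict(zip(FIELDS, parts))  # zip stops after the first 14 fields
    for name in INT_FIELDS:
        rec[name] = int(rec[name])
    rec["ru_wallclock"] = float(rec["ru_wallclock"])
    return rec


def load(conn, lines):
    """Insert all records from `lines` into table `job`; return rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        "qname TEXT, hostname TEXT, grp TEXT, owner TEXT, job_name TEXT, "
        "job_number INTEGER, account TEXT, priority INTEGER, "
        "submission_time INTEGER, start_time INTEGER, end_time INTEGER, "
        "failed INTEGER, exit_status INTEGER, ru_wallclock REAL)"
    )
    rows = (rec for rec in map(parse_line, lines) if rec is not None)
    sql = "INSERT INTO job VALUES (%s)" % ", ".join(":" + f for f in FIELDS)
    with conn:  # commits on success, rolls back on error
        cur = conn.executemany(sql, rows)
    return cur.rowcount
```

Running `load()` on each file right after it is rolled keeps the live accounting file small. The database path you would pass to `sqlite3.connect()` is up to you.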
I have done various simple reports with the qacct command. With just
a little extra work the summary information can be put into .csv files
that someone else can use to make the necessary pie charts, graphs and
other pretty things management likes to see.
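If the summaries come from parsed records rather than from qacct output, the CSV step takes only a few lines of Python. This sketch assumes each record is a dict with a grouping key (such as `owner`) and `ru_wallclock` in seconds. `usage_summary` and `summary_csv` are hypothetical helper names. Wall clock is not weighted by slots here.

```python
import csv
import io
from collections import defaultdict


def usage_summary(records, key):
    """Count jobs and sum wall clock seconds per value of `key`.

    `records` are dicts with at least `key` and "ru_wallclock" (seconds).
    Wall clock is not multiplied by slots here.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for rec in records:
        entry = totals[rec[key]]
        entry[0] += 1
        entry[1] += rec["ru_wallclock"]
    return {k: (jobs, secs) for k, (jobs, secs) in totals.items()}


def summary_csv(summary, key):
    """Render the summary as CSV text, largest wall clock total first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([key, "jobs", "wallclock_hours"])
    for k, (jobs, secs) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        writer.writerow([k, jobs, "%.2f" % (secs / 3600.0)])
    return buf.getvalue()
```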
For simple usage reporting I use something like:
zcat /opt/sge_root/betsy/common/accounting-201* > sge-acct.tmp
qacct -o -b 201101010000 -e 201104010000 -f sge-acct.tmp > sge-2011Q1.rpt
qacct -D -g -o -pe -P -b 201101010000 -e 201104010000 -f sge-acct.tmp > sge-2011Q1-full.rpt
qacct -D -o -P -e 201104010000 -f sge-acct.tmp | grep -v '============' | perl -ne 'chomp;s/\s+/\t/g;print $_."\n";' > sge-all-time.csv
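The grep/perl stage of the last command writes tab-separated text, even though the output file is named .csv. A small Python filter like the one sketched here could replace those two stages and emit comma-separated output instead. Like the perl one-liner, it assumes no column value contains embedded whitespace. `qacct_to_csv` is a hypothetical name.

```python
import csv
import io


def qacct_to_csv(lines):
    """Convert qacct's whitespace-aligned report table to CSV text.

    Like the grep/perl pipeline: '=====' rule lines (and blank lines) are
    dropped and runs of whitespace separate columns. Assumes no value
    contains embedded whitespace.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in lines:
        if set(line.strip()) <= {"="}:
            continue
        writer.writerow(line.split())
    return buf.getvalue()
```

Used as a filter, a script would just call `sys.stdout.write(qacct_to_csv(sys.stdin))` and sit at the end of the qacct pipe.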
With a little over 2M total accounting records, these commands take
very little time to execute. We have about 30M records in the reporting
files. As our usage ramps up I do expect the number of records per day
to grow, but at our current usage level sequential processing of the
raw records looks perfectly usable.
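Sequential processing can also skip the temporary file that the zcat step creates. This sketch streams every file matching a glob pattern and decompresses the ones ending in .gz. It assumes that sorting by file name gives chronological order, which holds only if rotated files carry a date-style suffix like the accounting-201* names used above. `open_any` and `iter_accounting` are hypothetical names.

```python
import glob
import gzip


def open_any(path):
    """Open a plain or gzip-compressed accounting file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def iter_accounting(pattern):
    """Yield raw record lines from every file matching `pattern`, in name order.

    Comment and blank lines are skipped. Name order is assumed to be
    chronological (date-stamped rotated files).
    """
    for path in sorted(glob.glob(pattern)):
        with open_any(path) as fh:
            for line in fh:
                if line.startswith("#") or not line.strip():
                    continue
                yield line.rstrip("\n")
```

For example, `sum(1 for _ in iter_accounting("/opt/sge_root/betsy/common/accounting-201*"))` would count the records without writing sge-acct.tmp.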
Is there specific documentation about the -b and -e options on qacct
(man qacct doesn't specify)? Are these times inclusive or exclusive?
What does qacct do about jobs which span the start or end time?
When you get to proper accounting, these details are very important.
Making sure these edges neither duplicate nor miss jobs, and doing the
desired processing for jobs that split across reporting boundaries, is
another issue.
I may eventually do my own processing of the accounting records.
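When doing that processing yourself, the boundary question can be settled by a convention. One common choice (not necessarily what qacct does, since that behavior is exactly what is undocumented) works like this:

- Treat each reporting period as a half-open interval [begin, end).
- Charge each job only for the part of its run that falls inside the period.

A job that straddles a boundary is then split between the two periods, and the pieces add up to its full run time, so nothing is duplicated or lost. The `(job_id, start, end, slots)` tuple layout and the helper names are hypothetical. The `epoch` helper uses UTC for simplicity, which may differ from how qacct interprets its date arguments.

```python
from datetime import datetime, timezone


def epoch(year, month, day):
    """UTC epoch seconds for midnight of a date (hypothetical helper)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def overlap_seconds(start, end, period_begin, period_end):
    """Seconds of [start, end) that fall inside [period_begin, period_end)."""
    return max(0, min(end, period_end) - max(start, period_begin))


def prorate(jobs, period_begin, period_end):
    """Charge each job only for the part of its run inside the period.

    `jobs` yields (job_id, start_time, end_time, slots) in epoch seconds.
    With half-open periods, a job that straddles a boundary is split and
    its pieces in adjacent periods sum to its full run time.
    Returns {job_id: slot_seconds} for jobs with nonzero overlap.
    """
    usage = {}
    for job_id, start, end, slots in jobs:
        secs = overlap_seconds(start, end, period_begin, period_end)
        if secs:
            usage[job_id] = secs * slots
    return usage
```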
At some point I may look again at the ARCo stuff in SGE. The ARCo
database may be useful to summarize data (including the extra data in
the reporting file). To me the Java web interface looks very heavy and
would require a lot of work to get through a security review (multiple
web ports opened, new login methods, etc.).
Another issue is that we also run TORQUE/Moab on another cluster, and
I really want any formal reporting system to normalize and include
information from both clusters.
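Normalizing the two clusters mostly means mapping each scheduler's accounting record onto one shared record type. This sketch uses a hypothetical `Job` tuple and makes two layout assumptions:

- **SGE:** the accounting(5) field positions for job_number, owner, group, start_time, end_time and ru_wallclock.
- **TORQUE:** the usual `MM/DD/YYYY HH:MM:SS;E;jobid;key=value ...` layout of job-end (E) records, with `resources_used.walltime` given as HH:MM:SS.

Check both layouts against the versions actually installed.

```python
from collections import namedtuple

# Hypothetical cluster-neutral record; times are epoch seconds.
Job = namedtuple("Job", "cluster job_id user group start end wallclock")


def from_sge(line, cluster="sge"):
    """Map one SGE accounting line (assumed accounting(5) positions)."""
    f = line.rstrip("\n").split(":")
    return Job(cluster, f[5], f[3], f[2], int(f[9]), int(f[10]), float(f[13]))


def hms_to_seconds(text):
    """Convert TORQUE's HH:MM:SS walltime (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def from_torque(line, cluster="torque"):
    """Map a TORQUE job-end ('E') accounting record; other types give None."""
    _stamp, rtype, job_id, attrs = line.rstrip("\n").split(";", 3)
    if rtype != "E":
        return None
    kv = dict(item.split("=", 1) for item in attrs.split() if "=" in item)
    return Job(cluster, job_id, kv["user"], kv["group"], int(kv["start"]),
               int(kv["end"]), float(hms_to_seconds(kv["resources_used.walltime"])))
```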
I may get a summer statistics student to look at the accounting and
reporting logs to see what sort of useful information might be gained.
The accounting and reporting logs look like they can be easily
converted into a data set suitable for analysis with R. Usage
summaries by various parameters are probably pretty easy. Focusing on
a few specific users and jobs might be interesting. Has anyone
performed any larger statistical analysis of this information?
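Turning the records into a data set R can load is mostly a matter of writing one row per job with clear column names. This sketch adds a derived queue-wait column (start_time minus submission_time). It assumes the following:

- Records are dicts holding the raw accounting values.
- Jobs that never started have start_time 0. Their wait is left empty, which R's read.csv() treats as NA in a numeric column.

The column names are illustrative.

```python
import csv
import io

# Illustrative column names for a one-row-per-job data set.
COLUMNS = ["job_number", "owner", "project", "submission_time",
           "start_time", "end_time", "wait_s", "wallclock_s"]


def to_tidy_csv(records):
    """Return CSV text with one row per job plus a derived queue-wait column.

    `records` are dicts with raw accounting values (hypothetical keys).
    Jobs assumed never to have started (start_time == 0) get an empty wait.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        started = rec["start_time"] > 0
        row["wait_s"] = rec["start_time"] - rec["submission_time"] if started else ""
        row["wallclock_s"] = rec["ru_wallclock"]
        writer.writerow(row)
    return buf.getvalue()
```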
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone