[gridengine users] maintaining spooldb/sge_job
rayson at scalablelogic.com
Sat May 19 22:58:33 UTC 2012
I read the BerkeleyDB documentation at Oracle last year; IIRC, Oracle
does not require shutting down the process that uses BerkeleyDB (in our
case, that's qmaster) before db_archive can be run:
Since you are not using the BDB RPC server, you don't need to run
db_checkpoint (bdb_checkpoint.sh calls db_checkpoint), and can simply
run "db_archive -d" from a weekly crontab.
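A minimal sketch of such a crontab entry follows; the spool path is an
assumption and must be adjusted to your site's actual BDB spool directory:

```shell
# Weekly cleanup of BerkeleyDB transaction logs (paths are assumptions).
# "-d" removes log files that are no longer needed for recovery;
# "-h" points db_archive at the database home directory.
# Runs every Sunday at 03:00.
0 3 * * 0  /usr/bin/db_archive -d -h /var/spool/sge/spooldb
```

This only prunes old transaction log files; it does not compact the
database file itself.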
On Fri, May 18, 2012 at 2:39 PM, Simon Matthews
<simon.d.matthews at gmail.com> wrote:
> Thanks for pointing this out to me.
> The documentation says that it should be run every minute if the
> configuration uses a BDB server. I don't use a BDB server, but the storage
> method I use is BDB (not flat files). If I should use this checkpoint
> script, how often should I run it, and should I shut down the qmaster to
> run it?
>> On Fri, May 18, 2012 at 1:17 PM, Simon Matthews
>> <simon.d.matthews at gmail.com> wrote:
>> > After SGE was killed by the OOM killer, the file (a Berkeley DB file)
>> > in my cluster was 1.4GB. I did a db_dump and db_load on this file,
>> > resulting in a much smaller file.
>> > However, this then raised the question -- how is this file maintained?
>> > Presumably, it holds the information on jobs in all states (queued,
>> > running
>> > and finished). How do the finished jobs get removed from this file?
>> > Obviously, I don't want the file to grow without limit.
>> > We are now putting about 50k jobs into our small cluster every day (many
>> > finish running in a fraction of a second).
>> > Simon
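The db_dump/db_load compaction Simon describes can be sketched roughly as
below. The spool path and database filename are assumptions; run this only
while qmaster is stopped, and keep the backup until the cluster is verified
healthy:

```shell
# Offline compaction of a Berkeley DB spool file (names are assumptions).
SPOOL=/var/spool/sge/spooldb         # assumed BDB spool directory
cd "$SPOOL"
cp sge_job sge_job.bak               # back up the original database
db_dump sge_job > sge_job.dump       # dump all records to flat text
db_load -f sge_job.dump sge_job.new  # rebuild a compact database
mv sge_job.new sge_job               # swap in the compacted file
```

Dump-and-reload rewrites the file densely, reclaiming the space left by
the many short-lived finished jobs.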
>> > _______________________________________________
>> > users mailing list
>> > users at gridengine.org
>> > https://gridengine.org/mailman/listinfo/users