[gridengine users] Clean up old jobs/spooldb?

Simon Matthews simon.d.matthews at gmail.com
Wed May 2 03:22:17 UTC 2018


After deleting all the jobs, it still won't schedule any new jobs. The
qmaster "messages" file has this in it:

05/01/2018 20:19:51|worker|sgemasterU5|E|scheduler tries to schedule job 4409436.1 twice
05/01/2018 20:19:51|worker|sgemasterU5|W|Skipping remaining 156 orders

But I can't delete 4409436:
qdel -u build -j 4409436
The job -j of user(s) build does not exist
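
The error text suggests qdel read "-j" as a job name: qdel takes job IDs as positional arguments and has no -j option. A minimal sketch of the positional form follows. The helper name build_qdel_cmd is hypothetical and only prints the command for review. Adding -f is an assumption about what might help here; it can require manager or operator rights, and it may not clear the "scheduled twice" state.

```shell
#!/bin/sh
# Sketch only: print the qdel invocation instead of running it.
job_id=4409436   # job ID taken from the qmaster messages file above

# build_qdel_cmd is a hypothetical helper, not part of Grid Engine.
build_qdel_cmd() {
    # -f asks the qmaster to drop the job from its job list even if the
    # execution daemon does not respond to the delete request.
    printf 'qdel -f %s\n' "$1"
}

build_qdel_cmd "$job_id"
```

Once the printed command looks right, it can be run directly on the qmaster host.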

Simon

On Tue, May 1, 2018 at 2:53 PM, Simon Matthews
<simon.d.matthews at gmail.com> wrote:
> A little more info.
>
> After moving the spool directory and re-starting the qmaster, I
> deleted all the jobs with qdel and the qmaster showed no jobs.
> However, after a further re-start it now shows over 2396 active jobs,
> with "-2251" available slots. I assume that somehow the history of
> jobs finishing was lost so the qmaster thinks the jobs are still
> active. I am trying another delete!
>
> Simon
>
> On Tue, May 1, 2018 at 2:36 PM, Simon Matthews
> <simon.d.matthews at gmail.com> wrote:
>> I am running SoGE 8.1.8, using BDB spooling.
>>
>> Last night the spool directory ran out of disk space (I think),
>> causing a freeze of all jobs.  I moved the spool directory (~4GB at
>> that time) to another partition, with more space.
>>
>> However, jobs are still not running. The qmaster appears to be running
>> and, I think, reading from the spool directory.
>>
>> I would like to clean out all the jobs (old and current) and start again.
>>
>> Is there a safe way to clean out the spool directory when using BDB
>> spooling? I was not able to back up the configuration because SoGE
>> doesn't provide a copy of db_dump and versions of this program from
>> other distributions fail with an error relating to the version of the
>> database.
>>
>> Any suggestions?
>>
>> Simon


