[gridengine users] Huge amount of files generated in local disk

Feng Zhang prod.feng at gmail.com
Wed Jan 28 15:40:12 UTC 2015


Thanks, Reuti!

Yes, the user intended to generate those millions of files in the local
scratch directory provided by SGE.

> Can you spot any oom-killer in the messages file of the node?

No, I did not find any useful information in the log files.
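
For anyone repeating this check, a minimal sketch of such a scan follows.
It assumes the kernel log ends up in a plain-text file such as
/var/log/messages (this depends on the distribution) and that the kernel
uses its usual "invoked oom-killer" / "Out of memory: Kill process"
wording, which varies between kernel versions. The function name
find_oom_lines is hypothetical.

```python
import re

# Typical kernel wording for OOM events. The exact text differs between
# kernel versions, so treat this pattern as an assumption to adjust.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill(?:ed)? process")


def find_oom_lines(lines):
    """Return the log lines that look like oom-killer activity.

    `lines` can be any iterable of strings, for example an open log file.
    """
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

On a node it could be called as
find_oom_lines(open("/var/log/messages", errors="replace")); an empty
result only means that no matching line is present in that particular file.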

The user happened to run 10 jobs on a node, and these jobs generated
more than 10 million files. It took a very long time even to "ls" these
files. It looks like execd also takes a very long time to remove them,
since I noticed heavy I/O by execd after I had deleted these jobs.
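
To gauge how large such a scratch directory has become without waiting
for "ls" (which reads and sorts every entry before printing anything), a
minimal sketch follows. It streams entries with os.scandir; the names
count_entries and limit are hypothetical, and how the scratch path is
obtained inside a job is left to the caller.

```python
import os


def count_entries(path, limit=None):
    """Count files and directories below `path`, recursing into subdirectories.

    Entries are streamed with os.scandir, so nothing is sorted or collected
    in memory. If `limit` is given, counting stops as soon as that many
    entries have been seen.
    """
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                total += 1
                if limit is not None and total >= limit:
                    return total
                # Do not follow symlinks, to avoid loops and leaving the tree.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return total
```

With a limit such as count_entries(scratch_dir, limit=1000000), the call
returns as soon as the threshold is reached instead of walking all the
files.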


On Mon, Jan 26, 2015 at 12:04 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
>> On 26.01.2015 at 17:15, Feng Zhang <prod.feng at gmail.com> wrote:
>>
>> I just found a strange behavior of SGE 2011.
>>
>> One user's job generates more than 1 million small files on the local
>> disk ($TEMPDIR).
>
> Hence in the local scratch directory provided by SGE?
>
>
>> It looks like this makes execd very busy, and from the side of
>> qmaster the node is lost and unavailable, although I can still log in
>> via ssh. On the node, execd generates heavy I/O (a few hundred KB/s to
>> a few MB/s). Some nodes survive and get back to normal; some nodes
>> failed in the end (these jobs also use a lot of memory, so it looks
>> like those nodes failed when the RAM was used up).
>
> Can you spot any oom-killer in the messages file of the node?
>
>
>> I am
>> wondering whether execd handles the files that a job
>> generates.
>
> Not that I'm aware of. It will just remove the generated directory after the job.
>
> Is it intended by the user to generate this high number of files? It could be limited by setting a disk quota.
>
> -- Reuti
>
>
>> Or does execd do something else to communicate with qmaster
>> while there are a lot of job-generated files?
>>
>>
>> --
>> Best,
>>
>> Feng
>



-- 
Best,

Feng


