[gridengine users] Huge amount of files generated in local disk

Reuti reuti at staff.uni-marburg.de
Mon Jan 26 17:04:56 UTC 2015


Hi,

> On 26.01.2015 at 17:15, Feng Zhang <prod.feng at gmail.com> wrote:
> 
> I just found a strange behavior of SGE 2011.
> 
> One user's job generates 1+ million small files on the local
> disk ($TEMPDIR).

That is, in the local scratch directory provided by SGE?


> It looks like this makes the execd very busy, and from
> the qmaster's side the node is lost and unavailable, although I can
> still log in via ssh. On the node, execd generates heavy I/O (a few
> hundred KB/s to a few MB/s). Some nodes survive and return to normal;
> others fail in the end (these jobs also use a lot of memory, so
> it looks like those nodes failed when the RAM was used up).

Can you spot any oom-killer in the messages file of the node?
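
A minimal sketch of such a check, in Python. The log path /var/log/messages and the two search strings are assumptions (some distributions log to /var/log/syslog or only to the journal, and the exact kernel wording varies), and the function names are made up for illustration:

```python
import re
from typing import Iterable, List

# Strings the Linux kernel typically logs when the OOM killer fires.
# Assumption: exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the OOM killer."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan one log file; the default path is an assumption, adjust per distro."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Calling scan_log() on the affected node (reading the file usually needs root) and printing the result should show whether the kernel killed anything around the time the node dropped out.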


> I am
> wondering whether the execd handles the files that a job
> generates.

Not that I'm aware of. It will just remove the generated directory after the job.

Is it intended by the user to generate this high number of files? It could be limited by setting a disk quota.
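
To see how many files a job has actually left behind before deciding on a limit, something like the following rough Python sketch could be run on the node. It assumes the scratch directory is the one named by the job's TMPDIR environment variable; count_files is a hypothetical helper, not part of SGE:

```python
import os


def count_files(root: str) -> int:
    """Count non-directory entries below root, without following symlinked dirs."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        total += len(filenames)
    return total


# Example use inside a job script (assumption: TMPDIR points at the scratch dir):
#   print(count_files(os.environ.get("TMPDIR", "/tmp")))
```

With a million entries the walk itself takes a while and adds I/O, so it is better run once after the job than in a loop while the job is active.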

-- Reuti


> Or does execd do something else to communicate with qmaster
> while there are a lot of job-generated files?
> 
> 
> -- 
> Best,
> 
> Feng




