[gridengine users] Anyone have scripts for detecting users who bypass grid engine?

Jesse Becker beckerje at mail.nih.gov
Thu Apr 9 20:30:20 UTC 2015

On Thu, Apr 09, 2015 at 09:46:56PM +0200, Reuti wrote:
>On 09.04.2015 at 21:23, Chris Dagdigian wrote:
>> I'm one of the people who has been arguing for years that technological methods for stopping abuse of GE systems never work in the long term, because motivated users always have more time and interest than overworked admins, so it's kind of embarrassing to ask this, but ...
>> Does anyone have a script that runs on a node and prints out all the userland processes that are not explicitly a child of a sge_shepherd daemon?
>Why allow `ssh` to a node at all? In my installations only the admins can do this. If users want to peek around on a node, I have an interactive queue with an h_cpu limit of 60 seconds for this. So even logging in to a node is controlled by SGE.

I agree with Reuti:  why even allow the potential for abuse--accidental
or otherwise?

That said, it's an interesting little problem.  Does this help?

    me at compute-3-23:~$ ./ppid_tree.pl 9309 55990 91608
      pid=9309 cmd=(num_crunch32) ppid=9308
      pid=9308 cmd=(9675988) ppid=9307
      pid=9307 cmd=(sge_shepherd) ppid=79373
    9307 9308 9309
      pid=55990 cmd=(miner) ppid=54911
      pid=54911 cmd=(9718461) ppid=54909
      pid=54909 cmd=(sge_shepherd) ppid=79373
    54909 54911 55990
      pid=91608 cmd=(vim) ppid=91534
      pid=91534 cmd=(bash) ppid=91533
      pid=91533 cmd=(sshd) ppid=91528
      pid=91528 cmd=(sshd) ppid=78863
      pid=78863 cmd=(sshd) ppid=1
    1 78863 91528 91533 91534 91608
    Process 91608 is not a child of a 'sge_shepherd'!

Processes 9309 and 55990 are legitimate SGE processes (one is even
multi-threaded).  The third process, 91608, is a vim process running to
edit the Perl script, and is certainly *not* part of SGE.

There's a simple data structure returned called "@tree" (a misnomer,
since it's a list...).  It is a list of processes, starting with init
or sge_shepherd and working down to the PID in question.  If the first
element is "1" (init), you know you've found a process outside of SGE.
If the first element is not "1", then it should be the PID of the
corresponding sge_shepherd.

This should work on any Linux system that has /proc mounted.  Other
systems won't work, although you should just need to munge get_ppid().
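
For systems without /proc, one way to munge get_ppid() might look like
the sketch below.  It asks ps(1) for the parent PID and command name
instead of reading /proc/PID/stat.  The name get_ppid_ps is made up for
this example, and it assumes a ps that accepts the POSIX "-o" and "-p"
options.  ps prints the command name without parentheses (and on some
systems with a path), which the unanchored regex in get_ps_tree() should
still match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_ppid(): query ps(1) instead of /proc.
# Returns (ppid, command name), or an empty list if the PID is gone.
sub get_ppid_ps {
    my ($pid) = @_;
    die "Not a PID: $pid\n" unless $pid =~ /^\d+$/;

    # "ppid=" and "comm=" select those columns with empty headers,
    # so ps prints a single data line and no header line.
    open(my $ps, '-|', 'ps', '-o', 'ppid=', '-o', 'comm=', '-p', $pid)
        or die "Failed to run ps: $!";
    my $line = <$ps>;
    close $ps;
    return unless defined $line;    # no such process

    my ($ppid, $exec) = $line =~ /^\s*(\d+)\s+(.*?)\s*$/ or return;
    print STDERR "  pid=$pid cmd=$exec ppid=$ppid\n";

    return ($ppid, $exec);
}
```

To try it, replace the call to get_ppid() inside get_ps_tree() with
get_ppid_ps().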

Warning!  Ugly Perl ahead!



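#!/usr/bin/perl

use strict;
use warnings;

# Name of the process that marks the top of a legitimate SGE job tree.
my $parent_process = 'sge_shepherd';

if (!@ARGV) {
    print STDERR "Please enter 1 or more PIDs to check\n";
    exit 1;
}

# Return (ppid, command name) for a PID, read from /proc/PID/stat.
sub get_ppid {
    my ($pid) = @_ ;
    my $stat_file = "/proc/$pid/stat";

    # Fields: pid, (exe_name), state, ppid, pgrp, session, tty_nr, ...
    open my $status, '<', $stat_file or die "Failed to open $stat_file: $!";
    my $line = <$status>;
    close $status;
    return if !defined $line;

    # exe_name is parenthesised and may contain spaces, so match it
    # greedily instead of splitting the line on whitespace.
    my ($exec, $ppid) = $line =~ /^\d+ (\(.*\)) \S (\d+)/;
    return if !defined $ppid;
    print STDERR "  pid=$pid cmd=$exec ppid=$ppid\n";

    return ($ppid, $exec);
}

# Return the list of PIDs from init or the owning sge_shepherd down to $pid.
sub get_ps_tree {
    my ($pid) = @_;
    my @tree = ($pid);

    my ($ppid, $exec) = get_ppid($pid);

    return @tree if !defined $ppid;

    if ($ppid == 1) {
        # Reached init without passing through a shepherd.
        unshift @tree, $ppid;
    } elsif ($exec !~ /\(?$parent_process\)?/) {
        # Not a shepherd yet: keep climbing.
        unshift @tree, get_ps_tree($ppid);
    }

    return @tree;
}

foreach my $pid (@ARGV) {
    my @tree = get_ps_tree($pid);
    print "@tree\n";
    if ($tree[0] == 1) {
        print "  Process $pid is not a child of a '$parent_process'!\n";
    }
}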
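
The script above checks only the PIDs you hand it.  To get closer to
the original question (list every userland process on a node that is
not under an sge_shepherd), the same ancestor test could be run over a
snapshot of /proc.  This is only a sketch: the sub names are invented
here, and $min_uid = 1000 is an assumption about where regular user
accounts start, used to skip system daemons and kernel threads.

```perl
use strict;
use warnings;

my $parent_process = 'sge_shepherd';
my $min_uid        = 1000;    # assumption: regular accounts start here

# Snapshot /proc into { pid => { ppid, comm, uid } }.
sub read_proc_table {
    my %procs;
    opendir(my $dh, '/proc') or return \%procs;
    for my $pid (grep { /^\d+$/ } readdir $dh) {
        open(my $fh, '<', "/proc/$pid/stat") or next;   # already exited
        my $line = <$fh>;
        close $fh;
        next unless defined $line;
        # The command name is parenthesised and may contain spaces.
        my ($comm, $ppid) = $line =~ /^\d+ \((.*)\) \S (\d+)/ or next;
        # Owner of /proc/PID, normally the process's effective UID.
        my $uid = (stat("/proc/$pid"))[4];
        next unless defined $uid;
        $procs{$pid} = { ppid => $ppid, comm => $comm, uid => $uid };
    }
    closedir $dh;
    return \%procs;
}

# True if $pid or any ancestor in the snapshot is named $parent_process.
sub under_shepherd {
    my ($procs, $pid) = @_;
    my %seen;    # guards against loops in an inconsistent snapshot
    while (defined $pid && exists $procs->{$pid} && !$seen{$pid}++) {
        return 1 if $procs->{$pid}{comm} eq $parent_process;
        $pid = $procs->{$pid}{ppid};
    }
    return 0;
}

# PIDs owned by regular users that have no sge_shepherd ancestor.
sub find_strays {
    my ($procs) = @_;
    return sort { $a <=> $b }
           grep { $procs->{$_}{uid} >= $min_uid
                  && !under_shepherd($procs, $_) }
           keys %$procs;
}

my $procs = read_proc_table();
for my $pid (find_strays($procs)) {
    printf "%d uid=%d cmd=%s\n",
        $pid, $procs->{$pid}{uid}, $procs->{$pid}{comm};
}
```

One caveat: a job process that daemonizes itself is re-parented away
from its shepherd, so it would show up in this list even though it
started inside SGE.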


Jesse Becker (Contractor)
