[gridengine users] cannot run in PE "mpich" because it only offers 0 slots

Reuti reuti at staff.uni-marburg.de
Fri Oct 7 13:04:29 UTC 2011


On 06.10.2011 at 14:40, Jesse Becker wrote:

> I ran into this a few months ago, and it had almost nothing to do with
> PE slots.  Unfortunately, I can't recall what I did to fix it either.
> Try submitting test jobs with "-w v" and "-w p" to get more of an idea
> of what's going on.

Yes, this needs to be investigated by hand. There is an RFE for
better scheduling output: in a case like this, you would want to know
why the slots couldn't be allocated. That only zero slots are
available is already the result of another limit.

Could be memory, RQS, slots, ...
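
A minimal sketch of what checking "by hand" could look like, assuming
the usual SGE 6.x command-line tools are in the PATH. The function name
diagnose_pending_job and the job ID in the usage line are made up for
illustration; the commands only read state and change nothing.

```shell
# Hypothetical helper: print the usual suspects for a pending PE job.
# $1 = job ID, $2 = PE name (defaults to the "mpich" PE from this thread).
diagnose_pending_job() {
    job_id=$1
    pe_name=${2:-mpich}

    echo "== scheduler reasons for job $job_id =="
    qstat -j "$job_id"      # relies on schedd_job_info being true

    echo "== validation against the current cluster state =="
    qalter -w p "$job_id"   # the "-w p" check applied to an already pending job

    echo "== resource quota sets (RQS) =="
    qconf -srqs             # shows all defined resource quota sets, if any

    echo "== PE definition and slot usage per cluster queue =="
    qconf -sp "$pe_name"
    qstat -g c              # used/available/total slots per cluster queue

    echo "== per-host resources such as memory =="
    qhost -F
}

# Usage (job ID is an example only):
#   diagnose_pending_job 4242 mpich
```

It is also worth confirming that the PE is listed in the pe_list of the
queue in question, since a PE that is not attached to the queue cannot
offer any slots there.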

-- Reuti

> On Thu, Oct 06, 2011 at 04:39:39AM -0400, wzlu wrote:
>> Dear All,
>>
>> There are 144 nodes in my queue and I configured 1 slot for each
>> node, i.e. 144 nodes with 144 slots.
>> The PE is using 121 slots now. One job needs 12 PE slots, and there
>> are enough nodes and slots for this job.
>> But it stays queued with: cannot run in PE "mpich" because it only
>> offers 0 slots
>>
>> The configuration is as follows:
>>
>> $ qconf -sp mpich
>> pe_name           mpich
>> slots             81920
>> user_lists        NONE
>> xuser_lists       NONE
>> start_proc_args   /bin/true
>> stop_proc_args    /bin/true
>> allocation_rule   $round_robin
>> control_slaves    TRUE
>> job_is_first_task FALSE
>> urgency_slots     min
>>
>> $ qconf -ssconf
>> algorithm                         default
>> schedule_interval                 0:0:5
>> maxujobs                          0
>> queue_sort_method                 load
>> job_load_adjustments              NONE
>> load_adjustment_decay_time        0:7:30
>> load_formula                      slots
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            none
>> reprioritize_interval             0:0:0
>> halftime                          168
>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>> compensation_factor               5.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         0
>> weight_tickets_share              0
>> share_override_tickets            TRUE
>> share_functional_shares           TRUE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OFS
>> weight_ticket                     0.010000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   1.000000
>> max_reservation                   0
>> default_duration                  00:15:00
>>
>> How can I fix this problem? Thanks a lot.
>>
>> Best Regards,
>> Lu
>
>> _______________________________________________
>> users mailing list
>> users at gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>
>
> -- 
> Jesse Becker
> NHGRI Linux support (Digicon Contractor)



