[gridengine users] Problems with dmtcp migration between processor versions

Orion Poplawski orion at cora.nwra.com
Tue Oct 16 22:24:03 UTC 2012


On 10/16/2012 12:20 PM, Reuti wrote:
> Am 16.10.2012 um 19:50 schrieb Orion Poplawski:
>
>> With some more testing I'm seeing one major issue with dmtcp migration: migrating between different processor versions (e.g. Xeon 5400 -> 5500).  We're running code built with the Intel Fortran compiler, which generates different code paths for different processors.  The processor type appears to be detected only once, at startup, because if a job is migrated from an older processor to a newer processor the job dies with an illegal instruction signal.
>
> I would assume that the compiled application detects the CPU type it's running on only once. You could presumably restrict this at compile time by building only for the 5400 or older.

Yes, that is one possibility.

>> There does not appear to be a way to restrict the migration of a job beyond what was already specified in the job submission, correct?  I wonder if it would be possible to put more restrictions on a migrating job somehow.
>
> You can use `qalter` in the migration method to request a hostgroup, or a string for the CPU type, for the job before it's killed.

I've started going down this path, but unfortunately qalter only works on jobs 
and not on tasks, so I don't think it is going to be very useful for us as we 
are very heavy task users.
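
For anyone who can work at the job level, a rough sketch of the qalter call 
Reuti suggests might look like the following.  The cpuflag_<name> resource 
names and the helper build_qalter_command are assumptions for illustration 
(the booleans would first have to be defined as complexes on your cluster); 
only the general form `qalter -l resource=value job_id` is standard.  The 
sketch only builds the command line and does not run it.

```python
import shlex


def build_qalter_command(job_id, cpu_flags):
    """Build the argv for a qalter call that requests boolean CPU-flag resources.

    job_id    -- the Grid Engine job id (a whole job, not an individual task)
    cpu_flags -- iterable of flag names, e.g. {"ssse3", "sse4_1"}

    The "cpuflag_<name>" resource names are hypothetical; they must match
    complexes that actually exist in your cluster configuration.
    """
    flags = sorted(set(cpu_flags))
    if not flags:
        raise ValueError("at least one CPU flag is required")
    # Join the requests into a single comma-separated -l argument.
    requests = ",".join(f"cpuflag_{flag}=true" for flag in flags)
    return ["qalter", "-l", requests, str(job_id)]


if __name__ == "__main__":
    # Print the command instead of executing it, so this is safe to try anywhere.
    print(shlex.join(build_qalter_command(12345, {"sse4_1", "ssse3"})))
```

Before relying on this, check how `-l` in qalter interacts with the resource 
requests the job was submitted with on your version, since the new list may 
replace them rather than add to them.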

But I've put some notes on it in my dmtcp readme in case it is useful for 
others.  We'll probably make use of the cpuflag_* booleans elsewhere.

https://github.com/opoplawski/gridengine_dmtcp
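
As a sketch of how cpuflag_* style booleans could be derived on a Linux host, 
the following parses the "flags" line of /proc/cpuinfo text and reports which 
of a chosen set of flags are present.  The function names, the list of flags, 
and the cpuflag_<name> naming are illustrative assumptions, not necessarily 
what the readme above does.

```python
# Shortened, made-up /proc/cpuinfo excerpt used only for demonstration.
SAMPLE_CPUINFO = """\
processor\t: 0
model name\t: Example CPU
flags\t\t: fpu sse sse2 ssse3 sse4_1
"""


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpuflag_booleans(flags, wanted):
    """Map each wanted flag to a hypothetical 'cpuflag_<name>' boolean."""
    return {f"cpuflag_{name}": name in flags for name in wanted}


if __name__ == "__main__":
    flags = parse_cpu_flags(SAMPLE_CPUINFO)
    for name, present in cpuflag_booleans(flags, ["sse4_1", "sse4_2"]).items():
        print(f"{name}={int(present)}")
```

On a real host the text would come from reading /proc/cpuinfo; how the 
resulting values are reported to Grid Engine (for example through a load 
sensor or static host complex values) is left out here.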

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com


