[gridengine users] Hadoop Integration HOWTO (was: Hadoop Integration - how's it going)

Rayson Ho rayson at scalablelogic.com
Fri Jun 1 19:04:25 UTC 2012


Thanks again, Prakashan, for the contribution!

Rayson



On Fri, Jun 1, 2012 at 1:25 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
> Thank you, Rayson! I appreciate you taking the time to upload the tar files
> and write the HOWTO.
>
> Regards,
>
> Prakashan
>
>
>
> On 06/01/2012 10:19 AM, Rayson Ho wrote:
>>
>> I've reviewed the integration, and wrote a short Grid Engine Hadoop HOWTO:
>>
>> http://gridscheduler.sourceforge.net/howto/GridEngineHadoop.html
>>
>> The difference between the two methods (the original SGE 6.2u5
>> integration vs. Prakashan's) is that with Prakashan's approach, Grid
>> Engine is used for resource allocation, and the Hadoop job scheduler
>> (JobTracker) handles all the MapReduce operations. A Hadoop cluster is
>> created on demand with Prakashan's approach, whereas in the original
>> SGE 6.2u5 method Grid Engine replaces the Hadoop job scheduler.
>>
>> Because this new approach uses standard Grid Engine parallel
>> environments (PEs), one can call "qrsh -inherit" to start the Hadoop
>> services on remote nodes through Grid Engine, and thus get full job
>> control, job accounting, and cleanup on termination, just like any
>> other tightly integrated PE job!
>>
>> Rayson
>>
>>
>>
>> On Tue, May 29, 2012 at 10:36 AM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>>>
>>> I put my scripts in a tar file and sent it to Rayson yesterday so that
>>> he can put it in a common place for download.
>>>
>>> Prakashan
>>>
>>>
>>>
>>> On 05/29/2012 07:18 AM, Jesse Becker wrote:
>>>>
>>>>
>>>> On Mon, May 28, 2012 at 12:00:24PM -0400, Prakashan Korambath wrote:
>>>>>
>>>>>
>>>>>
>>>>> This is how we run Hadoop using Grid Engine (or, for that matter,
>>>>> any scheduler, with appropriate alterations):
>>>>>
>>>>> http://www.ats.ucla.edu/clusters/hoffman2/hadoop/default.htm
>>>>>
>>>>> Basically, either run a prolog or call a script inside the job
>>>>> submission script itself that parses the file named by $PE_HOSTFILE
>>>>> and creates the Hadoop *-site.xml, masters and slaves files at run
>>>>> time. This method works with any scheduler, as it does not depend on
>>>>> scheduler-specific features. If there is interest, I can post the
>>>>> prolog script. Thanks.
>>>>
>>>>
>>>>
>>>> Please do.
>>>>
>>>
>
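A minimal Python sketch of the workflow discussed in this thread, under stated assumptions. It parses the file named by $PE_HOSTFILE (each line assumed to be: host, slot count, queue, processor range), writes Hadoop 1.x-style masters, slaves, core-site.xml and mapred-site.xml files, and builds the "qrsh -inherit" command lines that would start the worker daemons under Grid Engine control. The first host is assumed to run the master daemons. The ports (9000, 9001), the conf directory, and all function names are hypothetical; this is not Prakashan's actual prolog script.

```python
"""Hypothetical sketch: turn $PE_HOSTFILE into Hadoop 1.x config files
and 'qrsh -inherit' start commands. Names, ports and paths are assumptions."""
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_pe_hostfile(text):
    """Return [(host, slots), ...] from PE_HOSTFILE contents.

    Each non-blank line is assumed to look like:
      <host> <slots> <queue> <processor range>
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            hosts.append((fields[0], int(fields[1])))
    return hosts


def site_xml(props):
    """Render a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def write_hadoop_conf(hosts, conf_dir, nn_port=9000, jt_port=9001):
    """Write masters, slaves, core-site.xml and mapred-site.xml.

    Assumption: the first host in PE_HOSTFILE runs the master daemons;
    the port numbers are arbitrary placeholders.
    """
    if not hosts:
        raise ValueError("PE_HOSTFILE lists no hosts")
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    master = hosts[0][0]
    (conf / "masters").write_text(master + "\n")
    # Every granted host (including the master) becomes a worker.
    (conf / "slaves").write_text("".join(h + "\n" for h, _ in hosts))
    (conf / "core-site.xml").write_text(
        site_xml({"fs.default.name": f"hdfs://{master}:{nn_port}"}))
    (conf / "mapred-site.xml").write_text(
        site_xml({"mapred.job.tracker": f"{master}:{jt_port}"}))
    return master


def qrsh_commands(hosts, conf_dir):
    """Build argv lists that start worker daemons via 'qrsh -inherit'.

    The daemons are run in the foreground ('hadoop datanode' rather than
    'hadoop-daemon.sh start'), so Grid Engine can track and kill them.
    """
    cmds = []
    for host, _ in hosts:
        for daemon in ("datanode", "tasktracker"):
            cmds.append(["qrsh", "-inherit", host,
                         "hadoop", "--config", conf_dir, daemon])
    return cmds


if __name__ == "__main__":
    hostfile = os.environ.get("PE_HOSTFILE")
    if hostfile:
        hosts = parse_pe_hostfile(Path(hostfile).read_text())
        conf_dir = os.path.join(os.environ.get("TMPDIR", "/tmp"), "hadoop-conf")
        write_hadoop_conf(hosts, conf_dir)
        for argv in qrsh_commands(hosts, conf_dir):
            print(" ".join(argv))  # dry run: print instead of executing
```

The script only prints the commands. To actually launch them, each argv could be passed to subprocess.Popen so the daemons run concurrently, and the NameNode and JobTracker on the master host would also need to be started. Grid Engine typically limits the number of "qrsh -inherit" tasks on a host to the slots granted there, so starting two daemons per host assumes at least two slots per host.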



