Sunday 16 October 2011

Hadoop on Windows 7 In Local Mode

Introduction:
MapReduce is a programming model for processing large data sets in parallel. Google introduced this model and has been using it since 2004. It helps us process terabytes of data efficiently. Hadoop is a Java-based open source library that implements this model. The slides linked below give more information about the MapReduce programming model.
http://www.slideshare.net/MuhammadShahid18/map-reduce-by-usman
Below are the details of setting up Hadoop on Windows 7 (that is, running MapReduce on Windows).
Software Required:
We are going to implement MapReduce using the open source Hadoop library. We will run a Hadoop example on a single machine (standalone mode), not in a clustered environment.
Some software has to be installed first. Everything listed below can be downloaded from its own site, which is easy to find through Google.
1) Cygwin: It provides a Linux-like environment on Windows.
2) Vim Editor: It is used whenever we need to modify a file. If we modify a file with WordPad or another Windows-based text editor, it will cause errors, because those editors save text with Windows line endings (and sometimes extra formatting) that the Linux tools do not accept.
3) Hadoop Library: Download Hadoop-0.20.2. A newer version is also available, but it produces errors when run on Windows.
4) Java SDK: Install JDK 6.0 on your system.
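The line-ending problem mentioned in item 2 can be checked from the Cygwin console. Below is a minimal sketch, assuming the standard grep, tr and mv tools are installed with Cygwin. The function names has_crlf and strip_crlf are made up for this example and are not part of Cygwin or Hadoop.

```shell
# Illustrative helpers (hypothetical names, not part of Cygwin or Hadoop).

# Succeeds if the file contains at least one carriage return,
# which is the extra character Windows editors put before each newline.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Removes every carriage return from the file by writing a cleaned copy
# and then replacing the original with it.
strip_crlf() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example, has_crlf conf/hadoop-env.sh && strip_crlf conf/hadoop-env.sh would clean that file only if it needs it (the file name is just an example).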
Running an Example:
After the installation, open the Cygwin console by double-clicking its icon. A console window will open.
Now extract the Hadoop library to a path of your choice.
Another thing: install your JDK in a folder directly on the C drive. If you install it in "Program Files" (the default installation path), it will cause errors, because the name "Program Files" contains a space, which the Linux-style scripts do not handle. So I suggest using the same path as below.
Now set JAVA_HOME from the console window by typing the line below.
export JAVA_HOME=/cygdrive/c/JavaT/jdk1.6.0_02
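Before going further, it can help to confirm that the value is usable. The following is only a sketch under the assumptions stated above (no space in the path, a java executable under bin). The function name check_java_home is hypothetical.

```shell
# Hypothetical helper: report whether a JAVA_HOME value looks usable.
# Prints "ok" or a short reason, and returns non-zero on failure.
check_java_home() {
    case "$1" in
        "")    echo "JAVA_HOME is empty"; return 1 ;;
        *" "*) echo "path contains a space"; return 1 ;;
    esac
    # Accept either java or java.exe, since the JDK is a Windows install.
    if [ ! -x "$1/bin/java" ] && [ ! -x "$1/bin/java.exe" ]; then
        echo "no java executable under $1/bin"
        return 1
    fi
    echo "ok"
}
```

Call it as check_java_home "$JAVA_HOME" after the export line.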
Now type the line below to make running the examples easier.
You have to give the path where Hadoop was extracted, with the Cygwin drive prefix (/cygdrive/c for the C drive) placed before it. Again, I suggest using the same path for extraction.
cd /cygdrive/c/cygwin/home/usman.shahid/hadoop-0.20.2
After that, create a folder named input in the root folder of the extracted Hadoop library. Copy any text file that contains some text into that input folder. Then type the line below in the Cygwin console. Make sure you have not created an output folder, otherwise you will get an exception.
bin/hadoop jar hadoop-examples-0.20.2.jar wordcount input output
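The preparation that comes before this command (creating the input folder, copying a text file into it, and making sure no output folder exists) can also be scripted. This is a rough sketch only. The function name prepare_wordcount_input and its two arguments are made up for illustration.

```shell
# Hypothetical helper: stage one text file for the wordcount run.
# $1 = Hadoop root folder, $2 = text file to copy into the input folder.
prepare_wordcount_input() {
    # Refuse to continue if an output folder is already there,
    # since the run would otherwise end with an exception.
    if [ -e "$1/output" ]; then
        echo "output folder already exists; remove or rename it first" >&2
        return 1
    fi
    mkdir -p "$1/input" && cp "$2" "$1/input/"
}
```

From the Hadoop root folder it could be called as prepare_wordcount_input . /cygdrive/c/some/file.txt, where the file path is just a placeholder.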
After that, look in the root folder where the input folder was created. There will be an output folder in the same directory. This folder is created by Hadoop, and the result file is written into it.
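To sanity-check the result, the same counts can be produced locally with ordinary shell tools and compared by eye with the file in the output folder. This is a sketch that assumes the example splits words on whitespace and writes one word and its count per line, separated by a tab. The function name local_wordcount is hypothetical, and the exact name of the result file (something like part-r-00000) is also an assumption.

```shell
# Hypothetical helper: count whitespace-separated words in a text file and
# print "word<TAB>count" lines, sorted by word in plain byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        grep -v '^$' |                # drop an empty line left by leading whitespace
        LC_ALL=C sort |               # group identical words together
        LC_ALL=C uniq -c |            # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'    # reorder to word, tab, count
}
```

For example, run local_wordcount input/yourfile.txt and compare it with cat output/part-* (both file names here are placeholders).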