

Practical: 10

AIM

Implementation of Hadoop in Windows

GANPAT UNIVERSITY

U. V. Patel College of Engineering


Introduction

Hadoop is a powerful framework that allows for automatic parallelization of computing tasks. Unfortunately, programming for it poses certain challenges: Hadoop programs are hard to understand and debug. One way to make things easier is to have a simplified version of the Hadoop cluster that runs locally on the developer's machine. This tutorial describes how to set up such a cluster on a computer running Microsoft Windows, and how to integrate this cluster with the Eclipse development environment. Eclipse is a prime environment for Java development.

After making sure that the prerequisites are installed, the next step is to install the Cygwin environment. Cygwin is a set of UNIX packages ported to Microsoft Windows. It is needed to run the scripts supplied with Hadoop, since they are all written for the UNIX platform.

Step:1- Download Cygwin and the JDK.


Step:3- Configure the SSH daemon


Step:4- Start the SSH daemon


Step:5- Now create an SSH key for the user account by running:

ssh-user-config

Shall I create a SSH1 RSA identity file for you? : no

Shall I create a SSH2 RSA identity file for you? : yes

Do you want to use this identity to login to this machine? (yes/no) - yes

Step:6- Set the environment variables for Cygwin and Java:

Add a new system variable JAVA_HOME whose value is the installed Java path, here C:\JAVA.

Append the bin folder of the installed Cygwin, here C:\cygwin64\bin, to the PATH system variable.
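To confirm that Step 6 took effect, the two variables can be checked from a small Java program. This is only a sketch: the class name `EnvCheck` is hypothetical, and the expected folder `C:\cygwin64\bin` is an assumption copied from the example path above, so change it to match your own installation.

```java
import java.io.File;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper that checks the two Step 6 settings: JAVA_HOME and the Cygwin bin folder on PATH.
final class EnvCheck {
    // Assumed Cygwin location, taken from the example path in Step 6.
    static final String CYGWIN_BIN = "C:\\cygwin64\\bin";

    // Look up a variable ignoring case, because Windows treats "Path" and "PATH" as the same name.
    static String lookup(Map<String, String> env, String name) {
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                return e.getValue();
            }
        }
        return null;
    }

    // True if JAVA_HOME is defined and not blank.
    static boolean hasJavaHome(Map<String, String> env) {
        String javaHome = lookup(env, "JAVA_HOME");
        return javaHome != null && !javaHome.trim().isEmpty();
    }

    // True if some PATH entry names the given folder (case-insensitive, trailing slashes ignored).
    static boolean pathContains(Map<String, String> env, String folder, String separator) {
        String path = lookup(env, "PATH");
        if (path == null) {
            return false;
        }
        for (String entry : path.split(Pattern.quote(separator))) {
            String cleaned = entry.trim().replaceAll("[\\\\/]+$", "");
            if (cleaned.equalsIgnoreCase(folder)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("JAVA_HOME set:      " + hasJavaHome(env));
        System.out.println("Cygwin bin on PATH: " + pathContains(env, CYGWIN_BIN, File.pathSeparator));
    }
}
```

Run it from a newly opened command prompt, because windows that were already open keep the old environment.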

Step:7- Now check that the keys were set up correctly by executing the following command:

ssh -v localhost

Step:8- Configure Hadoop. Extract the Hadoop archive and change into its conf folder:

tar -xzf hadoop-0.19.1.tar.gz

cd hadoop-0.19.1

cd conf


Step:9- Create a folder named "hadoop-dir". Inside the "hadoop-dir" folder, create two folders named "datadir" and "namedir".

Step:10- In Cygwin, execute the chmod command to change the folder permissions so that the folders can be accessed by Hadoop.

$ chmod 755 hadoop-dir
$ cd hadoop-dir
$ chmod 755 datadir
$ chmod 755 namedir
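The folder layout from Step 9 and the permission change from Step 10 can also be sketched in Java. Mode 755 means read, write and execute for the owner and read and execute for everyone else, which the helper below spells out as `rwxr-xr-x`. The class name and the default folder location are assumptions for illustration. On a plain NTFS drive the Java POSIX permission API is usually unavailable, so the sketch only sets permissions where the file system supports them and otherwise leaves that part to the Cygwin `chmod` commands above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical Java version of Steps 9 and 10: create hadoop-dir/datadir/namedir with mode 755.
final class HadoopDirSetup {
    // Turn an octal mode such as 0755 into the "rwxr-xr-x" form shown by ls -l.
    static String toSymbolic(int mode) {
        String letters = "rwx";
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            // Bits 8-6 belong to the owner, 5-3 to the group, 2-0 to others; each triple is r, w, x.
            sb.append((mode & (1 << bit)) != 0 ? letters.charAt(2 - bit % 3) : '-');
        }
        return sb.toString();
    }

    static void createDirs(Path hadoopDir) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(toSymbolic(0755));
        Path[] dirs = { hadoopDir, hadoopDir.resolve("datadir"), hadoopDir.resolve("namedir") };
        for (Path dir : dirs) {
            Files.createDirectories(dir);
            // A Windows JVM on NTFS usually has no POSIX view, so fall back to Cygwin's chmod there.
            if (Files.getFileStore(dir).supportsFileAttributeView("posix")) {
                Files.setPosixFilePermissions(dir, perms);
            } else {
                System.out.println("No POSIX permissions here; run chmod 755 in Cygwin for " + dir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass another folder as the first argument.
        createDirs(Paths.get(args.length > 0 ? args[0] : "hadoop-dir"));
    }
}
```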

Step:11- Update the hadoop-site.xml file in the conf folder.
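The contents of hadoop-site.xml are not reproduced in the text. As a hedged sketch of what a single-node configuration for this Hadoop version typically contains, the program below prints such a file. The property names `fs.default.name`, `mapred.job.tracker`, `dfs.replication`, `dfs.name.dir` and `dfs.data.dir` are standard Hadoop 0.19 settings, but the port numbers 9100 and 9101 and the location `C:/hadoop-dir` are assumptions; use the values your own setup requires. The `namedir` and `datadir` entries point at the folders created in Step 9.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator for a minimal single-node hadoop-site.xml.
final class SiteXml {
    // Escape the characters that may not appear verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render the properties in the <configuration>/<property> layout Hadoop reads.
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n");
            sb.append("    <name>").append(escape(e.getKey())).append("</name>\n");
            sb.append("    <value>").append(escape(e.getValue())).append("</value>\n");
            sb.append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    static Map<String, String> singleNodeDefaults(String hadoopDir) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("fs.default.name", "hdfs://localhost:9100"); // assumed NameNode port
        p.put("mapred.job.tracker", "localhost:9101");      // assumed JobTracker port
        p.put("dfs.replication", "1");                     // only one DataNode, so one copy per block
        p.put("dfs.name.dir", hadoopDir + "/namedir");      // folder from Step 9
        p.put("dfs.data.dir", hadoopDir + "/datadir");      // folder from Step 9
        return p;
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : "C:/hadoop-dir"; // hypothetical location
        System.out.print(render(singleNodeDefaults(dir)));
    }
}
```

A Windows-style path is used for the two folders because the Hadoop daemons run in a Windows JVM, which does not translate Cygwin paths such as /cygdrive/c/.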


Step:12- The next step is to format the namenode, which creates the Hadoop Distributed File System (HDFS):

cd hadoop-0.19.1

mkdir logs

bin/hadoop namenode -format

Step:13- Restart the Cygwin terminal and execute the command below to start all daemons of the Hadoop cluster.

$ bin/start-all.sh

Stop Hadoop Daemons:

To stop all the daemons, we can execute the command

$ bin/stop-all.sh

Step:14- The next step is to install and check the Hadoop plugin for Eclipse. Change into the contrib folder of the Hadoop installation:

cd hadoop-0.19.1
cd contrib


Step:15- Navigate to the Hadoop Eclipse plugin folder, copy the jar file, and paste it into the Eclipse plugins folder:

1. Shrink the newly opened window and move it to the right side of the screen.

2. Open another Explorer window, either through the "My Computer" icon or through the "Start -> Run" menu. Navigate to your Eclipse installation and open its "plugins" folder.

3. Copy the file "hadoop-0.19.1-eclipse-plugin.jar" from the Hadoop Eclipse plugin folder to the Eclipse plugins folder, as shown in the figure below.

4. Close both Explorer windows.

5. Start Eclipse.

6. Click the Open Perspective icon, which is usually located in the upper-right corner of the Eclipse window, then select "Other" from the menu.

7. Select Map/Reduce from the list of perspectives and press the "OK" button.

As a result, your IDE should open a new perspective that looks similar to the image below.
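The manual copy in Step 15 can also be scripted. In the sketch below the class name is hypothetical and both folders are passed on the command line rather than assumed; only the jar name comes from the text above. Run it while Eclipse is closed, since Eclipse picks up new plugins when it starts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper that performs the Explorer copy from Step 15.
final class PluginInstall {
    static final String JAR_NAME = "hadoop-0.19.1-eclipse-plugin.jar";

    // Destination of the jar: <eclipse install>/plugins/<jar name>.
    static Path target(Path eclipseHome, Path jar) {
        return eclipseHome.resolve("plugins").resolve(jar.getFileName());
    }

    static Path install(Path jar, Path eclipseHome) throws IOException {
        // Overwrite an older copy of the plugin if one is already there.
        return Files.copy(jar, target(eclipseHome, jar), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PluginInstall <plugin jar folder> <Eclipse install folder>");
            return;
        }
        Path jar = Paths.get(args[0], JAR_NAME);
        System.out.println("Copied to " + install(jar, Paths.get(args[1])));
    }
}
```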

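Step:16- Start the local Hadoop cluster

The next step is to launch your newly configured cluster. If the daemons started in Step 13 are still running, stop them first with bin/stop-all.sh, because a second copy of each daemon cannot bind to the same ports.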


1. Start the namenode in the first window by executing:

cd hadoop-0.19.1
bin/hadoop namenode

2. Start the secondary namenode in the second window by executing:

cd hadoop-0.19.1
bin/hadoop secondarynamenode

3. Start the job tracker in the third window by executing:

cd hadoop-0.19.1
bin/hadoop jobtracker

4. Start the data node in the fourth window by executing:

cd hadoop-0.19.1
bin/hadoop datanode

5. Start the task tracker in the fifth window by executing:

cd hadoop-0.19.1
bin/hadoop tasktracker
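Step 16 keeps one Cygwin window per daemon so that each daemon's console output stays visible. A hedged Java alternative is to start the same five commands as background processes and send each one's output to a file. The class name and the log file names are hypothetical, and the sketch assumes Cygwin's `bash` is reachable through PATH as set up in Step 6.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher for Step 16: one background process per daemon instead of one Cygwin window each.
final class DaemonLauncher {
    // Same daemons, in the same order, as the five windows in Step 16.
    static final List<String> DAEMONS = Arrays.asList(
            "namenode", "secondarynamenode", "jobtracker", "datanode", "tasktracker");

    // Command for one daemon, run through Cygwin's bash (assumed to be on PATH after Step 6).
    static List<String> commandFor(String daemon) {
        return Arrays.asList("bash", "-c", "bin/hadoop " + daemon);
    }

    static List<Process> startAll(File hadoopHome) throws IOException {
        List<Process> processes = new ArrayList<Process>();
        for (String daemon : DAEMONS) {
            ProcessBuilder pb = new ProcessBuilder(commandFor(daemon));
            pb.directory(hadoopHome);     // same effect as "cd hadoop-0.19.1"
            pb.redirectErrorStream(true); // merge stderr into stdout
            // Hypothetical file name; the logs folder itself was created in Step 12.
            pb.redirectOutput(new File(hadoopHome, "logs/console-" + daemon + ".txt"));
            processes.add(pb.start());
            System.out.println("Started " + daemon);
        }
        return processes;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File home = new File(args.length > 0 ? args[0] : "hadoop-0.19.1");
        // Block until the daemons exit, much like keeping the five windows open.
        for (Process p : startAll(home)) {
            p.waitFor();
        }
    }
}
```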


Step:17- Set up the Hadoop location in Eclipse

The next step is to configure the Hadoop location in the Eclipse environment.

1. Launch the Eclipse environment.

2. Open the Map/Reduce perspective by clicking the Open Perspective icon, selecting "Other" from the menu, and then selecting "Map/Reduce" from the list of perspectives.

3. After you have switched to the Map/Reduce perspective, select the Map/Reduce


Step:18- Upload Files To HDFS

cd hadoop-0.19.1

bin/hadoop fs -mkdir In

bin/hadoop fs -put *.txt In

When the last of the above commands starts executing, you should see some activity in the other Hadoop windows, as shown in the image below.
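The `*.txt` in Step 18 is expanded by the Cygwin shell, not by Hadoop, so the files uploaded into `In` are the `.txt` files in the current local folder (hadoop-0.19.1). The hypothetical helper below previews that list with Java's glob matcher, which follows roughly, but not exactly, the same rules as bash.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical preview of the local files that "bin/hadoop fs -put *.txt In" would upload.
final class PutPreview {
    // Names matching a glob such as "*.txt", sorted alphabetically (close to, not exactly, bash's order).
    static List<String> matching(List<String> names, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Run from the hadoop-0.19.1 folder, the same directory Step 18 runs the command in.
        String[] listing = new File(".").list();
        List<String> names = listing == null ? new ArrayList<String>() : Arrays.asList(listing);
        StringBuilder cmd = new StringBuilder("bin/hadoop fs -put");
        for (String name : matching(names, "*.txt")) {
            cmd.append(' ').append(name);
        }
        System.out.println(cmd.append(" In"));
    }
}
```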

Step:19- Create and run Hadoop project

Now we are ready to create and run our first Hadoop project.

Creating and configuring a Hadoop Eclipse project:

1. Launch Eclipse.

2. Right-click on a blank space in the Project Explorer window and select New -> Project... to create a new project.

Select Map/Reduce Project from the list of project types, as shown in the image below.


You will see a project properties window similar to the one shown below.

3. Fill in the project name and then click the Configure Hadoop Installation link, which is located on the right side of the project configuration window. This will bring up


Step:20- Creating a Map/Reduce driver class

1. Right-click on the newly created Hadoop project in the Project Explorer tab and select New -> Other from the context menu.

2. Go to the Map/Reduce folder, select MapReduceDriver, then press the Next button, as shown in the image below.


4. Unfortunately, the Hadoop plugin for Eclipse is slightly out of step with the recent Hadoop API, so we need to edit the driver code a bit.

Find the following two lines in the source code and comment them out:

conf.setInputPath(new Path("src"));
conf.setOutputPath(new Path("out"));

Enter the following code immediately below the two lines you just commented out:

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path("In"));
FileOutputFormat.setOutputPath(conf, new Path("Out"));

as shown in the image below.

5. After you have changed the code, Eclipse will mark the new lines as errors. Click the error icon on each of these lines and select Eclipse's suggestion to import the missing class.

You need to import the following classes from the org.apache.hadoop.mapred package: TextInputFormat, TextOutputFormat, FileInputFormat and FileOutputFormat.
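The four lines added in Step 20 choose how the job reads and writes plain text. `TextInputFormat` hands the mapper one record per line, with the byte offset of the line as the key and the line itself (without its line terminator) as the value; `TextOutputFormat` writes each output pair as the key, a tab, and the value. The sketch below imitates both in plain Java, without any Hadoop classes, so it runs on its own; the class and method names are hypothetical. If the generated mapper and reducer simply pass records through (an assumption; check the driver Eclipse generated), the files in `Out` would have this shape.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of TextInputFormat / TextOutputFormat record handling (no Hadoop classes).
final class TextFormatsDemo {
    // Split text into (byte offset, line) records, the way TextInputFormat presents lines to a mapper.
    static List<Map.Entry<Long, String>> readRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<Map.Entry<Long, String>>();
        long offset = 0;
        int start = 0;
        while (start < content.length()) {
            int newline = content.indexOf('\n', start);
            int end = (newline == -1) ? content.length() : newline;
            String line = content.substring(start, end);
            // Offsets count bytes, including the '\n' (and any '\r') that ended the line.
            long lineBytes = line.getBytes(StandardCharsets.UTF_8).length + (newline == -1 ? 0 : 1);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1); // drop the '\r' of Windows line endings
            }
            records.add(new AbstractMap.SimpleEntry<Long, String>(offset, line));
            offset += lineBytes;
            start = end + 1;
        }
        return records;
    }

    // Format records as TextOutputFormat does by default: key, a tab, value, newline.
    static String writeText(List<Map.Entry<Long, String>> records) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, String> r : records) {
            sb.append(r.getKey()).append('\t').append(r.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "hello hadoop\nhello windows\n";
        // With pass-through map and reduce steps (an assumption), this is the shape of the output.
        System.out.print(writeText(readRecords(input)));
    }
}
```

With several input files, lines from different files can share the same offset, and a pass-through reducer emits them merged in key order. Note also that the job refuses to start when the `Out` output directory already exists, so delete it before running the project a second time.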


Step:21- Running the Hadoop Project

1. Right-click the TestDriver class in the Project Explorer tab and select Run As -> Java Application. This will bring up a window like the one shown below.
