Hadoop Multi-node Cluster Installation on CentOS 6.6


Created: 01-12-2015
Author: Hyun Kim
Last Updated: 01-12-2015
Version Number: 0.1


Hadoop Multi-node Cluster Installation Guide with CentOS 6

In this tutorial, we use CentOS 6.6 to install a multi-node Hadoop cluster.

For this tutorial, we need at least two nodes: one master node and one slave node. I’m using only two nodes to keep this guide as simple as possible. We will install the namenode and jobtracker on the master node, and the datanode, tasktracker, and secondarynamenode on the slave node. The hostname of my master node is lbb01.example.com and the hostname of my slave node is lbb02.example.com. Simple enough? Let’s get started.

Static IP Configuration

We want each server to keep the same IP address even when it restarts unexpectedly, so we will configure a static IP address for each server. Use the command below to open the Ethernet interface configuration file.

$ nano /etc/sysconfig/network-scripts/ifcfg-em1

Your interface might be named eth0 instead of em1, in which case the file is ifcfg-eth0.

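Set BOOTPROTO="static" (with no spaces around the equals sign, because this file is read as shell variable assignments) and add your IPADDR and NETMASK.

You can check your current IP address and netmask with the “ifconfig” command. For example:

IPADDR="192.168.23.234"
NETMASK="255.255.255.0"

Use plain straight quotes in this file; curly quotes copied from a word processor become part of the values.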
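Put together, a complete static configuration for em1 might look like the output of the minimal sketch below. make_ifcfg is a hypothetical shell function, not a CentOS tool, and the DEVICE and ONBOOT lines are assumptions about a typical CentOS 6 interface file (ONBOOT="yes" brings the interface up at boot), so compare the output with your existing file before replacing it.

```shell
#!/bin/sh
# make_ifcfg: hypothetical helper that prints a static-IP ifcfg file.
# Usage: make_ifcfg DEVICE IPADDR NETMASK
make_ifcfg() {
    dev="$1"
    ip="$2"
    mask="$3"
    # Only plain ASCII double quotes are written, never curly quotes.
    cat <<EOF
DEVICE="$dev"
BOOTPROTO="static"
ONBOOT="yes"
IPADDR="$ip"
NETMASK="$mask"
EOF
}

# Example values taken from the step above.
make_ifcfg em1 192.168.23.234 255.255.255.0
```

As root, you could write the file with make_ifcfg em1 192.168.23.234 255.255.255.0 > /etc/sysconfig/network-scripts/ifcfg-em1. Overwriting drops any other lines already in the file (such as HWADDR), so you may prefer to edit the file by hand and use the output only as a checklist.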


Configure Default Gateway

$ nano /etc/sysconfig/network

Now we configure the network settings. This may sound complicated, but we are simply adding HOSTNAME and GATEWAY. If HOSTNAME or GATEWAY already exists, simply edit it.

I’m using lbb01.example.com as my hostname:

HOSTNAME=lbb01.example.com

Add your gateway address:

GATEWAY=XXX.XXX.XXX.X
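The resulting /etc/sysconfig/network can be sketched the same way. make_network is a hypothetical helper; the NETWORKING=yes line is an assumption (CentOS 6 files usually already contain it), and 192.168.23.1 is an example gateway chosen only to match the sample address above.

```shell
#!/bin/sh
# make_network: hypothetical helper that prints /etc/sysconfig/network
# with the hostname and default gateway for one node.
# Usage: make_network HOSTNAME GATEWAY
make_network() {
    cat <<EOF
NETWORKING=yes
HOSTNAME=$1
GATEWAY=$2
EOF
}

# Example gateway only; use the one for your own network.
make_network lbb01.example.com 192.168.23.1
```

On the slave node, the same call with lbb02.example.com produces that node’s file.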

Restart network

$ /etc/init.d/network restart

Configure DNS

$ nano /etc/resolv.conf

Add your primary and alternative nameservers. For example:

nameserver xxx.xxx.xxx.x
nameserver xxx.xxx.xxx.x
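Because each nameserver is one line, the file is easy to generate. This minimal sketch, make_resolv_conf (a hypothetical helper), prints one nameserver line per argument in the order given; the resolver tries the servers in that order, so pass the primary first. The addresses in the example call are documentation placeholders, not real DNS servers.

```shell
#!/bin/sh
# make_resolv_conf: hypothetical helper that prints one "nameserver"
# line per argument, keeping the order given (primary first).
make_resolv_conf() {
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns"
    done
}

# Placeholder addresses; substitute your primary and alternative servers.
make_resolv_conf 192.0.2.53 192.0.2.54
```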

$ install yum


Download JDK

We need the JDK to run Hadoop. I’m installing jdk-7u25 in this tutorial, available from the Java SE 7 archive: www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u25-oth-JPR


Download Hadoop

We are installing hadoop-0.20.0 in this tutorial, so download the Hadoop-0.20.0 release.


I saved the file under the root folder.

Ping localhost

$ ping localhost

Repeat everything we’ve done so far on the slave node as well, but change the hostname to lbb02.example.com, NOT lbb01.example.com. Each node has a different IPADDR (IP address), so use the “ifconfig” command to adjust all the settings.

Edit /etc/hosts

On each node, edit the hosts file:

$ nano /etc/hosts

Add:

XXX.XXX.XXX.XXX   lbb01.example.com   # IP address and hostname of your master node
XXX.XXX.XXX.XXX   lbb02.example.com   # IP address and hostname of your slave node
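The same two lines must appear on both nodes. The sketch below, hosts_entries (a hypothetical helper), prints them in the tab-separated "address hostname" form that /etc/hosts expects; 192.168.23.235 is an assumed slave address used only for illustration.

```shell
#!/bin/sh
# hosts_entries: hypothetical helper that prints the /etc/hosts lines
# for the master and slave nodes.
# Usage: hosts_entries MASTER_IP MASTER_HOST SLAVE_IP SLAVE_HOST
hosts_entries() {
    printf '%s\t%s\n' "$1" "$2"   # master node
    printf '%s\t%s\n' "$3" "$4"   # slave node
}

# 192.168.23.235 is an assumed slave address, for illustration only.
hosts_entries 192.168.23.234 lbb01.example.com 192.168.23.235 lbb02.example.com
```

As root on each node, appending the output with >> /etc/hosts adds the entries. Run it once per node, since a second run appends duplicate lines.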

Try to ping each host to see if they can communicate with each other. You should be able to ping each host by hostname now.

On each node, run (press Ctrl+C to stop ping):

$ ping lbb01.example.com
$ ping lbb02.example.com

Then check name resolution with nslookup:

$ nslookup lbb01.example.com
$ nslookup lbb02.example.com

If ping gets replies and nslookup prints the server, name, and address for each host on every node, we have successfully configured the network settings.
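ping resolves names through the system resolver, which reads /etc/hosts, whereas nslookup queries the DNS server directly and ignores /etc/hosts; nslookup therefore only succeeds if your DNS server also has records for both nodes. The minimal sketch below, check_resolves (a hypothetical helper built on getent, which goes through the same resolver as ping), checks every name in one pass and returns non-zero if any of them fails.

```shell
#!/bin/sh
# check_resolves: hypothetical helper that reports whether each hostname
# resolves through the system resolver (/etc/hosts and DNS, in the order
# set by /etc/nsswitch.conf). Returns non-zero if any name fails.
check_resolves() {
    status=0
    for h in "$@"; do
        if addr=$(getent hosts "$h"); then
            echo "OK   $addr"
        else
            echo "FAIL $h"
            status=1
        fi
    done
    return $status
}
```

On each node, check_resolves lbb01.example.com lbb02.example.com should print an OK line for both hosts.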

Install Hadoop

I’m logged in as the root user. However, I’m not going to extract Hadoop as root. Instead, I move the Hadoop file to /home/lbbd/ and extract it there, since that is where the user “lbbd” can write files.

Your user/account name will be different. Be aware.

Giving lbbd permission

Although the Hadoop file is extracted under /home/lbbd/, the extracted folder still belongs to root, so we need to give lbbd ownership of it. To do this, use the command below.


$ chown -R lbbd:lbbd /home/lbbd/hadoop-0.20.0

Link hadoop-0.20.0 to hadoop

Inside /home/lbbd, create a symbolic link named hadoop that points to hadoop-0.20.0:

$ cd /home/lbbd
$ ln -s hadoop-0.20.0 hadoop

Why link it to hadoop?

So that whenever we need to edit something in the hadoop-0.20.0 folder, we don’t have to type -0.20.0 anymore. We can simply go to the hadoop-0.20.0 folder with $ cd /home/lbbd/hadoop. It’s convenient.
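The extract and link steps can be combined into one helper. The minimal sketch below assumes the download is named hadoop-0.20.0.tar.gz and unpacks to a hadoop-0.20.0 directory; install_hadoop is a hypothetical helper, and ln -n (available in GNU coreutils on CentOS) replaces an existing link instead of creating a new link inside the directory it points to.

```shell
#!/bin/sh
# install_hadoop: hypothetical helper that unpacks a Hadoop tarball into
# DEST and points DEST/hadoop at the versioned directory.
# Usage: install_hadoop TARBALL DEST VERSION_DIR
install_hadoop() {
    tarball="$1"
    dest="$2"
    verdir="$3"
    tar -xzf "$tarball" -C "$dest" || return 1
    # Relative target, same as running "ln -s hadoop-0.20.0 hadoop" inside DEST.
    # -n stops ln from descending into an existing "hadoop" directory link.
    ln -sfn "$verdir" "$dest/hadoop"
}
```

As root, install_hadoop /root/hadoop-0.20.0.tar.gz /home/lbbd hadoop-0.20.0 (the tarball location is an assumption) would be followed by chown -R lbbd:lbbd /home/lbbd/hadoop-0.20.0 as above; chown -h lbbd:lbbd /home/lbbd/hadoop also hands the link itself to lbbd.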

Install JDK

I saved the jdk-7u25 file in /root/hadoop_packages. You don’t have to do this. Wherever you saved your JDK file, use the command below (adjusting the path) to install the RPM. From /root, I ran:

$ rpm -ivh hadoop_packages/jdk-7u25-linux-x64.rpm

The RPM installs the JDK under /usr/java/jdk1.7.0_25, which we use as JAVA_HOME below.


$nano /home/lbbd/hadoop/conf/hadoop-env.sh

Now we need to edit hadoop-env.sh so that Hadoop’s scripts know where we installed the JDK and extracted Hadoop.

So I added the two lines below:

export JAVA_HOME=/usr/java/jdk1.7.0_25/
export HADOOP_HOME=/home/lbbd/hadoop
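If you repeat these steps, appending the same export twice is harmless but messy. A minimal sketch of an idempotent append, add_env_line (a hypothetical helper), adds a line only when that exact line is not already in the file.

```shell
#!/bin/sh
# add_env_line: hypothetical helper that appends LINE to FILE only if
# that exact line is not already present, so rerunning it is harmless.
# Usage: add_env_line FILE LINE
add_env_line() {
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (as lbbd), using the two lines from this step:
# add_env_line /home/lbbd/hadoop/conf/hadoop-env.sh 'export JAVA_HOME=/usr/java/jdk1.7.0_25/'
# add_env_line /home/lbbd/hadoop/conf/hadoop-env.sh 'export HADOOP_HOME=/home/lbbd/hadoop'
```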

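Edit core-site.xml

$ nano /home/lbbd/hadoop/conf/core-site.xml

Edit the file by adding the property below inside the <configuration> element. Use your master node’s hostname on every node, because datanodes and clients use this address to find the namenode:

<property>
<name>fs.default.name</name>
<value>hdfs://(master node hostname):9000</value>
</property>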
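Hadoop site files wrap their properties in a single <configuration> element. The sketch below, site_xml (a hypothetical helper), prints a complete file from name/value pairs. It does not escape XML special characters, so it is only suitable for plain values like the hostnames and paths in this guide.

```shell
#!/bin/sh
# site_xml: hypothetical helper that prints a complete Hadoop *-site.xml
# file from NAME VALUE pairs, one <property> element per pair.
# Values are NOT XML-escaped; avoid characters such as & and <.
# Usage: site_xml NAME1 VALUE1 [NAME2 VALUE2 ...]
site_xml() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
        printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
        shift 2
    done
    echo '</configuration>'
}

# fs.default.name names the master (namenode) host on every node.
site_xml fs.default.name hdfs://lbb01.example.com:9000
```

Redirecting that call into /home/lbbd/hadoop/conf/core-site.xml on each node replaces the whole file, so only do this if it holds no other properties you need.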


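hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/datastore</value>
<final>true</final>
</property>

Note that HDFS keeps each replica of a block on a different datanode. With only one datanode (the slave), each block gets a single copy and is reported as under-replicated unless you set dfs.replication to 1 or add datanodes.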
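<final>true</final> marks dfs.name.dir as final, so configuration resources loaded later, such as a job’s configuration, cannot override it. A minimal sketch of the master’s complete hdfs-site.xml, printed by a hypothetical master_hdfs_site helper, looks like this.

```shell
#!/bin/sh
# master_hdfs_site: hypothetical helper that prints the master node's
# hdfs-site.xml with the replication factor and namenode directory.
# Usage: master_hdfs_site REPLICATION NAME_DIR
master_hdfs_site() {
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>$1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$2</value>
    <final>true</final>
  </property>
</configuration>
EOF
}

master_hdfs_site 2 /var/datastore
```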


Don’t forget to give your account permission to /var/datastore. The namenode cannot start if it cannot write to this folder.

So log in as root and create the folder set as dfs.name.dir above:

$ mkdir /var/datastore

Then give the user ownership of the folder:

$ chown -R lbbd:lbbd /var/datastore

Use the command below to check that the ownership has been updated:

$ ls -l /var/

mapred-site.xml

Set mapred.job.tracker to your master node’s hostname on every node:

<property>
<name>mapred.job.tracker</name>
<value>(master node hostname):9001</value>
</property>
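Unlike fs.default.name, mapred.job.tracker takes a plain host:port pair with no URI scheme. The minimal sketch below, mapred_site (a hypothetical helper), prints the whole file, defaulting to port 9001 as in this step.

```shell
#!/bin/sh
# mapred_site: hypothetical helper that prints mapred-site.xml pointing
# every node at the jobtracker on the master.
# Usage: mapred_site MASTER_HOST [PORT]
mapred_site() {
    port="${2:-9001}"   # default port used in this guide
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$1:$port</value>
  </property>
</configuration>
EOF
}

mapred_site lbb01.example.com
```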


Edit .bash_profile in the lbbd user’s home directory.
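The guide runs java, hadoop, and hadoop-daemon.sh without full paths, so the lbbd user’s PATH needs the JDK’s bin directory and $HADOOP_HOME/bin. The lines below are an assumption about what to add, reusing the JAVA_HOME and HADOOP_HOME values from hadoop-env.sh; profile_lines is a hypothetical helper that just prints them.

```shell
#!/bin/sh
# profile_lines: hypothetical helper that prints lines one might append to
# ~/.bash_profile. The quoted 'EOF' keeps $JAVA_HOME, $HADOOP_HOME and
# $PATH literal, so they are expanded at login rather than now.
profile_lines() {
    cat <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_25
export HADOOP_HOME=/home/lbbd/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
EOF
}

profile_lines
```

As lbbd, profile_lines >> ~/.bash_profile followed by source ~/.bash_profile (or logging in again) makes the current shell pick up the change.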


Run the commands below to check that everything is installed and on your PATH:

$ java -version

$ hadoop version
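If either command is not found, the PATH change did not take effect. The minimal sketch below, check_on_path (a hypothetical helper built on the POSIX command -v), reports where each command resolves and returns non-zero if any is missing.

```shell
#!/bin/sh
# check_on_path: hypothetical helper that reports which of the given
# commands the shell can find; returns non-zero if any are missing.
check_on_path() {
    missing=0
    for c in "$@"; do
        if p=$(command -v "$c"); then
            echo "found   $c -> $p"
        else
            echo "missing $c"
            missing=1
        fi
    done
    return $missing
}
```

For this guide, check_on_path java hadoop hadoop-daemon.sh jps covers every command used from here on; jps ships with the JDK.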


Format Namenode

Run this once on the master node, as the lbbd user:

$ hadoop namenode -format


$ hadoop-daemon.sh start namenode
$ jps

jps should now list a NameNode process.

Start the jobtracker

$ hadoop-daemon.sh start jobtracker
$ jps
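jps prints one line per running JVM, a process ID followed by the main class name, so checking for daemons means matching the second field. The sketch below, check_daemons (a hypothetical helper), matches names exactly, so NameNode is not confused with SecondaryNameNode.

```shell
#!/bin/sh
# check_daemons: hypothetical helper that reads "jps" output on stdin
# ("PID ClassName" per line) and reports which expected names are missing.
# Usage: jps | check_daemons NameNode JobTracker
check_daemons() {
    listing=$(cat)
    status=0
    for d in "$@"; do
        # Exact match on the class-name field only.
        if printf '%s\n' "$listing" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "running $d"
        else
            echo "missing $d"
            status=1
        fi
    done
    return $status
}
```

On the master, jps | check_daemons NameNode JobTracker should report both daemons as running.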

Repeat all of the steps above on your slave node as well. However, when you edit the hdfs-site.xml file, use the properties below:


Then, as the root user, create the data folder:

$ mkdir /home/data

and give your user account ownership of it, as we did with the /var/datastore folder:

$ chown -R lbbd:lbbd /home/data
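Given the /home/data folder, a plausible slave hdfs-site.xml points dfs.data.dir (the directory where a datanode stores its blocks) at it. The minimal sketch below assumes the slave’s file contains dfs.replication plus dfs.data.dir set to /home/data; slave_hdfs_site is a hypothetical helper, and the property list is an assumption to check against your Hadoop 0.20 configuration reference.

```shell
#!/bin/sh
# slave_hdfs_site: hypothetical helper. ASSUMPTION: the slave's
# hdfs-site.xml sets dfs.data.dir to the /home/data folder created above.
# Usage: slave_hdfs_site REPLICATION DATA_DIR
slave_hdfs_site() {
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>$1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

slave_hdfs_site 2 /home/data
```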