Single-Node Installation

From this point on, you will be carrying out a single-node Hadoop installation (until you format the Hadoop file system on this node). First, you ftp the file hadoop-1.2.1.tar.gz to all of your nodes and carry out the steps in this section on all nodes.

So, given that you are logged in as the user hadoop, you see the following file in the $HOME/Downloads directory:

[hadoop@hc1nn Downloads]$ ls -l
total 62356
-rw-rw-r--. 1 hadoop hadoop 63851630 Mar 15 15:01 hadoop-1.2.1.tar.gz

This is a gzipped tar file containing the Hadoop 1.2.1 software that you are interested in. Use the Linux gunzip tool to unpack the gzipped archive:

[hadoop@hc1nn Downloads]$ gunzip hadoop-1.2.1.tar.gz
[hadoop@hc1nn Downloads]$ ls -l
total 202992
-rw-rw-r--. 1 hadoop hadoop 207861760 Mar 15 15:01 hadoop-1.2.1.tar

Then, unpack the tar file:

[hadoop@hc1nn Downloads]$ tar xvf hadoop-1.2.1.tar
[hadoop@hc1nn Downloads]$ ls -l
total 202996
drwxr-xr-x. 15 hadoop hadoop 4096 Jul 23 2013 hadoop-1.2.1
-rw-rw-r--. 1 hadoop hadoop 207861760 Mar 15 15:01 hadoop-1.2.1.tar
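The gunzip and tar steps can also be combined, because the z option of tar decompresses a gzipped archive while extracting it. The following is a minimal sketch of that approach, run against a small stand-in archive built in a scratch directory. The stand-in contents and the scratch location are assumptions for illustration only; they are not the real Hadoop release.

```shell
#!/bin/sh
# Build a tiny stand-in archive in a scratch directory (illustration only).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/hadoop-1.2.1/bin"
echo "placeholder" > "$workdir/src/hadoop-1.2.1/bin/hadoop"
tar czf "$workdir/hadoop-1.2.1.tar.gz" -C "$workdir/src" hadoop-1.2.1

# One-step unpack: x = extract, z = decompress gzip on the fly, f = archive file.
mkdir "$workdir/Downloads"
tar xzf "$workdir/hadoop-1.2.1.tar.gz" -C "$workdir/Downloads"

# List what was extracted.
ls "$workdir/Downloads"
```

Unlike gunzip, which replaces the .tar.gz file with a .tar file, this form leaves the original compressed archive in place.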

Now that the software is unpacked to the local directory hadoop-1.2.1, you move it into a better location. To do this, you will need to be logged in as root:

[hadoop@hc1nn Downloads]$ su -
Password:
[root@hc1nn ~]# cd /home/hadoop/Downloads
[root@hc1nn Downloads]# mv hadoop-1.2.1 /usr/local
[root@hc1nn Downloads]# cd /usr/local

You have now moved the installation to /usr/local, but make sure that the hadoop user owns the installation.

Use the Linux chown command to recursively change the ownership and group membership for files and directories within the installation:

[root@hc1nn local]# chown -R hadoop:hadoop hadoop-1.2.1
[root@hc1nn local]# ls -l
total 40
drwxr-xr-x. 15 hadoop hadoop 4096 Jul 23 2013 hadoop-1.2.1

You can see from the last line in the output above that the directory is now owned by the hadoop user and belongs to the hadoop group.

You also create a symbolic link to refer to your installation so that you can have multiple installations on the same host for testing purposes:

[root@hc1nn local]# ln -s hadoop-1.2.1 hadoop
[root@hc1nn local]# ls -l
lrwxrwxrwx. 1 root root 12 Mar 15 15:11 hadoop -> hadoop-1.2.1
drwxr-xr-x. 15 hadoop hadoop 4096 Jul 23 2013 hadoop-1.2.1

The last two lines show that there is a symbolic link called hadoop under the directory /usr/local that points to our hadoop-1.2.1 installation directory at the same level. If you later upgrade and install a new version of the Hadoop V1 software, you can just change this link to point to it. Your environment and scripts can then remain static and always use the path /usr/local/hadoop.
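Changing the link to point at a newer installation might look like the following sketch. It runs in a scratch directory standing in for /usr/local, and the second directory name, hadoop-1.x-next, is a hypothetical placeholder rather than a real release.

```shell
#!/bin/sh
# Scratch directory standing in for /usr/local (illustration only).
demo=$(mktemp -d)
mkdir "$demo/hadoop-1.2.1" "$demo/hadoop-1.x-next"   # second name is hypothetical

# Initial link, as created in the text.
ln -s hadoop-1.2.1 "$demo/hadoop"

# Repoint the link: -f replaces the existing link, and -n stops ln from
# following the old link and creating the new one inside the old directory.
ln -sfn hadoop-1.x-next "$demo/hadoop"

# Show where the link points now.
readlink "$demo/hadoop"
```

Because only the link changes, anything that refers to /usr/local/hadoop picks up the new target without being edited.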

Now, you follow these steps to proceed with installation.

1. Set up Bash shell file for hadoop $HOME/.bashrc

When logged in as hadoop, you add the following text to the end of the file $HOME/.bashrc. It sets environment variables such as JAVA_HOME and HADOOP_PREFIX, so the next time the hadoop user account starts a Bash shell, these variables will be predefined.

#######################################################
# Set Hadoop related env variables
export HADOOP_PREFIX=/usr/local/hadoop

# set JAVA_HOME (we will also set a hadoop specific value later)
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk

# some handy aliases and functions
unalias fs 2>/dev/null
alias fs="hadoop fs"
unalias hls 2>/dev/null
alias hls="fs -ls"

# add hadoop to the path
export PATH=$PATH:$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin

Note that you are not using the $HADOOP_HOME variable, because with this release it has been superseded. If you use it instead of $HADOOP_PREFIX, you will receive warnings.
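Because $HOME/.bashrc is read by every new interactive shell, including nested ones, plain PATH=$PATH:... lines can add the same directory more than once. The sketch below shows one way to guard against that. The helper name path_append is hypothetical; it is not part of Bash or Hadoop.

```shell
#!/bin/sh
# path_append is a hypothetical helper, not part of Hadoop or Bash.
# It appends a directory to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on the PATH: do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

HADOOP_PREFIX=/usr/local/hadoop
path_append "$HADOOP_PREFIX/bin"
path_append "$HADOOP_PREFIX/bin"     # second call leaves PATH unchanged
export PATH
```

The surrounding colons in ":$PATH:" let the pattern match a directory at the start, middle, or end of the list without matching a longer path that merely contains it.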

2. Set up conf/hadoop-env.sh

You now modify the configuration file hadoop-env.sh to specify the location of the Java installation by setting the JAVA_HOME variable. In the file conf/hadoop-env.sh, you change:

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to

export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk

Note: When referring to the Hadoop installation configuration directory in this section, and all subsequent sections for the V1 installation, I mean the /usr/local/hadoop/conf directory.
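The JAVA_HOME change in conf/hadoop-env.sh can be made in an editor, or scripted when the same change has to be repeated on several nodes. The following is a minimal sketch of the scripted form. It works on a scratch stand-in file rather than the real hadoop-env.sh, and the stand-in contents are an assumption for illustration.

```shell
#!/bin/sh
# Work on a scratch stand-in so the sketch is safe to run anywhere.
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
# (stand-in for the relevant part of conf/hadoop-env.sh)
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
EOF

# Replace the commented-out default with the active setting. Writing to a
# new file and moving it into place avoids relying on sed's in-place option.
sed 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk|' \
    "$envfile" > "$envfile.new" && mv "$envfile.new" "$envfile"

# Show the resulting setting.
grep '^export JAVA_HOME=' "$envfile"
```

To apply it to a real installation, the scratch file would be replaced by /usr/local/hadoop/conf/hadoop-env.sh, ideally after taking a backup copy.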

3. Create Hadoop temporary directory

On the Linux file system, you create a Hadoop temporary directory, as shown below. This will give Hadoop a working area. Set the ownership to the hadoop user and also set the directory permissions:

[root@hc1nn local]# mkdir -p /app/hadoop/tmp
[root@hc1nn local]# chown -R hadoop:hadoop /app/hadoop
[root@hc1nn local]# chmod 750 /app/hadoop/tmp
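Mode 750 gives the owner full access, gives the group read and traverse access, and gives other users no access. The sketch below shows how that mode appears in a long listing. It uses a scratch directory rather than the real /app/hadoop/tmp, and the scratch path is an assumption for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for the file system root (illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/app/hadoop/tmp"
chmod 750 "$demo/app/hadoop/tmp"

# 7 = rwx for the owner, 5 = r-x for the group, 0 = --- for everyone else.
ls -ld "$demo/app/hadoop/tmp"
```

The listing should begin with drwxr-x---, which is a quick way to confirm the permissions on the real directory as well.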

4. Set up conf/core-site.xml

You set up the configuration for the Hadoop core component. This configuration file is XML-based; it defines the Hadoop temporary directory and default file system access. There are many more options that can be specified; see the Hadoop site (hadoop.apache.org) for details.

Add the following text to the file between the configuration tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
</property>

5. Set up conf/mapred-site.xml

Next, you set up the basic configuration for the Map Reduce component, adding the following between the configuration tags. This defines the host and port for the Job Tracker, along with the HTTP addresses of the Job Tracker and Task Tracker.

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port for the Map Reduce job tracker</description>
</property>

<property>
  <name>mapred.job.tracker.http.address</name>
  <value>localhost:50030</value>
</property>

<property>
  <name>mapred.task.tracker.http.address</name>
  <value>localhost:50060</value>
</property>

The example configuration file here is for the server hc1r1m1. When the configuration is changed to a cluster, these Job Tracker entries will refer to the Name Node machine hc1nn.

6. Set up file conf/hdfs-site.xml

Set up the basic configuration for the HDFS, adding the following between the configuration tags. This defines the replication level for the HDFS; it shows that a single block will be copied twice. It also specifies the address of the Name Node web user interface as dfs.http.address:

<property>
  <name>dfs.replication</name>
  <description>The replication level</description>
</property>

<property>
  <name>dfs.http.address</name>
  <value>http://localhost:50070/</value>
</property>
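Each setting in the three files edited in steps 4 to 6 has to sit inside its own property element, so a stray or missing tag is an easy mistake to make when editing by hand. The sketch below is one rough way to catch that. The helper name check_property_tags is hypothetical, and the sample file is a stand-in written to a scratch location; it compares tag counts only and is not a full XML validation.

```shell
#!/bin/sh
# check_property_tags is a hypothetical helper: it counts <property> and
# </property> tags in a file and reports whether the two counts match.
check_property_tags() {
    opens=$(grep -o '<property>' "$1" | wc -l | tr -d ' ')
    closes=$(grep -o '</property>' "$1" | wc -l | tr -d ' ')
    if [ "$opens" -eq "$closes" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# Stand-in configuration file in a scratch location (illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
</configuration>
EOF

check_property_tags "$sample"
```

On a real node the function would be pointed at the files under /usr/local/hadoop/conf. Where an XML parser such as xmllint is installed, running it with the --noout option gives a stricter well-formedness check.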

7. Format the file system

Run the following command as the hadoop user to format the file system:

hadoop namenode -format

Warning: Do not execute this command on a running HDFS or you will lose your data!

The output should look like this:

14/03/15 16:08:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hc1nn/192.168.1.107
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_30
************************************************************/
14/03/15 16:08:20 INFO util.GSet: Computing capacity for map BlocksMap
14/03/15 16:08:20 INFO util.GSet: VM type = 32-bit
14/03/15 16:08:20 INFO util.GSet: 2.0% max memory = 1013645312
14/03/15 16:08:20 INFO util.GSet: capacity = 2^22 = 4194304 entries
14/03/15 16:08:20 INFO util.GSet: recommended=4194304, actual=4194304
14/03/15 16:08:20 INFO namenode.FSNamesystem: fsOwner=hadoop
14/03/15 16:08:20 INFO namenode.FSNamesystem: supergroup=supergroup
14/03/15 16:08:20 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/03/15 16:08:20 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/03/15 16:08:20 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/03/15 16:08:20 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/03/15 16:08:20 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/03/15 16:08:20 INFO common.Storage: Image file /app/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
14/03/15 16:08:20 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/app/hadoop/tmp/dfs/name/current/edits
14/03/15 16:08:20 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/app/hadoop/tmp/dfs/name/current/edits
14/03/15 16:08:21 INFO common.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
14/03/15 16:08:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hc1nn/192.168.1.107
************************************************************/
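Given the warning about formatting an HDFS that already holds data, it can be worth checking the name directory before running the format command. The following is a minimal sketch of such a check. The helper name name_dir_is_formatted is hypothetical, and treating the presence of a current subdirectory as the sign of an existing file system is an assumption based on the paths in the format output (/app/hadoop/tmp/dfs/name/current).

```shell
#!/bin/sh
# name_dir_is_formatted is a hypothetical helper. It treats a name directory
# as already formatted if it contains a current/ subdirectory, which is where
# the format output shows the fsimage and edits files being written.
name_dir_is_formatted() {
    [ -d "$1/current" ]
}

# Default path taken from the format output; pass another path to override.
NAME_DIR=${1:-/app/hadoop/tmp/dfs/name}

if name_dir_is_formatted "$NAME_DIR"; then
    echo "refusing to format: $NAME_DIR already holds file system metadata"
else
    echo "no existing metadata under $NAME_DIR"
    # hadoop namenode -format    # uncomment to run the real command
fi
```

The format command is left commented out so that the sketch only reports what it finds.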

Now you test that you can start, check, and stop the Hadoop servers on a standalone node without errors. Start the servers by using the start-all.sh script:

[hadoop@hc1nn ~]$ start-all.sh

Now, check that the servers are running. You should expect to see the following:

Name node
Secondary name node
Job tracker
Task tracker
Data node

Running on the master server hc1nn, use the jps command to list the servers that are running:

[hadoop@hc1nn ~]$ jps
2116 SecondaryNameNode
2541 Jps
2331 TaskTracker
2194 JobTracker
1998 DataNode
1878 NameNode
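Reading the jps listing by eye works on one node, but a small check is handy when the same test is repeated on several machines. The sketch below reads jps-style output and prints the name of any expected server that is missing. The helper name missing_daemons is hypothetical, and the sample input is the listing from this section with the TaskTracker line deliberately removed.

```shell
#!/bin/sh
# missing_daemons is a hypothetical helper. It reads jps-style output
# ("<pid> <name>" per line) on standard input and prints the name of each
# expected Hadoop V1 server that does not appear in it.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the whole second field, so NameNode is not matched by
        # the SecondaryNameNode line.
        printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Sample input: the jps listing from this section, minus the TaskTracker line.
missing_daemons <<'EOF'
2116 SecondaryNameNode
2541 Jps
2194 JobTracker
1998 DataNode
1878 NameNode
EOF
```

With the sample input shown, the function prints TaskTracker. On a live node it could be fed directly from the command, as in jps | missing_daemons, where no output means all five servers were found.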

If you find that the jps command is not available, check that it exists as $JAVA_HOME/bin/jps. Ensure that you installed the Java JDK in the previous step. If that does not work, then try installing the Java OpenJDK development package as root:

[root@hc1nn ~]# yum install java-1.6.0-openjdk-devel

Your result shows that the servers are running. If you need to stop them, use the stop-all.sh command, as follows:

[hadoop@hc1nn ~]$ stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

You have now completed a single-node Hadoop installation. Next, you repeat the steps for the Hadoop V1 installation on all of the nodes that you plan to use in your Hadoop cluster. When that is done, you can move to the next section, “Setting up the Cluster,” where you’ll combine all of the single-node machines into a Hadoop cluster that’s run from the Name Node machine.
