3 Pivotal HD 1.1 Stack Binary Package

3.1 Overview

Pivotal HD 1.1 is a full Apache Hadoop distribution with Pivotal add-ons and a native integration with the Pivotal Greenplum database.

The PHD 1.1 Stack supports the YARN (MR2) resource manager. You can submit MapReduce jobs via the new MapReduce interface.

The RPM distribution of PHD 1.1 contains the following:

HDFS 2.0.5-alpha
Pig 0.10.1
Zookeeper 3.4.5
HBase 0.94.8
Hive 0.11.0
HCatalog 0.11.0
Mahout 0.7
Flume 1.3.1
Sqoop 1.4.2

3.2 Accessing PHD 1.1 Stack Binary Package

You can download the PHD 1.1 Stack Binary Package from the EMC Download Center.

This is a single tar.gz file containing all the components: PHD-1.1.x.0-bin-xx.tar.gz (here "x" denotes a digit).

The content of this tar file looks like this:

PHD-1.1.0.0-bin-25/zookeeper/tar/zookeeper-3.4.5-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/oozie/tar/oozie-3.3.2-gphd-2.1.0.0-distro.tar.gz
PHD-1.1.0.0-bin-25/hive/tar/hive-0.11.0-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/flume/tar/apache-flume-1.3.1-gphd-2.1.0.0-bin.tar.gz
PHD-1.1.0.0-bin-25/pig/tar/pig-0.10.1-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/hadoop/tar/hadoop-2.0.5-alpha-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/mahout/tar/mahout-distribution-0.7-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/sqoop/tar/sqoop-1.4.2-gphd-2.1.0.0.bin__hadoop-2.0.5-alpha-gphd-2.1.0.0.tar.gz
PHD-1.1.0.0-bin-25/hbase/tar/hbase-0.94.8-gphd-2.1.0.0.tar.gz

Note: md5 files are not listed here.
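Since the package ships md5 files alongside the tarballs, a downloaded tarball can be checked before it is unpacked. The following is only a sketch: the function name verify_md5 is illustrative, and it assumes that an md5 file begins with the hex digest (as md5sum output does). The actual layout of the PHD md5 files is not described in this guide.

```shell
# verify_md5 FILE MD5FILE
# Compares the MD5 digest of FILE with the first field of MD5FILE.
# Assumption: MD5FILE starts with the hex digest (md5sum-style output);
# the real layout of the PHD md5 files may differ.
verify_md5() {
    expected=$(awk 'NR == 1 { print $1 }' "$2")
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

The function returns success only when the two digests match, so it can guard an unpack step, for example by running tar only after verify_md5 succeeds for the tarball and its md5 file.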

The PHD version number for each component in this package is listed below:

Component   PHD Version                                                Version Placeholder

ZooKeeper   3.4.5-gphd-2.1.0.0                                         <PHD_ZOOKEEPER_VERSION>
Hadoop      2.0.5-alpha-gphd-2.1.0.0                                   <PHD_HADOOP_VERSION>
HBase       0.94.8-gphd-2.1.0.0                                        <PHD_HBASE_VERSION>
Hive        0.11.0-gphd-2.1.0.0                                        <PHD_HIVE_VERSION>
HCatalog    0.11.0-gphd-2.1.0.0                                        <PHD_HCATALOG_VERSION>
Pig         0.10.1-gphd-2.1.0.0                                        <PHD_PIG_VERSION>
Mahout      0.7-gphd-2.1.0.0                                           <PHD_MAHOUT_VERSION>
Flume       1.3.1-gphd-2.1.0.0                                         <PHD_FLUME_VERSION>
Sqoop       1.4.2-gphd-2.1.0.0.bin__hadoop-2.0.5-alpha-gphd-2.1.0.0    <PHD_SQOOP_VERSION>
Oozie       3.3.2-gphd-2.1.0.0                                         <PHD_OOZIE_VERSION>

In the sections below, the values in the "Version Placeholder" column stand in for the actual PHD version. When installing, replace each placeholder with the actual version value.
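One way to handle the substitution is to keep the actual versions in shell variables and build file names from them. This is a minimal sketch: the variable names simply mirror the placeholders listed above and are an illustrative convention, not something the package defines.

```shell
# Actual version strings, taken from the version list above.
PHD_HADOOP_VERSION="2.0.5-alpha-gphd-2.1.0.0"
PHD_ZOOKEEPER_VERSION="3.4.5-gphd-2.1.0.0"

# Build the tarball file names by substituting the placeholders.
HADOOP_TARBALL="hadoop-${PHD_HADOOP_VERSION}.tar.gz"
ZOOKEEPER_TARBALL="zookeeper-${PHD_ZOOKEEPER_VERSION}.tar.gz"

echo "$HADOOP_TARBALL"
echo "$ZOOKEEPER_TARBALL"
```

The resulting names match the hadoop and zookeeper entries in the tar file listing.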

3.3 Installation

This section provides instructions for installing and running the Pivotal HD 1.1 components from the downloaded binary tarball files.


The installation instructions provided here are intended only as a Quick Start guide that will start the services on a single host. Refer to the Apache Hadoop documentation for information about other installation configurations: http://hadoop.apache.org/docs/r2.0.5-alpha/

PHD should not be installed on the same cluster.

All packages used during this process should come from the same PHD distribution tarball; do not mix packages from different tarballs.

3.3.1 Prerequisites

Follow the instructions below to install the Hadoop components (cluster install):

If not created already, add a new user hadoop and switch to that user. All packages should be installed by the user hadoop.

$ useradd hadoop
$ passwd hadoop
$ su - hadoop

Make sure Oracle Java Runtime (JRE) 1.7 is installed on the system, and set the environment variable JAVA_HOME to point to the directory where the JRE is installed by appending the following script snippet to the file ~/.bashrc:

~/.bashrc

export JAVA_HOME=/usr/java/default

Make sure the ~/.bashrc file takes effect:

$ source ~/.bashrc
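After sourcing the file, it can be worth confirming that JAVA_HOME really points at a Java installation. The following is a sketch only; the function name check_java_home is illustrative and not part of PHD.

```shell
# check_java_home: succeed only if JAVA_HOME is set and contains an
# executable bin/java. The function name is illustrative.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No executable found at $JAVA_HOME/bin/java" >&2
        return 1
    fi
    return 0
}
```

Calling check_java_home in the current shell reports which of the two conditions failed, if any.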

SSH (both client and server) is required. Set up password-less SSH login using the following commands.

Password-less SSH login must be set up from the HDFS name node to each HDFS data node, and from the YARN resource manager to each YARN node manager.

Because we are setting up a single-node cluster, the only machine is at once the HDFS name node, the YARN resource manager, the only HDFS data node, and the only YARN node manager, so the setup is simpler.

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Set the permissions on the file on each slave host
$ chmod 0600 ~/.ssh/authorized_keys
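The result of the commands above can be checked with two small helpers. This is a sketch under stated assumptions: the function names are illustrative, and check_key_perms relies on GNU stat as found on Linux hosts.

```shell
# check_key_perms FILE: succeed only if FILE has mode 600.
# Uses GNU stat (-c %a), as found on Linux hosts.
check_key_perms() {
    mode=$(stat -c '%a' "$1") || return 1
    [ "$mode" = "600" ]
}

# check_ssh_login HOST: succeed only if HOST accepts a key-based login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

On the single-node setup, check_key_perms would be pointed at ~/.ssh/authorized_keys and check_ssh_login at localhost. Because BatchMode disables all prompts, a host whose key has not yet been accepted also fails the login check, so connect once interactively first.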

On a real (distributed) cluster, use the following script to set up password-less SSH login. It needs to be executed twice, once on the HDFS name node and once on the YARN resource manager node, unless you set up the HDFS name node and the YARN resource manager on the same machine. (For your reference only; not needed for this single-node cluster installation.)

# First log in to the master host (YARN resource manager or HDFS name node).
# Replace master@host-master with the real user name and host name of your master host.
$ ssh master@host-master

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Copy authorized_keys to each slave host (YARN node manager or HDFS data node) in the cluster using scp.
# Replace slave@host-slave with the real user name and host name of your slave host, and do it for each of your slave hosts.
# NOTE: if an authorized_keys file already exists for
# the user, rename your file authorized_keys2
$ scp ~/.ssh/authorized_keys slave@host-slave:~/.ssh/

# Set the permissions on the file on each slave host.
# Replace slave@host-slave with the real user name and host name of your slave host, and do it for each of your slave hosts.
$ ssh slave@host-slave
$ chmod 0600 ~/.ssh/authorized_keys
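Because the copy and chmod steps are repeated for every slave host, they can be wrapped in a loop. The following is a sketch only: the function name distribute_key and the DRY_RUN switch are illustrative, and the host names used with it would be your own.

```shell
# distribute_key USER@HOST...
# Copies authorized_keys to each slave host and tightens its permissions.
# With DRY_RUN=1 the commands are printed instead of executed.
# Caution: scp overwrites any authorized_keys file already on the slave.
distribute_key() {
    for target in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp ~/.ssh/authorized_keys ${target}:~/.ssh/"
            echo "ssh ${target} chmod 0600 ~/.ssh/authorized_keys"
        else
            scp ~/.ssh/authorized_keys "${target}:~/.ssh/" &&
                ssh "$target" 'chmod 0600 ~/.ssh/authorized_keys'
        fi
    done
}
```

Running it first with DRY_RUN=1 and a list of user@host arguments shows the commands that would be issued, which is a cheap way to review the host list before any file is overwritten.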

3.3.2 Hadoop

Unpack the Hadoop tarball file

$ tar zxf hadoop-<PHD_HADOOP_VERSION>.tar.gz

Edit the file ~/.bashrc to set the environment variables HADOOP_HOME and HADOOP_HDFS_HOME to the directory where the tarball file is extracted, and add hadoop to the file search path.

~/.bashrc

# export HADOOP_HOME, HADOOP_HDFS_HOME
export HADOOP_HOME=/path/to/hadoop
export HADOOP_HDFS_HOME=/path/to/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

And make sure the ~/.bashrc file takes effect:

$ source ~/.bashrc

In the sections below, all the shell commands, unless explicitly specified, are run from $HADOOP_HOME.
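Before moving on, the environment set in ~/.bashrc can be sanity-checked. This is a minimal sketch; the function name check_hadoop_env is illustrative, and the comparison assumes PATH contains $HADOOP_HOME/bin written exactly as in the snippet above.

```shell
# check_hadoop_env: succeed only if HADOOP_HOME and HADOOP_HDFS_HOME are
# directories and the hadoop command found on PATH is $HADOOP_HOME/bin/hadoop.
check_hadoop_env() {
    [ -d "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not a directory" >&2; return 1; }
    [ -d "${HADOOP_HDFS_HOME:-}" ] || { echo "HADOOP_HDFS_HOME is not a directory" >&2; return 1; }
    found=$(command -v hadoop) || { echo "hadoop is not on PATH" >&2; return 1; }
    [ "$found" = "$HADOOP_HOME/bin/hadoop" ]
}
```

If the check passes, running hadoop version is a natural next step to confirm that the unpacked distribution starts.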
