Platfora Installation Guide


Version 5.0

For Amazon EMR Cloud Deployments

Copyright Platfora 2015

Document Conventions
Contact Platfora Support
Copyright Notices

Chapter 1: Installation Overview (Amazon EMR)
Amazon AWS Cloud Deployments
Master vs Worker Node Installations
Preinstall Checklist
High-Level Install Steps

Chapter 2: System Requirements (AWS Cloud)
Supported Hadoop and Hive Versions
Platfora EC2 Instance Requirements
Amazon EMR Instance Requirements
AWS Security Settings for Platfora
Amazon AWS Virtual Private Cloud (VPC)
IAM User and IAM Roles for Platfora
EC2 Security Group Settings
Port Configuration Requirements
Ports to Open on Platfora Nodes
Browser Requirements

Chapter 3: Install Platfora Software and Dependencies
About the Platfora Installer Packages
Install Using RPM Packages
Install Dependencies RPM Package
Install Optional Security RPM Package
Install Platfora RPM Package (Master Only)
Install Using the TAR Package
Create the Platfora System User
Set OS Kernel Parameters
Install Dependent Software
Install Platfora TAR Package (Master Only)
Install PDF Dependencies (Master Only)

Chapter 4: Configure Environment on Platfora Nodes
Install the MapR Client Software (MapR Only)
Configure Network Environment
Verify Connectivity to Hadoop Nodes
Open Firewall Ports
Configure Passwordless SSH
Verify Local SSH Access
Exchange SSH Keys (Multi-Node Only)
Synchronize the System Clocks
Create Local Storage Directories
Verify Environment Variables

Chapter 5: Initialize Platfora Master Node
Connect Platfora to Your Hadoop Services
Understand How Platfora Connects to Hadoop
Create Local Hadoop Configuration Directory
Initialize the Platfora Master
Configure SSL for Client Connections
Configure SSL for Catalog Connections
About System Diagnostic Data
Configure Platfora for Amazon EMR
Troubleshoot Setup Issues
View the Platfora Log Files
Setup Fails Setting up Catalog Metadata Service
TEST FAILED: Checking integrity of binaries

Chapter 6: Start Platfora
Start the Platfora Server
Log in to the Platfora Web Application
Add a License Key
Change the Default Admin Password
Load the Tutorial Data

Chapter 7: Initialize a Worker Node

Appendix A: Command Line Utility Reference
setup.py
hadoop-check
hadoopcp
hadoopfs
install-node
platfora-catalog
platfora-catalog ssl
platfora-config
platfora-export
platfora-import
platfora-license
platfora-license install
platfora-license uninstall
platfora-license view
platfora-node
platfora-node add
platfora-node config
platfora-services
platfora-services start
platfora-services stop
platfora-services restart
platfora-services status
platfora-services sync
platfora-syscapture
platfora-syscheck

Appendix B: Glossary


This guide provides information and instructions for installing and initializing a Platfora® cluster. This guide is intended for system administrators with knowledge of Linux/Unix system administration and basic Hadoop administration.

This Amazon Web Services (AWS) cloud installation guide is for organizations that do not have a persistent Hadoop cluster. Instead, your organization uses Amazon S3 for raw data storage and Amazon Elastic MapReduce (EMR) for on-demand Hadoop data processing.

Document Conventions

This documentation uses certain text conventions for language syntax and code examples.

Convention | Usage | Example

$ | Command-line prompt. Precedes a command to be entered in a command-line terminal session. | $ ls

$ sudo | Command-line prompt for a command that requires root permissions (commands will be prefixed with sudo). | $ sudo yum install open-jdk-1.7

UPPERCASE | Function names and keywords are shown in all uppercase for readability, but keywords are case-insensitive (can be written in upper or lower case). | SUM(page_views)

italics | Italics indicate a user-supplied argument or variable. | SUM(field_name)

[ ] (square brackets) | Square brackets denote optional syntax items. | CONCAT(string_expression[,...])

... (ellipsis) | An ellipsis denotes a syntax item that can be repeated any number of times. |

Contact Platfora Support

For technical support, you can send an email to: [email protected]

Or visit the Platfora support site for the most up-to-date product news, knowledge base articles, and product tips.

http://support.platfora.com

To access the support portal, you must have a valid support agreement with Platfora. Please contact your Platfora sales representative for details about obtaining a valid support agreement or with questions about your account.

Copyright Notices

Copyright © 2012-15 Platfora Corporation. All rights reserved.

Platfora believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” PLATFORA CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any Platfora software described in this publication requires an applicable software license. Platfora®, You Should Know, Interest Driven Pipeline, Fractal Cache, and Adaptive Job Synthesis™ are trademarks of the Platfora Corporation. Apache Hadoop and Apache Hive™ are trademarks of the Apache Software Foundation. All other trademarks used herein are the property of their respective owners.

Embedded Software Copyrights and License Agreements

Platfora contains the following open source and third-party proprietary software subject to their respective copyrights and license agreements:

• Apache Hive PDK
• dom4j
• freemarker
• GeoNames
• Google Maps API
• Apache POI
• javassist
• javax.servlet
• Mortbay Jetty 6.1.26
• OWASP CSRFGuard 3
• PostgreSQL JDBC 9.1-901
• Scala
• sjsxp : 1.0.1
• Unboundid

Chapter 1: Installation Overview (Amazon EMR)

This section provides an overview of the Platfora installation process for Amazon AWS cloud environments that will use Amazon Elastic MapReduce (EMR) as their primary Hadoop deployment for Platfora.

Topics:

Amazon AWS Cloud Deployments

Master vs Worker Node Installations

Preinstall Checklist


Amazon AWS Cloud Deployments

An Amazon Web Services (AWS) cloud deployment means that you do not have a persistent Hadoop cluster. Instead, your organization uses Amazon S3 for raw data storage and Amazon EMR for on-demand Hadoop data processing.

In an Amazon AWS cloud deployment, the Platfora server instances are deployed on dedicated, high-memory EC2 instances. Your organization’s raw data is managed in Amazon's Simple Storage Service (S3). Platfora uses Amazon Elastic MapReduce (EMR) to run its data processing jobs (lens builds). The results of the lens build jobs are then written back to S3.

Master vs Worker Node Installations

If you are installing Platfora for the very first time, you begin by installing, configuring and initializing the Platfora master node. Once you have the master node up and running, you can then add in additional worker nodes as needed.

All nodes in a Platfora cluster (master and workers) must meet the minimum system requirements and have the required prerequisite software installed. If you are using the RPM installer packages, you can use the base installer package to install the required software on each Platfora node. If you are using the TAR installer packages, you must manually install the required software on each Platfora node.

You only need to install the Platfora server software, however, on the master node. Platfora copies the server software from the master to the worker nodes during the worker node initialization process.

All nodes in a Platfora cluster also require you to configure the network environment so that all the nodes can talk to each other, as well as to the Hadoop cluster nodes.

If you are adding additional worker nodes to an existing Platfora cluster, make sure to follow the instructions for installing dependencies and configuring the environment. You can skip any tasks denoted as 'Master Only'; these tasks are only required for first-time installations of the Platfora master node.

Preinstall Checklist

Here is a list of items and information you will need in order to install a new Platfora cluster with an Amazon Elastic MapReduce (EMR) cloud deployment. Platfora must be able to connect to various Amazon Web Services (AWS) during setup, so you will also need information about your AWS account.

Platfora Checklist

This is a list of things you will need in order to install Platfora nodes.

What You Need | Description

Platfora License | Platfora Customer Support must issue you a license file. Trial period licenses are available upon request for pilot installations.

Platfora Software | A Platfora customer support representative can give you the download link to the Platfora installation package for your chosen EC2 operating system and Amazon EMR Hadoop version. Platfora provides both RPM and TAR installer packages.

(MapR Only) MapR Client Software | If you are using a MapR Hadoop cluster with Platfora, you will need the MapR client software for the version of MapR you are using. The MapR client software must be installed on all Platfora nodes.

Amazon Web Services Checklist

This is a list of things you will need to create or obtain from your Amazon Web Services (AWS) environment in order to install Platfora.

What You Need | Description

AWS VPC Subnet ID | Platfora must be able to launch an Amazon EMR cluster in a public subnet in AWS. An Amazon AWS administrator should provision an Amazon VPC with a public subnet. You must ensure the Platfora server can communicate with the subnet in the VPC. If the Platfora server is on the same subnet as the Amazon EMR cluster, this happens automatically. After the AWS VPC is provisioned, you will need the subnet identifier when configuring the Platfora configuration properties.

IAM User | AWS Identity and Access Management (IAM) allows you to create users, groups, and roles to control access to AWS services and resources. Platfora recommends creating an IAM User account specifically for use by Platfora. This user must have (at a minimum) the permissions specified in IAM User and IAM Roles for Platfora.

AWS Access Key | After you have created the Platfora IAM user, download the AWS credentials for this user. You will need the Access Key Id and Secret Access Key when you initialize Platfora for use with Amazon EMR.

IAM Roles | Amazon requires all AWS users to use IAM Roles to launch EMR clusters. Platfora recommends creating custom IAM Roles specifically for use by Platfora. Create a role for each of the following EMR cluster services:
• Amazon EMR service (service role). In Amazon AWS, create a custom IAM Role and attach a security policy that contains at a minimum the permissions specified in IAM User and IAM Roles for Platfora. The custom role you define corresponds to the default IAM Role Amazon offers called EMR_DefaultRole.
• EC2 instances (instance profile) in the Amazon EMR cluster. In Amazon AWS, create a custom IAM Role and attach a security policy that contains at a minimum the permissions specified in IAM User and IAM Roles for Platfora. The custom role you define corresponds to the default IAM Role Amazon offers called EMR_EC2_DefaultRole.
You need the role names when configuring the Platfora configuration properties.

EC2 Security Group | EC2 security groups allow you to specify firewall rules for your Amazon Elastic Compute Cloud (EC2) server instances. You should create a set of Security Group rules to apply to your Platfora instances.

EC2 Instances | You will need to launch the EC2 instances on which to install the Platfora master and worker servers.

S3 Bucket | You will need to provide the name of an Amazon S3 bucket to use for Platfora.

High-Level Install Steps

This section lists the high-level steps involved in installing Platfora to work with an Amazon Elastic MapReduce (EMR) Hadoop cluster. Note that the procedure for installing a new Platfora cluster differs from the procedure for adding a worker node to an existing Platfora cluster.

New Platfora Installation

When installing Platfora for the first time, you begin by installing and configuring the Platfora master node. After the master node is installed, initialized, and connected to the Hadoop services it needs, you can use the master node to add worker nodes to the cluster.

1. Configure your Amazon Web Services account for Platfora. See AWS Security Settings for Platfora.
2. Initialize the Amazon EC2 Instances for your Platfora nodes. See Platfora EC2 Instance Requirements.
3. Install Platfora Software and Dependencies.
4. Configure Environment on Platfora Nodes.
5. Configure the Connection to Amazon S3.
6. Initialize the Platfora Master.
7. Configure the Connection to Amazon EMR.
8. Start Platfora.
9. Log in to the Platfora Application.
10. Install the License File.
11. (Optional) Load the Tutorial Data (as a quick way to test that everything works).
12. Add Worker Nodes.

Additional Worker Node Installation

Once you have a Platfora master node up and running, you can use it to initialize additional worker nodes. Before you can initialize a worker node, however, you must make sure that it has the required dependencies installed.

These are the high-level steps for adding a worker node to an existing Platfora cluster:

1. Initialize the Amazon EC2 Instance for the new worker node. See Platfora EC2 Instance Requirements.
2. Install only the prerequisite software, directly on the worker node instance.
   • If using the RPM installer packages, Install Dependencies RPM Package.
   • If using the TAR installer packages, you must manually Create the Platfora System User, Set OS Kernel Parameters, and Install Dependent Software.
3. Configure Environment on Platfora Nodes.
4. Add Worker Node to Platfora Cluster.

Chapter 2: System Requirements (AWS Cloud)

This section describes the system requirements for customers who plan to use Amazon Web Services (AWS) as their installation environment for Platfora, with Simple Storage Service (S3) and Elastic MapReduce (EMR) as their Hadoop distributed data storage and processing services.

Topics:

Supported Hadoop and Hive Versions

Platfora EC2 Instance Requirements

Amazon EMR Instance Requirements

AWS Security Settings for Platfora

Port Configuration Requirements

Browser Requirements

Supported Hadoop and Hive Versions

This section lists the Hadoop distributions and versions that are compatible with the Platfora installation packages. If using Hive as a data source for Platfora, the version of Hive must be compatible with the version of Hadoop you are using.

Hadoop Distro | Version | Hive Version | M/R Version | Platfora Package

Cloudera 5 | CDH 5.3.1+ | 0.13.1 | YARN | cdh52

Cloudera 5 | CDH 5.4 | 1.1 | YARN | cdh54

Hortonworks | HDP 2.2.x | 0.14.0 | YARN | hadoop_2_6_0_hive_0_14_0

Hortonworks | HDP 2.3.x | 1.2.0 | YARN | hadoop_2_7_2_hive_1_2_0

MapR | MapR 4.0.1 | 0.12.0 | YARN | mapr4

MapR | MapR 4.1.0 | 0.13.0 | YARN | mapr41

MapR | MapR 5.0.0 | 1.1 | YARN | mapr5

Pivotal Labs | PivotalHD 3.0 | 0.14.0 | YARN | hadoop_2_6_0_hive_0_14_0

Amazon EMR (AMI 3.7.x) | Hadoop 2.4.0 | 0.13.1 | YARN | hadoop_2_4_0_hive_0_13_0
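The version table above is effectively a lookup from Hadoop distribution and version to a Platfora package name. The following Python sketch shows one way to encode it. The key strings, the dictionary name, and the function name platfora_package are illustrative assumptions and are not part of any Platfora tool; only the package names come from the table.

```python
# Package names are copied from the table above. The (distro, version)
# key format is an assumption made for this illustration.
PLATFORA_PACKAGES = {
    ("Cloudera 5", "CDH 5.3.1+"): "cdh52",
    ("Cloudera 5", "CDH 5.4"): "cdh54",
    ("Hortonworks", "HDP 2.2.x"): "hadoop_2_6_0_hive_0_14_0",
    ("Hortonworks", "HDP 2.3.x"): "hadoop_2_7_2_hive_1_2_0",
    ("MapR", "MapR 4.0.1"): "mapr4",
    ("MapR", "MapR 4.1.0"): "mapr41",
    ("MapR", "MapR 5.0.0"): "mapr5",
    ("Pivotal Labs", "PivotalHD 3.0"): "hadoop_2_6_0_hive_0_14_0",
    ("Amazon EMR (AMI 3.7.x)", "Hadoop 2.4.0"): "hadoop_2_4_0_hive_0_13_0",
}


def platfora_package(distro, version):
    """Return the Platfora package name for a distribution and version."""
    package = PLATFORA_PACKAGES.get((distro, version))
    if package is None:
        raise ValueError("No Platfora package listed for %s %s" % (distro, version))
    return package
```

For the Amazon EMR deployments covered by this guide, the relevant entry is the last one (Hadoop 2.4.0 on AMI 3.7.x).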

Platfora EC2 Instance Requirements

Platfora recommends the following system requirements for Amazon EC2 instances that will serve as Platfora server nodes. For multi-node installations, the master server instance and all worker server instances must be the same configuration (same EC2 instance type, storage configuration, network configuration, etc.).

Amazon Machine Images (AMIs) | Amazon Linux AMI 2014.03.x or higher; Red Hat Enterprise Linux 6.2 - 6.5; Ubuntu Server 12.04.1 LTS or higher

EC2 Instance Type | Small to Medium Lens Sizes: c3.8xlarge; Medium to Large Lens Sizes, 10+ Platfora nodes: r3.8xlarge; Medium to Large Lens Sizes, 1-9 Platfora nodes: i2.8xlarge

Root Device Volume (EBS) | Recommended Size = 1 TB; Type = General Purpose (SSD)

Additional EBS Volumes | Optional. Additional EBS volumes can be attached to an EC2 instance after launch time, and can be used to increase lens cache storage capacity if needed. EBS volumes are less expensive than Instance Store volumes, and the data is persistent between shutdowns.

Instance Store Volume (Ephemeral) | Optional. You may choose to add instance store volumes for the Platfora lens cache instead of using EBS volumes. This costs more, but offers slightly faster performance. Instance store volumes can only be attached to an EC2 instance at launch time, and the data is not saved when the instance shuts down. The size of an instance store volume depends on the instance type: c3.8xlarge: 2 x 320 GB SSD (640 GB); r3.8xlarge: 2 x 320 GB SSD (640 GB); i2.8xlarge: 8 x 800 GB SSD (6400 GB)

Enhanced Networking | yes (requires use of VPC instead of EC2-Classic)

EBS Optimized Instance | yes (the 8xlarge instance types are EBS optimized instances by default)

Availability Zone | yes (use same zone for all nodes in the Platfora cluster)

Placement Group | yes (use same placement group for all nodes in the Platfora cluster)

IAM User | yes (create a dedicated Platfora IAM User in your AWS account)

Other Required Software | Java 1.7; Python 2.7.8 through 2.7.9 (3.0 not supported); (master node only) PostgreSQL 9.2.1-1.28 (AMZN), 9.2.5, 9.2.7 or 9.3; OpenSSL 1.0.1 or higher

Required Unix Utilities | rsync, ssh, scp, cp, tar, tail, sysctl, ntp, wget
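As a rough aid for the utilities requirement above, the following Python sketch reports which of the listed command-line utilities cannot be found on the PATH of a node. It is an illustration only and is not a Platfora tool; the function names are hypothetical. The list omits ntp on the assumption that it is installed as a service rather than as a command named ntp, so check it separately. The code avoids newer library calls so that it should also run under the Python 2.7 interpreter that Platfora requires.

```python
import os

# Utilities taken from the requirements above. "ntp" is left out because it is
# assumed to be a service, not an executable with that exact name.
REQUIRED_UTILITIES = ["rsync", "ssh", "scp", "cp", "tar", "tail", "sysctl", "wget"]


def find_executable(name, path=None):
    """Return the full path of an executable found on the search path, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_utilities(names, path=None):
    """Return the utilities from names that are not found on the search path."""
    return [name for name in names if find_executable(name, path) is None]


if __name__ == "__main__":
    missing = missing_utilities(REQUIRED_UTILITIES)
    if missing:
        print("Missing utilities: %s" % ", ".join(missing))
    else:
        print("All listed utilities were found.")
```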

Amazon EMR Instance Requirements

Platfora launches an Elastic MapReduce (EMR) cluster when it builds a lens. This section describes the recommended requirements for the EMR instances that are launched by Platfora.

Amazon EMR is Hadoop as a web service. Platfora uses the EMR Hadoop cluster to process its lens builds. Since the EMR Hadoop cluster is only instantiated as needed, the source data does not reside in the Hadoop Distributed File System (HDFS) of the EMR Hadoop cluster. The source data is instead stored on Amazon S3. Data is copied from S3 to EMR for data processing, then the results are written back to S3 when the job completes.

At the start of a lens build job, the raw source data is copied from S3 to the local HDFS file system on the EMR nodes. The EMR instances must have enough local instance storage to support the input source dataset and the temporary workspace for intermediate lens build job results. Also consider that the local HDFS of the EMR cluster replicates the data to ensure redundancy and high availability during lens build processing.


Platfora recommends the i2.4xlarge instance type for EMR data nodes and the m3.xlarge for the EMR name node. The i2.4xlarge offers a great balance between total local disk space, CPU power, and per-node memory size.

Hadoop Version | 2.4.0

AMI Version | 3.7.0

EMR NameNode Instance Type | m3.xlarge

EMR DataNode Instance Type | i2.4xlarge

Number of EMR DataNodes | The number of nodes you will need to complete a lens build depends on the following factors:

• The size of the raw dataset in S3 that is considered as input to the lens build.

• The replication factor of HDFS. EMR clusters of 1-4 nodes have a replication factor of 1, 5-9 nodes have a replication factor of 2, and 10 or more nodes have a replication factor of 3.

• Temporary work space for intermediate lens build results (about 20-30% of total disk space).
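The sizing factors above can be combined into a rough estimate. The following Python sketch is a back-of-the-envelope calculation, not a Platfora sizing tool. It assumes 3,200 GB of local storage per data node (an i2.4xlarge provides 4 x 800 GB SSD, a figure not stated in this guide), reserves 30% of disk for temporary work space, and treats a cluster of exactly 10 nodes as having a replication factor of 3. The function names are hypothetical.

```python
def hdfs_replication_factor(data_nodes):
    """Replication factor by cluster size, following the ranges listed above."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    if data_nodes <= 4:
        return 1
    if data_nodes <= 9:
        return 2
    return 3  # assumption: exactly 10 nodes also uses a factor of 3


def estimate_data_nodes(input_gb, disk_per_node_gb=3200.0,
                        temp_fraction=0.30, max_nodes=1000):
    """Smallest node count whose usable disk holds the replicated input.

    Usable disk is the total local disk minus the share reserved for
    temporary work space (temp_fraction).
    """
    for nodes in range(1, max_nodes + 1):
        usable_gb = nodes * disk_per_node_gb * (1 - temp_fraction)
        needed_gb = input_gb * hdfs_replication_factor(nodes)
        if usable_gb >= needed_gb:
            return nodes
    raise ValueError("input does not fit within max_nodes nodes")


if __name__ == "__main__":
    for size_gb in (1000, 10000):
        print("%d GB of input: about %d data node(s)"
              % (size_gb, estimate_data_nodes(size_gb)))
```

Because the replication factor rises with cluster size, the estimate can jump: under these assumptions an input that does not fit on 4 nodes needs 9 nodes, since the intermediate sizes must store two copies of the data.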

AWS Security Settings for Platfora

Amazon Web Services (AWS) has a number of security features that you can use to protect your AWS account and cloud server instances. This section contains security setting recommendations if you plan to use Amazon Elastic MapReduce (EMR) as the Hadoop implementation for your Platfora cluster.

Amazon AWS Virtual Private Cloud (VPC)

To use Amazon EMR for Hadoop data processing, Platfora must be able to launch an EMR cluster in a public subnet. Administrators do this by provisioning an Amazon VPC with a public subnet, and then specifying the subnet identifier in Platfora. Platfora must create the EMR cluster on an Internet-facing subnet to allow the AWS EMR Provisioning Service to reach the EMR cluster.

Additionally, you must ensure the Platfora server can communicate with the Amazon EMR cluster. If the Platfora server is on the same subnet as the Amazon EMR cluster, this happens automatically. If the Platfora server and the EMR cluster are on different VPC subnets, then a route between the subnets needs to be added to the Route table(s) so that communication can occur between the two subnets. Also, if the VPC uses Access Control Lists (ACLs), then those ACLs must be modified to allow traffic from Platfora to Hadoop.

After the Amazon VPC has been provisioned, specify its subnet identifier in the platfora.emr.subnet.id Platfora configuration property.

For more information on setting up and using an Amazon VPC with Amazon EMR, see http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-vpc-subnet.html.

IAM User and IAM Roles for Platfora

AWS Identity and Access Management (IAM) allows you to create users, groups, and roles to control access to AWS services and resources. Platfora recommends creating an IAM User account and two IAM Roles specifically for use by Platfora.

Platfora uses a combination of an IAM User and IAM Roles to communicate with Amazon AWS and to create an EMR cluster. An Amazon AWS administrator needs to create a platfora IAM User and two IAM Roles specifically for use by Platfora. Then a Platfora system administrator needs to enter some information about that user and those roles in Platfora.

The Platfora server uses security credentials of the platfora IAM User to request Amazon AWS to create an Amazon EMR cluster. Once that request is approved, the platfora IAM User then passes an IAM Role to actually launch an EMR cluster, and then uses another IAM Role to start EC2 instances in the EMR cluster. You must specify these roles in Platfora.

For more details on creating the user and roles, see Create IAM User for Platfora and Create IAM Roles for Platfora.

Create IAM User for Platfora

The Amazon AWS administrator can create a new platfora user in the IAM Management Console of your AWS account. After creating the user, download the AWS credentials for this user. The Platfora system administrator will need the Access Key Id and Secret Access Key when you initialize Platfora for use with Amazon EMR.

The security policy for the platfora IAM User must have (at a minimum) the permissions listed in the following sample policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "iam:ListRoles",
        "iam:PassRole",
        "elasticmapreduce:*",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::Bucket_defined_in_core-site.xml",
        "arn:aws:s3:::Datasource_Bucket_1",
        "arn:aws:s3:::Datasource_Bucket_n"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:Get*",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::Bucket_defined_in_core-site.xml/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::Datasource_Bucket_1/path/to/files/*",
        "arn:aws:s3:::Datasource_Bucket_n/*"
      ]
    }
  ]
}

Under Permissions for this user, attach a security policy that contains the permissions listed above. These permissions allow the platfora IAM User to pass an IAM Role to launch the EMR cluster, start an EMR cluster, and access S3 for source data during data ingest.
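If you manage several data source buckets, it may be easier to generate the user policy than to edit it by hand. The following Python sketch builds a policy with the same four statements as the sample above and prints it as JSON, which also guards against syntax slips such as trailing commas. The function name build_platfora_user_policy and the bucket names in the usage example are hypothetical placeholders; substitute the bucket defined in your core-site.xml and your own data source buckets.

```python
import json

S3_ARN_PREFIX = "arn:aws:s3:::"


def build_platfora_user_policy(platfora_bucket, datasource_paths):
    """Build the sample IAM user policy for the given buckets.

    platfora_bucket: name of the bucket defined in core-site.xml.
    datasource_paths: dict mapping each data source bucket name to the key
    pattern Platfora may read, for example "path/to/files/*" or "*".
    """
    datasource_buckets = sorted(datasource_paths)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:ListRoles",
                    "iam:PassRole",
                    "elasticmapreduce:*",
                    "s3:GetBucketLocation",
                    "s3:ListAllMyBuckets",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [S3_ARN_PREFIX + platfora_bucket]
                + [S3_ARN_PREFIX + bucket for bucket in datasource_buckets],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:Get*", "s3:DeleteObject"],
                "Resource": [S3_ARN_PREFIX + platfora_bucket + "/*"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:Get*"],
                "Resource": [
                    S3_ARN_PREFIX + bucket + "/" + datasource_paths[bucket]
                    for bucket in datasource_buckets
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    policy = build_platfora_user_policy(
        "example-platfora-bucket",
        {"example-datasource-1": "path/to/files/*", "example-datasource-n": "*"},
    )
    print(json.dumps(policy, indent=2))
```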

Create IAM Roles for Platfora

Amazon requires all AWS users to use IAM Roles to launch EMR clusters. One IAM Role is used to start the Amazon EMR service, and the other role is used by the EC2 instances in the EMR cluster. Amazon AWS offers some default IAM Roles for these services. However, Platfora recommends creating custom IAM Roles specifically for use by Platfora instead.

The Amazon AWS administrator can create the IAM Roles in the IAM Management Console of your AWS account. Create a role for each of the following EMR cluster services, and specify them in Platfora using the specified configuration properties:

• Amazon EMR service (service role). In Amazon AWS, create an IAM Role and attach a security policy that contains at a minimum the permissions specified below. Enter this IAM Role name in the platfora.emr.service.role Platfora configuration property. The custom role you define corresponds to the default IAM Role Amazon offers called EMR_DefaultRole.

• EC2 instances (instance profile) in the Amazon EMR cluster. In Amazon AWS, create an IAM Role and attach a security policy that contains at a minimum the permissions specified below. Enter this IAM Role name in the platfora.emr.jobflow.role Platfora configuration property. The custom role you define corresponds to the default IAM Role Amazon offers called EMR_EC2_DefaultRole.

The security policy for the Amazon EMR service (service role) IAM Role must have (at a minimum) the permissions listed in the following sample policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:DeleteTags",
        "ec2:Describe*",
        "ec2:ModifyImageAttribute",
        "ec2:ModifyInstanceAttribute",
        "ec2:RequestSpotInstances",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": [
        "iam:PassRole",
        "iam:ListRolePolicies",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:ListInstanceProfiles"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*"
      ],
      "Resource": "arn:aws:s3:::Bucket_defined_in_core-site.xml/*"
    }
  ]
}

The security policy for the EC2 instances (instance profile) IAM Role must have (at a minimum) the permissions listed in the following sample policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListSteps",
        "s3:ListAllMyBuckets"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::Bucket_defined_in_core-site.xml",
        "arn:aws:s3:::Datasource_Bucket_1",
        "arn:aws:s3:::Datasource_Bucket_n"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:Get*",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::Bucket_defined_in_core-site.xml/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::Datasource_Bucket_1/path/to/files/*",
        "arn:aws:s3:::Datasource_Bucket_n/*",
        "arn:aws:s3:::*elasticmapreduce/*"
      ]
    }
  ]
}

Verify that the permissions for and access to Amazon resources (especially S3) for the EC2 instances role are the same or greater than the permissions and access assigned to the platfora IAM User. For example, if the platfora IAM User can access an Amazon S3 bucket, but the EC2 instances role cannot, then lens builds that rely on that S3 bucket will fail.
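One way to spot the mismatch described above is to compare the S3 buckets named in the two policy documents. The following Python sketch is a coarse check under stated assumptions: it compares bucket names only, ignores actions and key prefixes, and skips "*" resources, so an empty result does not prove that the permissions are equivalent. The function names are hypothetical, and the policies are expected as Python dictionaries (for example, loaded with json.load).

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def _as_list(value):
    """IAM allows a single string or a list for Resource; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def s3_buckets(policy):
    """Return the set of S3 bucket names referenced by Allow statements."""
    buckets = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        for resource in _as_list(statement.get("Resource", [])):
            if resource.startswith(S3_ARN_PREFIX):
                # Keep only the bucket part of "bucket/key/pattern".
                buckets.add(resource[len(S3_ARN_PREFIX):].split("/", 1)[0])
    return buckets


def buckets_missing_from_role(user_policy, role_policy):
    """Buckets the IAM user policy names that the role policy does not."""
    return sorted(s3_buckets(user_policy) - s3_buckets(role_policy))
```

Any bucket this check reports is a candidate cause of failed lens builds and should be added to the EC2 instances role policy.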

For more information on using IAM Roles for EMR, see http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html.
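Before entering the values in Platfora, it can help to sanity-check the three EMR-related configuration properties named in this section. The following Python sketch assumes you have collected the values in a dictionary keyed by property name; how the properties are actually stored and set in Platfora is not covered here. The function name is hypothetical, and the subnet pattern (subnet- followed by 8 or 17 hexadecimal characters) is an assumption about the AWS identifier format rather than something stated in this guide.

```python
import re

# Property names as they appear in this section.
REQUIRED_EMR_PROPERTIES = (
    "platfora.emr.subnet.id",
    "platfora.emr.service.role",
    "platfora.emr.jobflow.role",
)

# Assumed AWS subnet ID format: "subnet-" plus 8 or 17 hex characters.
SUBNET_ID_PATTERN = re.compile(r"^subnet-[0-9a-f]{8}([0-9a-f]{9})?$")


def emr_property_problems(properties):
    """Return a list of human-readable problems found in the property values."""
    problems = []
    for name in REQUIRED_EMR_PROPERTIES:
        if not str(properties.get(name, "")).strip():
            problems.append("%s is not set" % name)
    subnet_id = str(properties.get("platfora.emr.subnet.id", "")).strip()
    if subnet_id and not SUBNET_ID_PATTERN.match(subnet_id):
        problems.append("platfora.emr.subnet.id does not look like a subnet ID")
    return problems
```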

EC2 Security Group Settings

EC2 security groups allow you to specify firewall rules for your Amazon Elastic Compute Cloud (EC2) server instances.

EC2 security group rules are independent of, and in addition to, the software firewalling provided by the instance's operating system. Security groups must be defined before you create an EC2 instance.

The security group configured for the Platfora server instance must permit connections from your user network to the Platfora web application server port (8001 by default). You also may want to open the EMR Hadoop ResourceManager and JobHistory web ports so that you can monitor and troubleshoot YARN jobs executed by Platfora.

[Figure: example security group configuration for a Platfora server instance]

Port Configuration Requirements

You must open ports in the firewall of your Platfora nodes to allow client access and intra-cluster communications. You also must open ports within your Hadoop cluster to allow access from Platfora. This section lists the default ports required.

Ports to Open on Platfora Nodes

Your Platfora master node must allow HTTP connections from your user network. All nodes must allow connections from the other Platfora nodes in a multi-node cluster.

On Amazon EC2 instances, you must configure the port firewall rules on the Platfora server instances in addition to the EC2 Security Group Settings.

Platfora Service | Default Port | Allow connections from…

Master Web Services Port (HTTP) | 8001 | External user network; Platfora worker servers; localhost

Secure Master Web Services Port (HTTPS) | 8443 | External user network; Platfora worker servers; localhost

Master Server Management Port | 8002 | Platfora worker servers; localhost

Worker Server Management Port | 8002 | Platfora master server; other Platfora worker servers; localhost

Master Data Port | 8003 | Platfora worker servers; localhost

Spark UI | 4040 | External user network (optional for troubleshooting Spark jobs)

Worker Data Port | 8003 | Platfora master server; other Platfora worker servers; localhost

Master PostgreSQL Database Port | 5432 | Platfora worker servers; localhost

Spark Ephemeral Port Range | Depends on the OS. For CentOS and Ubuntu, it is 32768 to 61000. | All nodes in the Hadoop cluster or EMR cluster
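After opening the ports, you may want to confirm that they are reachable from another node. The following Python sketch attempts a TCP connection to each default port listed above. It is an illustration rather than a Platfora utility: the dictionary and function names are hypothetical, the host name in the usage example is a placeholder, and a port also shows as closed when no service is listening on it yet, so run the check after the relevant services have started.

```python
import socket

# Default ports from the table above (fixed ports only).
PLATFORA_DEFAULT_PORTS = {
    "Master Web Services Port (HTTP)": 8001,
    "Secure Master Web Services Port (HTTPS)": 8443,
    "Server Management Port": 8002,
    "Data Port": 8003,
    "Spark UI": 4040,
    "Master PostgreSQL Database Port": 5432,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, OSError):
        return False
    connection.close()
    return True


def check_ports(host, ports, probe=port_is_open):
    """Return a dict mapping each service name to True (open) or False."""
    return dict((name, probe(host, port)) for name, port in ports.items())


if __name__ == "__main__":
    # "platfora-master.example.com" is a placeholder host name.
    results = check_ports("platfora-master.example.com", PLATFORA_DEFAULT_PORTS)
    for name in sorted(results):
        print("%-45s %s" % (name, "open" if results[name] else "closed"))
```

Remember that on EC2 a connection can be blocked either by the instance firewall or by the security group, so a closed result means both should be checked.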

Browser Requirements

Users can connect to the Platfora web application using the latest HTML5-compliant web browsers. Platfora supports the latest releases of the following web browsers:

• Chrome (preferred browser)
• Firefox
• Safari
• Internet Explorer with the Compatibility View feature disabled (versions prior to IE 10 are not supported)

Chapter 3: Install Platfora Software and Dependencies

This section describes how to provision a Platfora node with the required prerequisites and Platfora software. If you are installing a new Platfora cluster, the master node needs everything (prerequisites and Platfora software). Worker nodes only need the prerequisite software installed prior to initialization.

Most of the tasks in this section require root permissions. The example commands in the documentation use sudo to denote the commands that require root permissions.

Topics:

About the Platfora Installer Packages

Install Using RPM Packages

Install Using the TAR Package

About the Platfora Installer Packages

Platfora provides RPM or TAR installer packages that are specific to the Hadoop distribution you are using. Platfora Customer Support can provide you with the link to download the installer packages for your environment.

Make sure to download the correct Platfora installer packages for your Hadoop distribution and version. See Supported Hadoop and Hive Versions if you are not sure which Platfora package to use for your chosen Hadoop distribution.

RPM Packages

If you plan to install Platfora on a Linux operating system that supports the RPM package manager, such as RedHat or CentOS, Platfora recommends using the RPM packages to install Platfora and its required dependencies.

The platfora-base RPM package includes all the prerequisite software that Platfora needs and automates the OS configuration that Platfora requires. This package should be installed on all Platfora nodes (master and workers).


The platfora-server package includes the Platfora software only, which only needs to be installed on the master node. The Platfora software is copied to the worker nodes during initialization or upgrade, so you don't need to install it on the worker nodes ahead of time.

TAR Package

If you plan to install Platfora on a Linux operating system that does not support the RPM package manager, such as Ubuntu, you have to use the TAR package. You may also use the TAR package if you prefer to install and manage the dependent software in your environment yourself. The TAR package contains the Platfora server software only, which only needs to be installed on the master node.

The TAR package does not contain the prerequisite software that Platfora needs. You must manually install the required prerequisite software and do the required OS configurations on all Platfora nodes prior to installing and initializing Platfora.

Install Using RPM Packages

Follow the instructions in this section to install the Platfora dependencies and server software using the RPM packages. Install the platfora-base RPM package on all Platfora nodes, and the platfora-server RPM package on the master node only. If you choose to install the platfora-security RPM package, then install it on all Platfora nodes.

Install Dependencies RPM Package

The platfora-base RPM package contains all of the dependent software required by Platfora, and also automates several OS configuration tasks. Install this package on all Platfora nodes.

This task requires root permissions. Commands that begin with sudo denote root commands.

You must ensure that the platfora-base RPM package can access Yum repositories to install some dependencies, such as OpenSSL and the Java Development Kit (JDK).

The platfora-base RPM package does the following:

• Creates a /usr/local/platfora/base directory containing Platfora's third-party dependencies.

• Creates the platfora system user. The platfora user has no password set.

• Generates an SSH key for the platfora system user and adds the key to the user's authorized_keys file.

• Ensures the OS kernel parameters are appropriate for Platfora and sets them if they are not.

• Creates a .bashrc file for the platfora system user.


The platfora-base package uses the following file naming convention, where version-build is the version and build number of the base package only, and x86_64 is the supported system architecture. The base and Platfora server packages use different versioning schemes.

platfora-base-version-build-x86_64.rpm

The base package is not updated with every Platfora release. It is updated only when the Platfora dependencies change, which happens less often. When upgrading Platfora, check the release notes to see whether an upgrade of the base package is required.

1. Log on to the machine on which you are installing Platfora.

2. Using the download link provided by Platfora Customer Support, download the base package. For example:

$ wget http://downloads.platfora.com/release/platfora-base-version-build-x86_64.rpm

3. (Optional) Download and import the GPG public key file if you want to use GPG checking during installation.

For example:

$ wget http://downloads.platfora.com/gpg/platfora-gpg.public

$ sudo rpm --import platfora-gpg.public

4. Install the package using the yum package manager (requires root permission). For example, if you do not want to use GPG checking:

$ sudo yum --nogpgcheck localinstall platfora-base-version-build-x86_64.rpm

For example, if you do want to use GPG checking, verify the package signature and then install without the --nogpgcheck option:

$ sudo rpm -K platfora-base-version-build-x86_64.rpm

$ sudo yum localinstall platfora-base-version-build-x86_64.rpm

Confirm that the /usr/local/platfora/base directory was created.

$ sudo ls -a /usr/local/platfora/base

Install Optional Security RPM Package

The platfora-security RPM package contains SSL-enabled PostgreSQL and the OpenSSL package it depends on. This package is only needed if you plan to enable SSL communications between the Platfora worker nodes and the Platfora metadata catalog database. Install this package on all Platfora nodes.

This task requires root permissions. Commands that begin with sudo denote root commands.


The platfora-security package is installed after the platfora-base package. The platfora-security RPM package does the following:

• Creates a /usr/local/platfora/security directory containing the SSL-enabled version of PostgreSQL.

• Checks if OpenSSL version 1.0.1 or later is installed, and if not downloads and installs the openssl package dependency from the OpenSSL public repo.

• Edits the .bashrc file for the platfora system user and changes the PATH environment variable so that secure PostgreSQL is listed before the default PostgreSQL installed by the platfora-base package.

The platfora-security package uses the following file naming convention, where version-build is the version and build number of the security package only, and x86_64 is the supported system architecture. The base, security, and Platfora server packages use different versioning schemes.

platfora-security-version-build-x86_64.rpm

The security package only needs to be upgraded when the base package is upgraded, which is not every release. When upgrading Platfora, check the release notes to see if upgrade of the base and security packages is required.

1. Log on to the machine on which you are installing Platfora.

2. Using the download link provided by Platfora Customer Support, download the security package. For example:

$ wget http://downloads.platfora.com/release/platfora-security-version-build-x86_64.rpm

3. (Optional) Download and import the GPG public key file if you want to use GPG checking during installation.

For example:

$ wget http://downloads.platfora.com/gpg/platfora-gpg.public

$ sudo rpm --import platfora-gpg.public

4. Install the package using the yum package manager (requires root permission). For example, if you do not want to use GPG checking:

$ sudo yum --nogpgcheck localinstall platfora-security-version-build-x86_64.rpm

For example, if you do want to use GPG checking, verify the package signature and then install without the --nogpgcheck option:

$ sudo rpm -K platfora-security-version-build-x86_64.rpm

$ sudo yum localinstall platfora-security-version-build-x86_64.rpm

Confirm that the /usr/local/platfora/security directory was created.


Install Platfora RPM Package (Master Only)

The platfora-server RPM package contains the Platfora server software. Install this package on the Platfora master node only.

The platfora-server RPM package creates a /usr/local/platfora/platfora-server directory containing the Platfora software.

The platfora-server package uses the following file naming convention, where hadoop_distro corresponds to the Hadoop distribution you are using, version-build is the version and build number of the Platfora software, and x86_64 is the supported system architecture.

platfora-server-hadoop_distro-version-build-x86_64.rpm

Make sure to download the correct Platfora installer packages for your Hadoop distribution and version. See Supported Hadoop and Hive Versions if you are not sure which Platfora package to use for your chosen Hadoop distribution.

This task requires root permissions. Commands that begin with sudo denote root commands.

1. Log on to the machine on which you are installing the Platfora master.

2. Using the download link provided by Platfora Customer Support, download the Platfora server package. For example:

$ wget http://downloads.platfora.com/release/platfora-server-hadoop_distro-version-build-x86_64.rpm

3. (Optional) Download and import the GPG public key file if you want to use GPG checking during installation.

For example:

$ wget http://downloads.platfora.com/gpg/platfora-gpg.public

$ sudo rpm --import platfora-gpg.public

4. Install the package using the yum package manager (requires root permission). For example, if you do not want to use GPG checking:

$ sudo yum --nogpgcheck localinstall platfora-server-hadoop_distro-version-build-x86_64.rpm

For example, if you do want to use GPG checking, verify the package signature and then install without the --nogpgcheck option:

$ sudo rpm -K platfora-server-hadoop_distro-version-build-x86_64.rpm

$ sudo yum localinstall platfora-server-hadoop_distro-version-build-x86_64.rpm

Confirm that the /usr/local/platfora/platfora-server directory was created.


Install Using the TAR Package

Follow the instructions in this section to install the Platfora dependencies and server software using the TAR packages. The TAR package contains the Platfora server software only. You must install all dependencies yourself.

For the Platfora master node, do all the tasks described in this section.

For a Platfora worker node, do all the tasks described in this section except for:

• Install PostgreSQL

• Install Platfora TAR Package

• Install PDF Dependencies

Create the Platfora System User

Platfora requires a platfora system user account to own the Platfora installation and run the Platfora server processes. This same system user must be created on all Platfora nodes.

This task requires root permissions. Commands that begin with sudo denote root commands.

(MapR Only) If you are using MapR as your Hadoop distribution with Platfora, make sure to follow the additional steps for MapR. The platfora system user must exist on all Platfora nodes and all MapR nodes. The UID/GID must also be the same on the MapR nodes as on Platfora nodes.

1. Create the platfora system user:

$ sudo useradd -s /bin/bash -m -d /home/platfora platfora

2. Set a password for the platfora user:

$ sudo passwd platfora

3. (MapR Only) Check the /etc/passwd file on your MapR CLDB node, and find the entry for the platfora user. Note the user and group id numbers that are used. For example:

platfora:x:1002:1002::/home/platfora:/bin/bash

4. (MapR Only) Check the /etc/passwd file on your Platfora master node. If the user and group id numbers for the platfora user are different, update them so that they are the same as on the MapR nodes.

For example:

$ sudo usermod -u 1002 platfora

$ sudo groupmod -g 1002 platfora
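The comparison in the MapR steps above can also be scripted. The following is a minimal sketch, assuming you paste in the /etc/passwd entry noted on the MapR CLDB node; passwd_ids is a hypothetical helper name, and the entry shown is only the example value from this section.

```shell
#!/bin/sh
# passwd_ids: print "UID GID" from one /etc/passwd entry (hypothetical helper).
passwd_ids() {
    echo "$1" | awk -F: '{ print $3, $4 }'
}

# Entry noted on the MapR CLDB node (example value; replace with your own).
mapr_entry='platfora:x:1002:1002::/home/platfora:/bin/bash'
expected=$(passwd_ids "$mapr_entry")

# Compare with the local platfora account, if it exists on this node.
if id platfora >/dev/null 2>&1; then
    actual="$(id -u platfora) $(id -g platfora)"
    if [ "$actual" = "$expected" ]; then
        echo "platfora UID/GID match: $actual"
    else
        echo "platfora UID/GID mismatch: local '$actual', MapR '$expected'"
    fi
else
    echo "No local platfora user found"
fi
```

If you change the UID with usermod, note that usermod updates ownership only for files in the user's home directory; files owned by the old UID elsewhere on the system must be changed manually.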


Configure sudo for the platfora User

This is an optional task. Configuring sudo access for the platfora system user is a convenient way to run commands as root while logged in as the platfora user.

If you do not configure sudo access for the platfora user, then you must change to the root user to execute the system commands that require root permissions.

This documentation assumes that you have sudo access configured. If you do not, every time you see sudo at the beginning of a command, it means you need to be root to run the command.

1. Edit the /etc/sudoers file using the visudo command.

$ sudo visudo

2. Add a line such as the following in this file:

# User privilege specification
platfora ALL=(ALL:ALL) ALL

3. Save your changes and exit the visudo editor.

Generate and Authorize an SSH Key

Generating and authorizing an SSH key for the platfora system user on the localhost is required by the Platfora management utilities. This task should be performed on all Platfora nodes.

The Platfora management utilities require a trusted-host environment (the ability to SSH to a remote system in the Platfora cluster without a password prompt). Even in single-node installations, you must exchange SSH keys for the localhost.

1. Make sure that SELinux is disabled using either the sestatus or getenforce command.

$ sestatus

If SELinux is enabled, disable it using the recommended procedure for the node's operating system.

2. Make sure you are logged in to the Platfora server as the platfora system user.

$ su - platfora

3. Go to the ~/.ssh directory (create it if it does not exist):

$ mkdir -p ~/.ssh

$ cd ~/.ssh

4. Generate a public/private key pair that is NOT passphrase-protected. Press the ENTER or RETURN key for each prompt:

$ ssh-keygen -C 'platfora key for node 0' -t rsa

Enter file in which to save the key (/home/platfora/.ssh/id_rsa): ENTER

Enter passphrase (empty for no passphrase): ENTER

Enter same passphrase again: ENTER


5. Append the public key to the ~/.ssh/authorized_keys file (this allows SSH access from the current host to itself):

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6. Make sure the home directory, .ssh directory, and the files it contains have the correct permissions:

$ chmod 700 $HOME && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*

7. Test that you can SSH to localhost without a password prompt. If prompted to add localhost to the list of known hosts, enter yes:

$ ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established...

Are you sure you want to continue connecting (yes/no)? yes
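Once the manual test has succeeded, the permission and login checks can be repeated non-interactively, for example on each node of a larger cluster. The following is a minimal sketch, assuming GNU stat (as on CentOS and Ubuntu) and that it is run as the platfora user; mode_is is a hypothetical helper name.

```shell
#!/bin/sh
# mode_is: succeed when a file or directory has exactly the given octal mode.
# Hypothetical helper; relies on GNU stat (-c '%a').
mode_is() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]
}

for entry in "$HOME:700" "$HOME/.ssh:700" "$HOME/.ssh/authorized_keys:600"; do
    path=${entry%:*}
    mode=${entry##*:}
    if mode_is "$path" "$mode"; then
        echo "OK     $path has mode $mode"
    else
        echo "CHECK  $path should have mode $mode"
    fi
done

# BatchMode makes ssh fail instead of prompting for a password.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
    echo "Passwordless SSH to localhost works"
else
    echo "Passwordless SSH to localhost is not working yet"
fi
```

Because BatchMode also suppresses the known-hosts prompt, run this only after you have accepted the localhost host key in the manual test.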

Set OS Kernel Parameters

This section describes the Linux OS kernel settings required for Platfora. You must have root or sudo permissions to change kernel parameter settings. Changing kernel settings requires a system reboot in order for the changes to take effect.

Kernel ulimit Setting

Linux operating systems set limits on the number of open files and connections a process can have. For some applications, such as Platfora and Hadoop, having a lot of open file handles during processing is normal. Having the limit set too low can cause Platfora lens builds to fail.

There are two places file limits are set in the Linux operating system:

• A global limit for the entire system (set in /etc/sysctl.conf)

• A per-user process limit (set in /etc/security/limits.conf)

You must have root or sudo permissions to change OS ulimit settings.

You can check the global limit by running the command:

$ cat /proc/sys/fs/file-nr

This should return a set of three numbers like this:

704 0 294180

The first number is the number of allocated file handles. The second number is the number of allocated but unused file handles. The third number is the maximum number of file handles for the whole system. This limit should be at least 250000.

To increase the global limit, edit /etc/sysctl.conf (as root) and set the property:

fs.file-max = 294180

You can check the per-user process limit by running the command:

$ ulimit -n

This should return the file limit for the currently logged in user, for example:

1024

This limit should be at least 20000 for the platfora user (or whatever user runs the Platfora server). To increase the limit, edit /etc/security/limits.conf (as root) and add the following lines (the * sets the limit for all system users):

* hard nofile 65536
* soft nofile 65536
root hard nofile 65536
root soft nofile 65536

Reboot the server for the changes to take effect.

$ sudo reboot
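After the reboot, both limits can be checked against the minimums given in this section. The following is a minimal sketch, assuming it is run as the platfora user on a Linux node; at_least is a hypothetical helper name, and the thresholds are the 250000 and 20000 values stated above.

```shell
#!/bin/sh
# at_least: succeed when $1 (an integer or the word "unlimited") is >= $2.
# Hypothetical helper for this sketch.
at_least() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ]
}

# The third field of file-nr is the system-wide maximum.
global_max=$(awk '{ print $3 }' /proc/sys/fs/file-nr 2>/dev/null)
# Per-user limit for whoever runs this script.
user_max=$(ulimit -n)

if at_least "${global_max:-0}" 250000; then
    echo "OK   global file limit: $global_max"
else
    echo "LOW  global file limit: ${global_max:-unknown} (want at least 250000)"
fi

if at_least "$user_max" 20000; then
    echo "OK   per-user file limit: $user_max"
else
    echo "LOW  per-user file limit: $user_max (want at least 20000)"
fi
```

The per-user value reflects the account that runs the script, so a result obtained as root does not confirm the limit for the platfora user.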

Kernel Memory Overcommit Setting

Linux operating systems allow memory to be overcommitted, meaning the OS will allow an application to reserve more memory than actually exists within the system. Allowing overcommit prevents the OS from killing processes when a process requests more memory than is available.

If you are using a version 1.6 Java Runtime Environment (JRE), you must configure your OS to allow memory overcommit. If you are using a version 1.7 JRE, overcommit is not necessary.

You must have root or sudo permissions to change kernel memory overcommit settings.

1. Check your version of Java.

$ java -version

If you are running a 1.6 version, proceed to the next steps. If you are running a 1.7 version, you do not need to make any further changes.

2. Edit the /etc/sysctl.conf file.

$ sudo vi /etc/sysctl.conf

3. Set the following value:

vm.overcommit_memory=1

4. Save and close the file.

5. Reboot your system for the change to take effect:

$ sudo reboot

Kernel Shared Memory Settings

Some default OS installations have the system shared memory values set too low for Platfora. You may need to increase the shared memory settings if they are set too low.

You must have root or sudo permissions to set the system shared memory parameters.

1. In /etc/sysctl.conf, make sure the shared memory parameters have the minimum values or higher.


If your settings are lower than these minimum values, you will need to change them. If they are higher than the minimum, leave them as is.

kernel.shmmax=17179869184
kernel.shmall=4194304

2. If you made changes to /etc/sysctl.conf, reboot the server for the changes to take effect.

$ sudo reboot
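The current shared memory values can be compared with these minimums before you decide whether to edit /etc/sysctl.conf. The following is a minimal sketch, assuming the values are readable under /proc/sys/kernel; num_ge and check_param are hypothetical helper names. The comparison is delegated to awk because some kernels report a default shmmax that is too large for shell integer arithmetic.

```shell
#!/bin/sh
# num_ge: succeed when $1 >= $2. awk compares as floating point, which
# avoids shell integer overflow on very large shmmax defaults.
num_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

# check_param: $1 = name under /proc/sys/kernel, $2 = minimum from this guide.
check_param() {
    current=$(cat "/proc/sys/kernel/$1" 2>/dev/null)
    if [ -z "$current" ]; then
        echo "SKIP kernel.$1: cannot read current value"
    elif num_ge "$current" "$2"; then
        echo "OK   kernel.$1=$current (minimum $2)"
    else
        echo "LOW  kernel.$1=$current (minimum $2)"
    fi
}

check_param shmmax 17179869184
check_param shmall 4194304
```

Only parameters reported as LOW need to be changed; values that already exceed the minimum should be left as they are.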

Install Dependent Software

If using the TAR installation package to install Platfora, you must install all of the dependencies yourself. This section provides instructions for manually installing the prerequisite software on a Platfora node.

If you are provisioning a Platfora master node, you must install all dependencies.

If you are provisioning a Platfora worker node, you can skip the task for installing PostgreSQL. PostgreSQL is only needed on the Platfora master node.

Confirm Linux OS Utilities

Platfora requires several standard Linux utilities to be installed on your system and in your environment PATH. Check your system for the required utilities before installing Platfora. Most Linux operating systems already have these utilities installed by default.

• rsync

• ssh

• scp

• tail

• tar

• cp

• wget

• ntp

• sysctl (/usr/sbin must be in your PATH)

To verify that a utility is installed and can be found in the PATH, you can check its location using the which command. For example:

$ which rsync

$ which tar

$ which sysctl

If a utility is not installed, you will need to install it before installing Platfora. Check your OS documentation for instructions on installing these utilities.
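The whole list can be checked in one pass rather than one which command at a time. The following is a minimal sketch; missing_tools is a hypothetical helper name. ntp is left out of the loop on the assumption that it is installed as a package or service rather than as a single command of that name, so check it with your OS package manager.

```shell
#!/bin/sh
# missing_tools: print each named command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools rsync ssh scp tail tar cp wget sysctl)
if [ -z "$missing" ]; then
    echo "All required utilities were found in PATH"
else
    echo "Missing utilities:" $missing
fi
```

Run the check as the platfora user, since that is the account whose PATH matters to the Platfora management utilities.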


Install Java

The Platfora server requires a Java Runtime Environment (JRE) version 1.7 or higher. Platfora recommends installing the full Java Development Kit (JDK) for access to the latest Java features and diagnostic tools.

The instructions in this section are for installing version 1.7 of the Open Java Development Kit (OpenJDK).

You must have root or sudo permissions to install Java.

1. Check if Java 1.7 or higher is already installed.

$ java -version

If java is not found, you will need to install it.

2. Install OpenJDK using your OS package manager.

On Ubuntu Systems:

$ sudo apt-get install openjdk-7-jdk

On RedHat/CentOS Systems:

$ su -c "yum install java-1.7.0-openjdk"

3. Set the JAVA_HOME environment variable in the platfora user’s profile file. For example, where java_directory is the versioned directory where Java is installed:

$ echo "export JAVA_HOME=/usr/lib/jvm/java_directory/jre" >> /home/platfora/.bashrc

$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /home/platfora/.bashrc

$ source /home/platfora/.bashrc

4. Make sure JAVA_HOME is set correctly for the platfora user:

$ su - platfora

$ echo $JAVA_HOME

Confirm Python Installation

The Platfora management utilities require Python version 2.6.8, 2.7.1, or 2.7.3 through 2.7.6. Python version 3.0 is not supported. Most Linux operating systems already have Python installed by default, but you need to make sure the version is compatible with Platfora.

To check if the correct version of Python is installed:

$ python -V

If Python is not installed (or you have an incompatible version of Python) you will need to install or upgrade/downgrade it before installing Platfora. Check your OS documentation for instructions on installing or upgrading/downgrading Python to a supported 2.x version (2.6.8, 2.7.1, or 2.7.3 through 2.7.6).
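The version check can be automated so that an unsupported release is flagged explicitly. The following is a minimal sketch, assuming the interpreter is available as python in the PATH; python_supported is a hypothetical helper name that encodes only the versions listed in this section.

```shell
#!/bin/sh
# python_supported: succeed for the versions this guide lists as supported
# (2.6.8, 2.7.1, and 2.7.3 through 2.7.6).
python_supported() {
    case "$1" in
        2.6.8|2.7.1|2.7.[3-6]) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v python >/dev/null 2>&1; then
    # Python 2 prints its version to stderr, hence 2>&1.
    found=$(python -V 2>&1 | awk '{ print $2 }')
    if python_supported "$found"; then
        echo "Python $found is supported"
    else
        echo "Python $found is not in the supported list"
    fi
else
    echo "python was not found in PATH"
fi
```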


Install PostgreSQL (Master Only)

Platfora stores its metadata catalog in a PostgreSQL relational database. PostgreSQL version 9.2 or 9.3 must be installed (but not running) on the Platfora master server before you start Platfora for the first time. Platfora worker nodes do not require a PostgreSQL installation.

You must have root or sudo permissions to install PostgreSQL.

Install PostgreSQL 9.2 on Ubuntu Systems

These instructions are for installing PostgreSQL 9.2 on Linux Ubuntu operating systems.

1. Install the dependent libraries:

$ sudo apt-get install libpq-dev

2. Add the PostgreSQL repository to your system configuration:

$ sudo add-apt-repository ppa:pitti/postgresql

$ sudo apt-get update

3. Install PostgreSQL 9.2:

$ sudo apt-get install postgresql-9.2

4. Stop the PostgreSQL service.

$ sudo service postgresql stop

5. Remove the PostgreSQL automatic start-up scripts:

$ sudo rm /etc/rc*/*postgresql

6. Create and change the ownership on the directory where PostgreSQL writes its lock files:

$ sudo mkdir /var/run/postgresql

$ sudo chown platfora /var/run/postgresql

7. Update the platfora user’s PATH environment variable to include the PostgreSQL executable directory and /usr/sbin:

$ echo "export PATH=/usr/lib/postgresql/9.2/bin:/usr/sbin:$PATH" >> /home/platfora/.bashrc

$ source /home/platfora/.bashrc

Install PostgreSQL 9.2 on RedHat/CentOS Systems

These instructions are for installing PostgreSQL 9.2 on RedHat Enterprise Linux (RHEL) or CentOS operating systems.

1. Download the appropriate PostgreSQL 9.2 YUM repository for your operating system. Go to the PostgreSQL yum repository website, copy the URL link for the appropriate YUM repository configuration, and download it using wget.

For example, to download the YUM repository configuration for PostgreSQL 9.2 on a 64-bit RHEL 6 operating system.

$ wget http://yum.pgrpms.org/9.2/redhat/rhel-6-x86_64/pgdg-redhat92-9.2-7.noarch.rpm

2. Add the PostgreSQL YUM repository to your system configuration:

$ sudo rpm -i pgdg-redhat92-9.2-7.noarch.rpm

3. Install PostgreSQL:

$ sudo yum install postgresql92 postgresql92-server

4. If it is enabled, disable the PostgreSQL automatic start-up.

Each operating system has its own technique for auto starting PostgreSQL. If your system uses chkconfig to manage init scripts, you can remove PostgreSQL from the chkconfig control using the following command:

chkconfig --del postgresql

For some operating systems, the PostgreSQL start.conf file configures the auto-start of a specific PostgreSQL cluster.

5. Create and change the ownership on the directory where PostgreSQL writes its lock files:

$ sudo mkdir /var/run/postgresql

$ sudo chown platfora /var/run/postgresql

6. Update the platfora user’s PATH environment variable to include the PostgreSQL executable directory and /usr/sbin:

$ echo "export PATH=/usr/pgsql-9.2/bin:/usr/sbin:$PATH" >> /home/platfora/.bashrc

$ source /home/platfora/.bashrc

Confirm OpenSSL Installation

Platfora uses OpenSSL for secure communications between the Platfora worker servers and its metadata catalog database. If you decide to enable SSL for the Platfora catalog, which is optional, you will need OpenSSL version 1.0.1 or higher on your Platfora nodes.

As an optional security feature, you can choose to enable SSL communications between the Platfora metadata catalog and the Platfora worker nodes. If you decide to enable this, you will need to have:

• SSL-enabled PostgreSQL. If using the RPM installation packages, Platfora provides an optional platfora-security package that contains SSL-enabled PostgreSQL. If using the TAR installation packages, the packages provided in the PostgreSQL public repo come with SSL enabled.

• OpenSSL. If using the RPM installation packages, Platfora provides an optional platfora-security RPM package that pulls this dependency from the public repo. If using the TAR installation packages, you will have to install the openssl package yourself.

Many Linux operating systems already have OpenSSL installed by default, but you need to make sure the version is compatible with the version that PostgreSQL uses.

1. Check that OpenSSL version 1.0.1 or higher is installed.

$ openssl version

2. If OpenSSL is not installed (or you have an incompatible version) you will need to install or upgrade it before enabling SSL for the Platfora catalog. Check your OS documentation for instructions on installing or upgrading the openssl package.
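The version comparison in step 1 can be scripted when you are provisioning several nodes. The following is a minimal sketch, assuming openssl version prints the release as its second field (for example "OpenSSL 1.0.1e-fips ..."); version_ge is a hypothetical helper name, and any letter suffix on the version is ignored for the comparison.

```shell
#!/bin/sh
# version_ge: succeed when dotted numeric version $1 is >= $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "[.]"); m = split(b, y, "[.]")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

if command -v openssl >/dev/null 2>&1; then
    # Keep only the leading digits and dots, e.g. "1.0.1e-fips" -> "1.0.1".
    found=$(openssl version | awk '{ print $2 }' | sed 's/[^0-9.].*$//')
    if version_ge "$found" 1.0.1; then
        echo "OpenSSL $found meets the 1.0.1 minimum"
    else
        echo "OpenSSL $found is older than 1.0.1"
    fi
else
    echo "openssl was not found in PATH"
fi
```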


Install Platfora TAR Package (Master Only)

The TAR installation package contains the Platfora server software only. You only need to install this package on the Platfora master node. You can skip this task if you are provisioning a Platfora worker node.

The Platfora TAR package uses the following file naming convention, where version-build.num is the version and build number of the Platfora software and hadoop_distro corresponds to the Hadoop distribution you are using.

platfora-version-build.num-hadoop_distro.tgz

Make sure to download the correct Platfora installer package for your Hadoop distribution and version. See Supported Hadoop and Hive Versions if you are not sure which Platfora package to use for your chosen Hadoop distribution.

This task requires root permissions. Commands that begin with sudo denote root commands.

1. Log on to the machine on which you are installing the Platfora master.

2. Create a Platfora installation directory and ensure that it is owned by the platfora system user. For example:

$ sudo mkdir /usr/local/platfora

$ sudo chown -R platfora /usr/local/platfora

3. Log in as the platfora user and go to the installation directory that you just created:

$ su - platfora

$ cd /usr/local/platfora

4. Download the 5.0.0 release package and checksum file using the URLs provided by Platfora Customer Support.

Make sure to download the correct packages for your Hadoop distribution version. For example:

$ wget http://downloads.platfora.com/release/platfora-version-build.num-hadoop_distro.tgz

$ wget http://downloads.platfora.com/release/platfora-version-build.num-hadoop_distro.tgz.sha

5. After downloading the package and checksum file, make sure the package is valid using the shasum command.

For example:

$ shasum -c platfora-version-build.num-hadoop_distro.tgz.sha

If the package is valid, you should see a message such as:

platfora-version-build.num-hadoop_distro.tgz: OK

6. Unpack the package within the installation directory. For example:

$ tar -zxvf platfora-version-build.num-hadoop_distro.tgz

7. Create a symbolic link named platfora-server that points to the unpacked directory. For example:

$ ln -s platfora-version-build.num-hadoop_distro platfora-server

8. Set the PLATFORA_HOME environment variable for the platfora system user.

$ echo "export PLATFORA_HOME=/usr/local/platfora/platfora-server" >> $HOME/.bashrc

9. Set the PATH environment variable for the platfora system user.

The PATH should include /usr/sbin, $PLATFORA_HOME/bin, and the PostgreSQL executable directories. If your system has more than one version of PostgreSQL installed, make sure that 9.2 is listed first in the PATH of the platfora user.

For example (Ubuntu):

$ echo 'export PATH=/usr/lib/postgresql/9.2/bin:/usr/sbin:$PLATFORA_HOME/bin:$PATH' >> $HOME/.bashrc

$ source $HOME/.bashrc

For example (RedHat/CentOS):

$ echo 'export PATH=/usr/pgsql-9.2/bin:/usr/sbin:$PLATFORA_HOME/bin:$PATH' >> $HOME/.bashrc

$ source $HOME/.bashrc

10. Make sure the JAVA_HOME environment variable is set (if it's not, see Install Java).

$ echo $JAVA_HOME
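The environment set up in steps 8 through 10 can be confirmed in one pass. The following is a minimal sketch, assuming it is run in a new login shell as the platfora user so that .bashrc has been read; path_contains is a hypothetical helper name.

```shell
#!/bin/sh
# path_contains: succeed when directory $1 is an entry in the
# colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

for var in JAVA_HOME PLATFORA_HOME; do
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
        echo "OK       $var=$value"
    else
        echo "MISSING  $var is not set"
    fi
done

for dir in /usr/sbin "${PLATFORA_HOME:-/usr/local/platfora/platfora-server}/bin"; do
    if path_contains "$dir" "$PATH"; then
        echo "OK       PATH contains $dir"
    else
        echo "MISSING  PATH does not contain $dir"
    fi
done
```

The PostgreSQL executable directory is not checked here because its location differs between Ubuntu and RedHat/CentOS; add the directory for your system to the second loop if you want it covered.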

Install PDF Dependencies (Master Only)

One feature of Platfora is the ability to save a vizboard as a PDF document. In order for the Platfora server to render PDFs, it needs PhantomJS and the OpenSans font to be installed on the Platfora master node. You can skip this task if you are provisioning a Platfora worker node.

The PhantomJS installation relies on several fonts that ship with the Platfora software. For this reason, the PhantomJS installation must be done after installing the Platfora software.

To install PhantomJS, do the following:

1. Log into the Platfora master node as the platfora user.

2. Install the PhantomJS dependencies.

On Ubuntu:

$ sudo apt-get install fontconfig
$ sudo apt-get install libfreetype6
$ sudo apt-get install libfontconfig1
$ sudo apt-get install libstdc++6

On RedHat/CentOS:

$ sudo yum install fontconfig
$ sudo yum install freetype
$ sudo yum install libfreetype.so.6
$ sudo yum install libfontconfig.so.1
$ sudo yum install libstdc++.so.6


3. Download the compiled PhantomJS executable.

$ sudo wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2

4. Extract the files.

$ sudo tar xjf phantomjs-1.9.7-linux-x86_64.tar.bz2

5. Copy the PhantomJS binary to an accessible bin directory.

You should choose a bin directory that is common to most user environments.

$ sudo cp phantomjs-1.9.7-linux-x86_64/bin/phantomjs /usr/local/bin

6. Verify the phantomjs command is accessible to the platfora user.

$ which phantomjs

/usr/local/bin/phantomjs

If the command is not found, add the bin directory to the platfora user's environment:

$ echo "export PATH=/usr/local/bin:/usr/sbin:$PATH" >> /home/platfora/.bashrc

$ source /home/platfora/.bashrc

7. Install the OpenSans font for use by the PDF feature.

a) Make a directory to contain the typeface.

$ sudo mkdir -p /usr/share/fonts/truetype

b) Copy the font to the truetype directory.

$ sudo cp -r $PLATFORA_HOME/server/webapps/proton/dist/fonts/OpenSans /usr/share/fonts/truetype

c) Refresh the font cache.

$ sudo fc-cache -f

After installing, you'll want to verify the installation is running correctly. One easy way to do this is using examples that came with the PhantomJS tarball:

$ phantomjs phantomjs-1.9.7-linux-x86_64/examples/hello.js

Hello, world!

You can also output a PDF from a vizboard to verify that the fonts were installed correctly (the PDF option is under Share). In the guide's comparison figure (not reproduced here), the left side shows the output when the fonts are installed, and the right side was rendered without the proper fonts installed.


Chapter 4: Configure Environment on Platfora Nodes

This section describes how to configure a Platfora node's operating system and network environment. You should perform these tasks on every node in the Platfora cluster (master and workers) after you have installed the Platfora dependencies and software, but before you initialize Platfora (or initialize a new worker node).

Topics:

Install the MapR Client Software (MapR Only)

Configure Network Environment

Configure Passwordless SSH

Synchronize the System Clocks

Create Local Storage Directories

Verify Environment Variables

Install the MapR Client Software (MapR Only)

If you are using MapR as your Hadoop distribution, you must install the MapR client software on all Platfora nodes (master and workers). If you are not using MapR with Platfora, you can skip this task.

Platfora uses the MapR client to submit MapReduce jobs and file system commands directly to the MapR cluster. For more information about the MapR client, see the MapR documentation.

You must ensure that you use the same version of the MapR client software as the MapR server version.

You must have root or sudo permissions to install the MapR client.

Installing the MapR Client on Ubuntu

1. Add the following line to the /etc/apt/sources.list file:

deb http://package.mapr.com/releases/version/ubuntu/ mapr optional
