Informatica Intelligent Data Lake (Version 10.1) Installation and Configuration Guide


Informatica Intelligent Data Lake Installation and Configuration Guide Version 10.1

June 2016

Copyright (c) 1993-2016 Informatica LLC. All rights reserved.

This software and documentation contain proprietary information of Informatica LLC and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging, Informatica Master Data Management, and Live Data Map are trademarks or registered trademarks of Informatica LLC in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems
Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright © International Business Machines Corporation. All rights reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc. All rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide, All rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright © EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved. Copyright © Jinfonet Software. All rights reserved. Copyright © Apple Inc. All rights reserved. Copyright © Telerik Inc. All rights reserved. Copyright © BEA Systems. All rights reserved. Copyright © PDFlib GmbH. All rights reserved. 
Copyright © Orientation in Objects GmbH. All rights reserved. Copyright © Tanuki Software, Ltd. All rights reserved. Copyright © Ricebridge. All rights reserved. Copyright © Sencha, Inc. All rights reserved. Copyright © Scalable Systems, Inc. All rights reserved. Copyright © jQWidgets. All rights reserved. Copyright © Tableau Software, Inc. All rights reserved. Copyright© MaxMind, Inc. All Rights Reserved. Copyright © TMate Software s.r.o. All rights reserved. Copyright © MapR Technologies Inc. All rights reserved. Copyright © Amazon Corporate LLC. All rights reserved. Copyright © Highsoft. All rights reserved. Copyright © Python Software Foundation. All rights reserved. Copyright © BeOpen.com. All rights reserved. Copyright © CNRI. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions of the Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in writing, software distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licenses for the specific language governing permissions and limitations under the Licenses.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

This product includes Curl software which is Copyright 1996-2013, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php and at http://www.eclipse.org/org/documents/edl-v10.php.


This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/; https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js; http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/LICENSE; http://jdbc.postgresql.org/license.html; http://protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/LICENSE; http://web.mit.edu/Kerberos/krb5-current/doc/mitK5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/LICENSE; https://github.com/hjiang/jsonxx/blob/master/LICENSE; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/LICENSE; http://one-jar.sourceforge.net/index.php?page=documents&file=license; https://github.com/EsotericSoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html; https://github.com/tinkerpop/blueprints/blob/master/LICENSE.txt; http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/; https://github.com/twbs/bootstrap/blob/master/LICENSE; https://sourceforge.net/p/xmlunit/code/HEAD/tree/trunk/LICENSE.txt; https://github.com/documentcloud/underscore-contrib/blob/master/LICENSE, and https://github.com/apache/hbase/blob/master/LICENSE.txt.

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/licenses/BSD-3-Clause), the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0) and the Initial Developer’s Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject to terms of the MIT license.

See patents at https://www.informatica.com/legal/patents.html.

DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: IDL-INS-1000-0001


Table of Contents

Preface
  Informatica Resources
    Informatica Network
    Informatica Knowledge Base
    Informatica Documentation
    Informatica Product Availability Matrixes
    Informatica Velocity
    Informatica Marketplace
    Informatica Global Customer Support

Chapter 1: Introduction to Intelligent Data Lake Installation
  Intelligent Data Lake Installation Overview
  Live Data Map Installation
  Intelligent Data Lake Installation
  Installation Process

Chapter 2: Before You Install
  Before You Install Overview
  Read the Release Notes
  Verify the License Key
  Install and Configure Live Data Map
    Before Installation
    During Installation
    After Installation
  Install Big Data Management Packages
  Create HDFS and Hive Connections for the Data Lake
  Verify System Requirements
    Verify Temporary Disk Space Requirements
    Verify Database Requirements
    Verify Services Installation Requirements
    Verify Hardware Requirements
  Setup the Database for Data Preparation Service
  Setup Keystore and Truststore Files

Chapter 3: Intelligent Data Lake Installation
  Overview of the Intelligent Data Lake Installation
  Installing Intelligent Data Lake in Console Mode
  Intelligent Data Lake Installation in Silent Mode
    Configuring the Silent Install Properties File
    Secure the Passwords in the Properties File
  Troubleshooting

Chapter 4: After You Install Intelligent Data Lake
  After You Install Overview
  Create the Application Services
  Install Python
  Enable Logging of User Activity Events

Index


Preface

The Intelligent Data Lake Installation and Configuration Guide contains information about the installation and setup of Intelligent Data Lake. It includes information about the Informatica domain requirements, system requirements, and the installation and configuration process. This guide assumes you have knowledge of databases and Hadoop clusters. This guide also assumes that you are familiar with your enterprise systems and network.

Informatica Resources

Informatica Network

Informatica Network hosts Informatica Global Customer Support, the Informatica Knowledge Base, and other product resources. To access Informatica Network, visit https://network.informatica.com.

As a member, you can:

Access all of your Informatica resources in one place.

Search the Knowledge Base for product resources, including documentation, FAQs, and best practices.

View product availability information.

Review your support cases.

Find your local Informatica User Group Network and collaborate with your peers.

Informatica Knowledge Base

Use the Informatica Knowledge Base to search Informatica Network for product resources such as documentation, how-to articles, best practices, and PAMs.

To access the Knowledge Base, visit https://kb.informatica.com. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team at


Informatica Documentation

To get the latest documentation for your product, browse the Informatica Knowledge Base at https://kb.informatica.com/_layouts/ProductDocumentation/Page/ProductDocumentSearch.aspx.

If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected].

Informatica Product Availability Matrixes

Product Availability Matrixes (PAMs) indicate the versions of operating systems, databases, and other types of data sources and targets that a product release supports. If you are an Informatica Network member, you can access PAMs at https://network.informatica.com/community/informatica-network/product-availability-matrices.

Informatica Velocity

Informatica Velocity is a collection of tips and best practices developed by Informatica Professional Services. Developed from the real-world experience of hundreds of data management projects, Informatica Velocity represents the collective knowledge of our consultants who have worked with organizations from around the world to plan, develop, deploy, and maintain successful data management solutions.

If you are an Informatica Network member, you can access Informatica Velocity resources at http://velocity.informatica.com.

If you have questions, comments, or ideas about Informatica Velocity, contact Informatica Professional Services at [email protected].

Informatica Marketplace

The Informatica Marketplace is a forum where you can find solutions that augment, extend, or enhance your Informatica implementations. By leveraging any of the hundreds of solutions from Informatica developers and partners, you can improve your productivity and speed up time to implementation on your projects. You can access Informatica Marketplace at https://marketplace.informatica.com.

Informatica Global Customer Support

You can contact a Global Support Center by telephone or through Online Support on Informatica Network. To find your local Informatica Global Customer Support telephone number, visit the Informatica website at the following link: http://www.informatica.com/us/services-and-training/support-services/global-support-centers. If you are an Informatica Network member, you can use Online Support at http://network.informatica.com.


Chapter 1: Introduction to Intelligent Data Lake Installation

This chapter includes the following topics:

Intelligent Data Lake Installation Overview

Live Data Map Installation

Intelligent Data Lake Installation

Installation Process

Intelligent Data Lake Installation Overview

Intelligent Data Lake is a data preparation platform that provides a way for analysts and non-technical users to discover, access, analyze, and structure data without IT support. Analysts can use the Intelligent Data Lake interface to search for the data they need and prepare the data for use in their task or project.

Intelligent Data Lake requires Live Data Map and an Informatica domain. Intelligent Data Lake uses Live Data Map to search for data and discover data lineage and relationships. You must install Live Data Map and create and configure the Live Data Map services before you install Intelligent Data Lake. For more information, see the Live Data Map Installation and Configuration Guide.

Complete the pre-installation tasks to prepare for the installation. You can install the Intelligent Data Lake services only on a Red Hat Enterprise Linux 6 or higher machine. You can run the installer in console or silent mode.

Live Data Map Installation

Informatica provides an installer that installs Informatica services and Live Data Map. You must install Live Data Map 2.0 on an external cluster and configure the services before you install Intelligent Data Lake.

Note: You must select the external cluster option during Informatica Live Data Map installation. Intelligent Data Lake cannot use an internal Live Data Map Hadoop cluster.

The installer installs Live Data Map with the option to create the following services:

1. Data Integration Service

2. Model Repository Service

3. Catalog Service

4. Content Management Service. Configure this service if you want to use the data domain discovery feature.

For more information about Live Data Map installation, see the Live Data Map Installation and Configuration Guide.

Intelligent Data Lake Installation

Before you install Intelligent Data Lake, you must have an Informatica domain with Live Data Map and the following services:

Data Integration Service

Model Repository Service

Catalog Service

Use the Intelligent Data Lake installer to install the application. You can install Intelligent Data Lake in the following modes:

Console Mode: If you install Intelligent Data Lake in console mode, the installer copies the installation files to the installation directory. The installer creates and enables only the Model Repository Service content. It does not create the Data Preparation Service or the Intelligent Data Lake Service. You must create both services after installation using the Administrator tool.

Silent Mode: If you install Intelligent Data Lake in silent mode, you can choose to create the following services during installation:

-Data Preparation Service

-Intelligent Data Lake Service

If you do not want to create the services during silent mode installation, you can create these services after installation using the Administrator tool.

Note: During Intelligent Data Lake installation, the files are copied to the Live Data Map installation directory. To uninstall Intelligent Data Lake, you must uninstall Live Data Map. For more information, see the Live Data Map Installation and Configuration Guide.

Installation Process

The installation process for Intelligent Data Lake consists of the following phases:

1. Install and configure Live Data Map

2. Create HDFS and Hive connections for the data lake

3. Verify system requirements

4. Set up the database for Data Preparation Service

5. Set up keystore and truststore files

6. Install Intelligent Data Lake

7. Complete any required post-install configuration
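The phase order above can be tracked with a small script. The following is only an illustrative sketch: the `next_phase` helper and the idea of recording completed phases as strings are assumptions made for this example and are not part of any Informatica tool.

```python
from typing import Iterable, Optional

# Phases in the order listed in this guide.
PHASES = [
    "Install and configure Live Data Map",
    "Create HDFS and Hive connections for the data lake",
    "Verify system requirements",
    "Set up the database for Data Preparation Service",
    "Set up keystore and truststore files",
    "Install Intelligent Data Lake",
    "Complete any required post-install configuration",
]


def next_phase(completed: Iterable[str]) -> Optional[str]:
    """Return the first phase that is not yet completed, or None when all are done."""
    done = set(completed)
    for phase in PHASES:
        if phase not in done:
            return phase
    return None
```

For example, `next_phase(PHASES[:5])` returns the installation phase itself, which indicates that all prerequisite phases are recorded as complete.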


Chapter 2: Before You Install

This chapter includes the following topics:

Before You Install Overview

Read the Release Notes

Verify the License Key

Install and Configure Live Data Map

Install Big Data Management Packages

Create HDFS and Hive Connections for the Data Lake

Verify System Requirements

Setup the Database for Data Preparation Service

Setup Keystore and Truststore Files

Before You Install Overview

You can install Intelligent Data Lake with Informatica services on a machine that runs Red Hat Enterprise Linux 6 or higher.

Before you start the installation on a Red Hat Enterprise Linux 6 or higher machine, set up the machine to meet the requirements to install and run Intelligent Data Lake. If the machine where you install Intelligent Data Lake is not configured correctly, the installation can fail.
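As a quick pre-check of the operating system requirement, a script along the following lines can help. It is a minimal sketch that assumes the release string is available in /etc/redhat-release in the usual "Red Hat Enterprise Linux ... release N.M" form. The function names are illustrative, and the check does not replace the Product Availability Matrix.

```python
import re
from typing import Optional


def rhel_major_version(release_text: str) -> Optional[int]:
    """Return the major version from a Red Hat release string, or None if it is not RHEL."""
    if "Red Hat Enterprise Linux" not in release_text:
        return None
    match = re.search(r"release\s+(\d+)", release_text)
    return int(match.group(1)) if match else None


def meets_minimum(release_text: str, minimum: int = 6) -> bool:
    """True when the release string names RHEL at or above the minimum major version."""
    major = rhel_major_version(release_text)
    return major is not None and major >= minimum


if __name__ == "__main__":
    try:
        # Assumed location of the release string on Red Hat systems.
        with open("/etc/redhat-release") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("OK" if meets_minimum(text) else "Unsupported or unknown operating system")
```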

Read the Release Notes

Read the Informatica Release Notes for updates to the installation process. You can also find information about known and fixed limitations for the release.


Verify the License Key

Before you install the software, verify that you have the license key available. You can get the license key in the following ways:

Installation DVD. If you receive the Informatica installation files on a DVD, the license key file is included in the Informatica License Key CD.

FTP download. If you download the Informatica installation files from the Informatica Electronic Software Download (ESD) site, the license key is in an email message from Informatica. Copy the license key file to a directory accessible to the user account that installs the product.

You can provide the license key for Intelligent Data Lake when you install Live Data Map. You cannot specify the license during Intelligent Data Lake installation.

Ensure that you have the following license options enabled:

Hive option for the Data Integration Service

Live Data Map option for the Catalog Service

Data Lake option for the Data Preparation Service and the Intelligent Data Lake Service
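The mapping between services and license options listed above can be written down as a lookup table. The sketch below is illustrative only: it assumes you have already collected the names of the enabled license options by some other means, and the `missing_license_options` helper is hypothetical, not an Informatica API.

```python
# License option required by each service, as listed in this guide.
REQUIRED_OPTIONS = {
    "Data Integration Service": "Hive",
    "Catalog Service": "Live Data Map",
    "Data Preparation Service": "Data Lake",
    "Intelligent Data Lake Service": "Data Lake",
}


def missing_license_options(enabled: set) -> dict:
    """Map each service to its required option when that option is not enabled."""
    return {
        service: option
        for service, option in REQUIRED_OPTIONS.items()
        if option not in enabled
    }
```

An empty result means that every service in the table has its required option. A non-empty result names the services that a license without that option cannot support.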

Contact Informatica Global Customer Support if you do not have a license key or if you have an incremental license key and you want to install Intelligent Data Lake. If you use an incremental license for Intelligent Data Lake, the serial number of the incremental license must match the serial number for an existing license object in the domain. If the serial numbers do not match, the AddLicense command fails. You can get more information about the contents of the license key file used for installation, including serial number, version, expiration date, operating systems, and connectivity options in the installation debug log.

Install and Configure Live Data Map

Intelligent Data Lake uses Live Data Map to search for data and discover data lineage and relationships. You must install Live Data Map and create and configure the Live Data Map services before you install Intelligent Data Lake.

Before Installation

Verify the system requirements and prerequisites for Live Data Map installation.

The license details for Intelligent Data Lake can be provided when you install Live Data Map. You cannot specify the license during Intelligent Data Lake installation. Ensure that you have the following license options enabled for Intelligent Data Lake:

Hive option for the Data Integration Service

Live Data Map option for the Catalog Service

Data Lake option for the Data Preparation Service and the Intelligent Data Lake Service


During Installation

Install Live Data Map 2.0 with the external cluster option.

You must select the external cluster option during Informatica Live Data Map installation. Intelligent Data Lake does not support an internal cluster. For more information, see the Live Data Map Installation and Configuration Guide.

You can create the following services during the Live Data Map installation, or you can create them after installation with the Administrator tool, in the following order. For more information, see the Live Data Map Administrator Guide.

1. Data Integration Service

Note: If you plan to use the operating system profiles option for the Data Integration Service, ensure that you create and associate a different Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support operating system profiles. For more information, see the Intelligent Data Lake Administrator Guide.

2. Model Repository Service

3. Catalog Service

4. Content Management Service. Configure this service if you want to use the data domain discovery feature.

After Installation

After you install Live Data Map, complete the following tasks:

Create a Hive resource for the data lake. For more information about creating a Hive resource for Intelligent Data Lake, see the Live Data Map Administrator Guide. You must create the Hive resource with the following specific settings for Intelligent Data Lake:

-In the General > Connection Properties > Url field: Ensure that you use the Fully Qualified Domain Name (FQDN) of the Hive server in the JDBC connection URL used to access the Hive server.

-In the Metadata Load Settings > Additional Properties > Schema field: By default, all the schemas available in the Hadoop cluster are selected. You must select only the schemas required for Intelligent Data Lake.

-If you are using operating system profiles, the Hive user name you specify in the Hive resource must be an HDFS superuser. For more information about operating system profiles, see the Intelligent Data Lake Administrator Guide.

To extract metadata from the Hive sources, you need to import the relevant connectors, modify the scannerDeployer.xml file, and then restart the Catalog Service. For more information, see How To Configure Scanner Deployer for Hive in Live Data Map.

If you are using operating system profiles, create a new Data Integration Service for Intelligent Data Lake using the Administrator tool.
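The FQDN requirement for the Hive resource URL can be checked before you enter the value. The following sketch assumes a HiveServer2-style URL of the form jdbc:hive2://host:port/database, which this guide does not spell out. The helper names are illustrative, and the "at least one dot, not an IP address" rule is a heuristic, not a DNS lookup.

```python
import ipaddress
import re


def hive_url_host(jdbc_url: str) -> str:
    """Extract the first host name from a jdbc:hive2:// URL (assumed URL form)."""
    match = re.match(r"^jdbc:hive2://([^/:;,]+)", jdbc_url)
    if not match:
        raise ValueError("not a jdbc:hive2:// URL: %r" % jdbc_url)
    return match.group(1)


def looks_fully_qualified(host: str) -> bool:
    """Heuristic: a dotted host name with no empty labels that is not an IP address."""
    try:
        ipaddress.ip_address(host)
        return False  # an IP address is not a domain name
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)
```

For example, `looks_fully_qualified(hive_url_host("jdbc:hive2://hive01.example.com:10000/default"))` evaluates to True, while a short host name such as hive01 or a raw IP address is rejected. The host name in this example is a placeholder.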

Install Big Data Management Packages

The tar.gz file includes the Big Data Management packages and binary files that you need to install Big Data Management on Cloudera or Hortonworks.

Intelligent Data Lake uses the pushdown mode to run mappings for data upload and publish. For running the mappings in pushdown mode, you must install Big Data Management packages on all nodes of the Hadoop cluster. For more information, see the Big Data Management Installation and Configuration Guide.


Create HDFS and Hive Connections for the Data Lake

If you are installing Intelligent Data Lake in console mode, you must use the Big Data Management Configuration Utility to create the HDFS and Hive connections for Intelligent Data Lake. This step can be skipped if you are installing Intelligent Data Lake in silent mode.

If you are configuring for high availability, run the Big Data Management Configuration Utility on each node where you install the Data Preparation Service and the Intelligent Data Lake Service. The Big Data Management Configuration Utility updates the Hadoop configuration files. For more information, see the Big Data Management Installation and Configuration Guide.

You must create the HDFS and Hive connections with the following specific requirements for Intelligent Data Lake:

- For the Hadoop distribution version, select either Cloudera CDH or Hortonworks HDP. If you select Cloudera CDH, select Cloudera Manager to access files on the Hadoop cluster. If you select Hortonworks HDP, select Apache Ambari to access files on the Hadoop cluster.
- Select the Hive on MapReduce option for running the mappings.
- In the Cluster Configuration Connection type selection panel: select No to use the default Hive Command Line Interface to run mappings.

Hive connection:

-Connection Details Panel: for the Metastore Execution Mode, select remote.

HDFS connection:

-Connection Details Panel: in the HDFS username field, enter the user name with superuser privileges. The user must have access to all the Intelligent Data Lake schemas.

Verify System Requirements

Verify that your planned setup meets the minimum system requirements for the Intelligent Data Lake installation process, temporary disk space, databases, and application service hardware.

For more information about product requirements and supported platforms, see the Product Availability Matrix on the Informatica Customer Portal: https://mysupport.informatica.com/community/my-support/product-availability-matrices.

Verify Temporary Disk Space Requirements

The installer writes temporary files to the hard disk. Verify that you have enough available disk space on the machine to support the installation. When the installation completes, the installer deletes the temporary files and releases the disk space.

The installer requires 2 GB of temporary disk space.
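The temporary disk space requirement can be checked before you start the installer. The following Python sketch is a minimal example under two assumptions that the guide does not state: that the installer writes to the system temporary directory, and that 2 GB is read as 2 GiB. The function name is hypothetical.

```python
import shutil
import tempfile

# 2 GB from the requirement above, read here as 2 GiB (assumption).
REQUIRED_TEMP_BYTES = 2 * 1024**3


def has_enough_temp_space(path=None, required=REQUIRED_TEMP_BYTES):
    """Return True if the file system that holds `path` has `required` bytes free."""
    # Assumption: the system temporary directory is the relevant location.
    target = path or tempfile.gettempdir()
    return shutil.disk_usage(target).free >= required


if not has_enough_temp_space():
    print("Less than 2 GB of temporary disk space is available.")
```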


Verify Database Requirements

Verify that the database server has adequate disk space for the databases required by the Data Preparation Service.

The following table describes the database requirements for the data preparation repository database:

Database: Data Preparation Repository Database

Requirements: Create the Data Preparation Repository database on a MySQL database. Allow 5 GB of disk space for the database. Allocate more space based on the amount of metadata you want to store.

Verify Services Installation Requirements

Verify that your machine meets the minimum system requirements to install the Intelligent Data Lake services.

The following table lists the minimum memory and disk space required to install the Intelligent Data Lake services.

Service: Data Preparation Service

RAM: Approximately 512 MB per user and 4 GB for the server. Allocate more space as required.

Disk space: 60 GB

Disk space for local storage: 10 GB
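The memory guideline for the Data Preparation Service is simple arithmetic: 4 GB for the server plus approximately 512 MB for each user. The following Python sketch works through it, assuming 1 GB = 1024 MB. The function name is hypothetical and the result is only an estimate.

```python
SERVER_RAM_MB = 4 * 1024   # 4 GB for the server
PER_USER_RAM_MB = 512      # approximately 512 MB per user


def data_prep_ram_mb(users):
    """Estimate the RAM, in MB, for the Data Preparation Service."""
    if users < 0:
        raise ValueError("users must not be negative")
    return SERVER_RAM_MB + PER_USER_RAM_MB * users


# Ten users: 4096 MB + 10 * 512 MB = 9216 MB (9 GB).
print(data_prep_ram_mb(10))
```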

Verify Hardware Requirements

Verify that your machine meets the minimum hardware requirements to install the Intelligent Data Lake services.

The following table lists the hardware required to install the Intelligent Data Lake services.

Service: Data Preparation Service

Processor: 2 CPUs with a minimum of 4 cores. Recommended: 2 CPUs with 8 cores.


Set Up the Database for the Data Preparation Service

Set up the MySQL server database that the Data Preparation Service connects to. The MySQL database must meet the following requirements:

You can use MySQL version 5.6.26 or higher.

- For MySQL version 5.6.26 and higher, set lower_case_table_names=1.
- For MySQL version 5.7 and higher, set explicit_defaults_for_timestamp=1.

The database must have the following permissions:

- Create tables and views.
- Drop tables and views.
- Insert, update and delete data.
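The version-dependent MySQL settings above can be expressed as a small lookup. The following Python sketch is illustrative: the function names are hypothetical, and the server_variables argument is assumed to be a dictionary of values that you have already read from the server (for example, from SHOW VARIABLES).

```python
import re


def required_mysql_settings(version):
    """Return the settings this guide requires for a given MySQL version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if not match:
        raise ValueError(f"unrecognized MySQL version: {version!r}")
    parts = tuple(int(p) if p else 0 for p in match.groups())
    if parts < (5, 6, 26):
        raise ValueError("MySQL 5.6.26 or higher is required")
    settings = {"lower_case_table_names": "1"}
    if parts >= (5, 7, 0):
        settings["explicit_defaults_for_timestamp"] = "1"
    return settings


def missing_settings(version, server_variables):
    """Return the required settings that are absent or set to another value."""
    required = required_mysql_settings(version)
    return {name: value for name, value in required.items()
            if str(server_variables.get(name)) != value}


print(missing_settings("5.7.12", {"lower_case_table_names": "1"}))
```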

Set Up Keystore and Truststore Files

When you install Live Data Map, you can configure secure communication for the domain and specify the location of the keystore files for the security certificates. The domain must have keystore and truststore files named infa_keystore and infa_truststore in PEM and JKS format.

For more information, see the Live Data Map Installation and Configuration Guide.

If the domain is secure, you must secure the services that you create in Intelligent Data Lake.

The following services in the domain and the YARN application must share the same common truststore file:

- Data Integration Service
- Model Repository Service
- Catalog Service
- Data Preparation Service
- Intelligent Data Lake Service

The Data Preparation Service and Intelligent Data Lake Service must also share the same keystore file.

You can use different keystore files for the Data Integration Service, Model Repository Service, and Catalog Service. If you use different keystore files, you must add all the keystore files to the common truststore file.
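The sharing rules for the truststore and keystore files can be written down as a quick consistency check. The following Python sketch is only an illustration: the function name and the dictionary inputs (service name mapped to file path) are hypothetical, and the YARN application is not covered.

```python
TRUSTSTORE_GROUP = (
    "Data Integration Service",
    "Model Repository Service",
    "Catalog Service",
    "Data Preparation Service",
    "Intelligent Data Lake Service",
)
KEYSTORE_GROUP = ("Data Preparation Service", "Intelligent Data Lake Service")


def store_sharing_problems(truststores, keystores):
    """Check service-to-file mappings against the sharing rules above."""
    problems = []
    shared_trust = {truststores.get(name) for name in TRUSTSTORE_GROUP}
    if None in shared_trust or len(shared_trust) != 1:
        problems.append("all five services must use the same truststore file")
    shared_keys = {keystores.get(name) for name in KEYSTORE_GROUP}
    if None in shared_keys or len(shared_keys) != 1:
        problems.append("the Data Preparation Service and the Intelligent "
                        "Data Lake Service must use the same keystore file")
    return problems
```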


CHAPTER 3

Intelligent Data Lake Installation

This chapter includes the following topics:

Overview of the Intelligent Data Lake Installation

Installing Intelligent Data Lake in Console Mode

Intelligent Data Lake Installation in Silent Mode

Troubleshooting

Overview of the Intelligent Data Lake Installation

You can install the Intelligent Data Lake application on a Red Hat Enterprise Linux 6 or higher machine in console or silent mode.

Complete the pre-installation tasks to prepare for the installation. You can install the Intelligent Data Lake services on multiple machines. You must configure and enable the Data Preparation Service before you create the Intelligent Data Lake Service.

If you install Intelligent Data Lake in console mode, the services are not created by the installer. You must create the services after installation using the Administrator tool. If you install Intelligent Data Lake in silent mode, you can choose to create the services during installation or you can create them after the silent installation is complete.

Note: Informatica recommends that you install Intelligent Data Lake in silent mode.

Installing Intelligent Data Lake in Console Mode

You can install the Intelligent Data Lake services in console mode on Red Hat Enterprise Linux 6 or higher. When you run the installer in console mode, the words Quit and Back are reserved words. Do not use them as input text.

1. Log in to the machine with a system user account.
2. Close all other applications.
3. On a shell command line, run the install.sh file from the directory where you have extracted the installer files.
4. Press Enter to continue.

The installer displays the message about the prerequisites and pre-installation tasks.

5. If the prerequisites or pre-installation tasks are not completed, press n to exit the installer and set them as required.

If the prerequisites and pre-installation tasks are completed, press y to continue.

6. Press Enter to continue.

7. Type the path for the Live Data Map installation directory.

The directory names in the path must not contain spaces or the following special characters: @|* $ # ! % ( ) { } [ ] , ; '. Intelligent Data Lake and Live Data Map must be installed in the same directory.

8. Review the pre-installation summary, and press Enter to continue.

The installer copies the Intelligent Data Lake files to the installation directory.

9. To install Intelligent Data Lake services on the master gateway node, press 2. To install Informatica Intelligent Data Lake services on any other node, press 1.

10. Specify the domain and node details.

The following table describes the domain and node parameters you must set:

Property Description

Domain Name Name of the domain created during Live Data Map installation.

Node Name Name of the node where you want to install Intelligent Data Lake. If you are installing Intelligent Data Lake for the first time, you must install it on the master gateway node. The subsequent installation can be on any node.

Domain User Name User name for the domain administrator.

Domain User Password Password for the domain administrator. The password must be more than 2 characters and must not exceed 16 characters.

11. Specify the details for the services associated with Intelligent Data Lake.

The following table describes the service parameters you must set:

Property Description

Model Repository Service Name

Name of the Model Repository Service associated with the Intelligent Data Lake Service.

Note: You must enter the name of the Model Repository Service configured for Live Data Map. If you do not enter the correct name of the Model Repository Service, the Model repository content will not be updated and the Model Repository Service cannot be enabled. The installer will display an error message with the Ok and Continue options. You must press Continue to exit the installer and complete the steps described in “Troubleshooting”.

The Post-installation Summary indicates whether the installation completed successfully. It also shows the status of the installed components and their configuration. You can view the installation log files to get more information about the tasks performed by the installer and to view configuration properties for the installed components.
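Step 7 restricts the characters that directory names in the installation path can contain. The following Python sketch checks a path against that list before you type it into the installer. The function name is hypothetical, and the sample paths are examples only.

```python
# Space plus the special characters listed in step 7.
FORBIDDEN_PATH_CHARACTERS = set("@|*$#!%(){}[],;' ")


def invalid_path_characters(path):
    """Return the forbidden characters found in `path`, sorted and de-duplicated."""
    return sorted({ch for ch in path if ch in FORBIDDEN_PATH_CHARACTERS})


print(invalid_path_characters("/opt/Informatica/10.1.0"))
print(invalid_path_characters("/opt/my dir/(test)"))
```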


Intelligent Data Lake Installation in Silent Mode

To install the Intelligent Data Lake services without user interaction, install in silent mode. Use a properties file to specify the installation options. The installer reads the file to determine the installation options. You can use silent mode installation to install Intelligent Data Lake on multiple machines on the network or to standardize the installation across machines.

Copy the Intelligent Data Lake installation files to the hard disk on the machine where you plan to install Intelligent Data Lake. If you install on a remote machine, verify that you can access and create files on the remote machine.

To install in silent mode, complete the following tasks:

1. Configure the installation properties file and specify the installation options in the properties file.
2. Run the installer with the installation properties file.
3. Secure the passwords in the installation properties file.

Configuring the Silent Install Properties File

Informatica provides a sample properties file that includes the parameters that are required by the Intelligent Data Lake installer. You can customize the sample properties file to specify the options for your installation. Then, run the silent installation.

The sample SilentInput.properties file is stored in the root directory of the DVD or the installer download location. After you customize the file, save the file again with the file name SilentInput.properties.

1. Go to the root of the directory that contains the installation files.

2. Locate the sample SilentInput.properties file.

3. Create a backup copy of the SilentInput.properties file.

4. Use a text editor to open the file and modify the values of the installation parameters.

The following table describes the installation parameters that you can modify:

Property Name Description

USER_INSTALL_DIR Directory in which to install Intelligent Data Lake. Intelligent Data Lake and Live Data Map must be installed in the same directory. Set USER_INSTALL_DIR to the <Location of the Live Data Map installation directory> and ensure that the directory has write permissions. Default is: home/Informatica/10.1.0.

DOMAIN_USER User name for the domain administrator.

- The name is not case sensitive and cannot exceed 128 characters.
- The name cannot include a tab, newline character, or the following special characters: % * + \ / ' . ? ; < >
- The name can include an ASCII space character except for the first and last character. Other space characters are not allowed.

DOMAIN_PSSWD Password for the domain administrator. The password must be more than 2 characters but cannot exceed 16 characters.


FIRST_GATEWAY_NODE Indicates whether the installation is on the master gateway node. If the value is 1, the services are installed on the master gateway node of the Live Data Map domain.

If the value is 0, the services are installed on any other node. If the machine on which you are installing Intelligent Data Lake has 16 GB of RAM or less, Informatica recommends that you create the Data Preparation Service and the Intelligent Data Lake Service on different nodes. If you create the services on different nodes, you must create the Data Preparation Service on the master gateway node. You can create the Intelligent Data Lake Service on any other node.

Default is 1.

CREATE_LAKE_SERVICES Enables creation of the Data Preparation Service and the Intelligent Data Lake Service during installation.

Set the value to 1 to enable service creation during installation. If the value is 0, the Data Preparation Service and the Intelligent Data Lake Service are not created during installation and you must create the services from the Administrator tool.

Default is 1.

DATA_PREP_SERVICE Enables creation of the Data Preparation Service during installation.

Set the value to 1 to enable the Data Preparation Service creation during installation. If the value is 0, the Data Preparation Service is not created during installation and you must create the service from the Administrator tool.

Default is 0.

DATA_LAKE_SERVICE Enables creation of the Intelligent Data Lake Service during installation. The Intelligent Data Lake Service must be associated with a Data Preparation Service.

Set the value to 1 to enable the Intelligent Data Lake Service creation during installation.

If the value is 0, the Intelligent Data Lake Service is not created during installation and you must create the service from the Administrator tool.

Default is 0.

BOTH_LAKEANDPREP_SERVICE Enables creation of the Data Preparation Service and the Intelligent Data Lake Service during installation.

Set the value to 1 to enable service creation during installation. If the value is 0, the Data Preparation Service and the Intelligent Data Lake Service are not created during installation and you must create the services from the Administrator tool.

Default is 1.


CREATE_CONNECTION Indicates whether the installer creates the HDFS connection and Hive connection for the Hadoop cluster.

If the value is 1, the HDFS and Hive connections are created during installation. Select this option if you want to configure Hadoop configuration files and create Hive and HDFS connections for the Data Integration Service or Data Preparation Service on this node. If you are configuring a high availability Hadoop cluster, you must update the Hadoop configuration files (for example: core-site.xml and hive-site.xml) on all nodes where the Data Preparation Service is running.

If the value is 0, the HDFS and Hive connections are not created during installation and you must create the connections from the Administrator tool.

Note: You must create the HDFS and Hive connections from the Administrator tool and set the values for the HDFS_CONNECTION_NAME and HIVE_CONNECTION_NAME properties.

Default is 1.

CLOUDERA_SELECTION Hadoop distribution for the data lake.

To use Cloudera as the Hadoop distribution, set CLOUDERA_SELECTION to 1.

To use HortonWorks as the Hadoop distribution, set CLOUDERA_SELECTION to 0.

Default is 1.

CLOUDERA_HOSTNAME Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1.

Host where the Cloudera Manager runs.

CLOUDERA_USER_NAME Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1.

User name for the Cloudera Manager.

CLOUDERA_USER_PASSWD Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1.

Password for the Cloudera Manager user.

CLOUDERA_PORT Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1.

Port for the Cloudera Manager.

HORTONWORKS_SELECTION Hadoop distribution for the data lake.

To use HortonWorks as the Hadoop distribution, set HORTONWORKS_SELECTION to 1.

To use Cloudera as the Hadoop distribution, set HORTONWORKS_SELECTION to 0.


AMBARI_HOSTNAME Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1.

Host where Apache Ambari server runs.

AMBARI_USER_NAME Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1.

User name for the Apache Ambari server.

AMBARI_USER_PASSWD Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1.

Password for the Apache Ambari server user.

AMBARI_PORT Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1.

Web port for the Apache Ambari server.

UPDATE_DIS Required if CREATE_CONNECTION=1.

The Data Integration Service must be present in the Informatica domain. If the cluster is enabled for Kerberos authentication, then copy the krb5.conf file to the {INFA_HOME}/services/shared/security folder on the machine where the Data Integration Service is configured.

If the value is 1, the Data Integration Service properties are updated and the service is automatically restarted. If CREATE_CONNECTION=1 and the connection already exists, the Data Integration Service will not be updated.

Default is 0.

CLUSTER_INSTALLATION_DIR Required if UPDATE_DIS=1.

Indicates the Red Hat Package Manager (RPM) installation directory for the Hadoop cluster. Default is /opt/Informatica.

DIS_SERVICE_NAME_CONNECTION Name of the Data Integration Service associated with Live Data Map.

SAMPLE_HIVE_CONNECTION Required if CREATE_CONNECTION=1.

Name of the Hive connection. If you do not set the SAMPLE_HIVE_CONNECTION property, the installer uses the default name "HIVE".

HIVE_IMPERSONATION_USER Required if CREATE_CONNECTION=1.

Name of the Hadoop Impersonation user used by Intelligent Data Lake. This user must exist in the Hadoop Cluster.

HDFS_USER_NAME User name with permissions to access the HBase database.

SAMPLE_HDFS_CONNECTION Required if CREATE_CONNECTION=1.

Name of the HDFS connection. If you do not set the SAMPLE_HDFS_CONNECTION property, the installer uses the default name "HDFS".


HIVESERVER2_PRINCIPAL Required if CREATE_CONNECTION=1 and the Hadoop cluster for the Data Integration Service uses Kerberos authentication.

KERBEROS_PRINCIPAL_NAME Required if UPDATE_DIS=1 and the Hadoop cluster uses Kerberos authentication.

Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user@REALM.

KERBEROS_KEYTAB Required if UPDATE_DIS=1 and the Hadoop cluster uses Kerberos authentication.

File name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster.

CATALOGUE_SERVICE_NAME Name of the Catalog Service associated with the Intelligent Data Lake Service.

Default is Catalog_Service.

MRS_SERVICE_NAME Name of the Model Repository Service associated with the Intelligent Data Lake Service.

Default is Model_Repository_Service.

DIS_SERVICE_NAME Name of the Data Integration Service associated with the Intelligent Data Lake Service.

If you plan to use the Operating System profiles option for the Data Integration Service, ensure that you create and associate a different Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support Operating System profiles.

Default is Data_Integration_Service.

ENABLE_CMS_SERVICE Required if you want to use the data domain discovery feature in Live Data Map. Name of the Content Management Service. If the value is true, the Content Management Service is enabled. If the value is false, the Content Management Service is disabled. Default is false.

CMS_SERVICE_NAME Name of the Content Management Service associated with the Intelligent Data Lake Service.

DPS_DB_HOST Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Host name of the machine that hosts the Data Preparation repository database.

DPS_DB_USER Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Database user account to use to connect to the Data Preparation repository.


DPS_DB_PSSWD Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Password for the Data Preparation repository database user account.

DPS_DB_PORT Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Port number for the database. Default is 3306.

DPS_DB_SCHEMA Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Schema or database name of the Data Preparation repository database.

DPS_SERVICE_NAME Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Name of the Data Preparation Service associated with the Intelligent Data Lake Service. Default is Data_Preparation_Service.

ENABLE_DPS_SERVICE Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

If the value is true, the Data Preparation Service is enabled immediately after creation.

If the value is false, you must enable the Data Preparation Service from the Administrator tool. Default is false.

DPS_NODE_NAME Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Name of the node where you want to run the Data Preparation Service.

DPS_LICENSE_SERVICE_NAME Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

License object with the data lake option that allows use of the Data Preparation Service.

DPS_PROTOCOL_TYPE Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

To enable secure communication for the Data Preparation Service, set DPS_PROTOCOL_TYPE to https.

To disable secure communication for the Data Preparation Service, set DPS_PROTOCOL_TYPE to http.

DPS_HTTP_PORT Required if DPS_PROTOCOL_TYPE=http. If DPS_PROTOCOL_TYPE=https, ensure this field is blank. HTTP port number.

DPS_HTTPS_PORT Required if DPS_PROTOCOL_TYPE=https. If DPS_PROTOCOL_TYPE=http, ensure this field is blank. HTTPS port number.


DPS_CUSTOM_HTTPS_ENABLED Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Indicates whether the Data Preparation Service uses the default or custom SSL certificate files.

To use the default Informatica SSL certificate files, set DPS_CUSTOM_HTTPS_ENABLED to false.

To use the custom SSL certificate files, set DPS_CUSTOM_HTTPS_ENABLED to true. Default is false.

DPS_KEYSTORE_DIR Required if DPS_CUSTOM_HTTPS_ENABLED=true. Path and file name of the keystore file that contains the keys and certificates required for HTTPS communication.

DPS_KEYSTORE_PSSWD Required if DPS_CUSTOM_HTTPS_ENABLED=true. Password for the keystore file.

DPS_TRUSTSTORE_DIR Required if DPS_CUSTOM_HTTPS_ENABLED=true. Path and the file name of truststore file that contains authentication certificates for the HTTPS connection.

DPS_TRUSTSTORE_PSSWD Required if DPS_CUSTOM_HTTPS_ENABLED=true. Password for the truststore file.

HDFS_LOCATION HDFS location for data preparation file storage. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS location.

LOCAL_STORAGE_DIR Directory for data preparation file storage on the node on which the Data Preparation Service runs.

SOLR_PORT Port number for the Apache Solr server used to provide data preparation recommendations. Default is 8983.

AUTH_MODE Set the Hadoop Authentication Mode to NonSecure or Kerberos.

HADOOP_IMPERSONATION_USER Required if AUTH_MODE=Kerberos.

User name to use in Hadoop impersonation as set in core-site.xml.

HDFS_PRINCIPAL_NAME Required if AUTH_MODE=Kerberos.

Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user/_HOST@REALM.

KERBEROS_KEYTAB_FILE Required if AUTH_MODE=Kerberos.

Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Data Preparation Service runs.


HADOOP_DISTRIBUTION Required if AUTH_MODE=Kerberos. Set HADOOP_DISTRIBUTION=Cloudera or HADOOP_DISTRIBUTION=HortonWorks to select the Hadoop distribution that you want to configure.

Default is Cloudera.

HDFS_CONNECTION_NAME Required if AUTH_MODE=Kerberos.

HDFS connection for data preparation file storage.

IDL_SERVICE_NAME Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Intelligent Data Lake Service Name. Default is Intelligent_Data_Lake_Service.

IDL_NODE_NAME Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Name of the node where you want to run the Intelligent Data Lake Service.

IDL_DPS_SERVICE_NAME Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the Data Preparation Service. Set the IDL_DPS_SERVICE_NAME property to the name of the Data Preparation Service to associate with the Intelligent Data Lake Service specified in the IDL_SERVICE_NAME property.

IDL_LICENSE_SERVICE_NAME Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

License object with the data lake option that allows use of the Intelligent Data Lake Service.

ENABLE_IDL_SERVICE Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

If the value is true, the Intelligent Data Lake Service is enabled immediately after creation.

If the value is false, you must enable the Intelligent Data Lake Service from the Administrator tool.

Default is false.

IDL_PROTOCOL_TYPE Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1.

Indicates whether the Intelligent Data Lake Service uses secure communication.

To enable secure communication for the Intelligent Data Lake Service, set IDL_PROTOCOL_TYPE to https.

To disable secure communication for the Intelligent Data Lake Service, set IDL_PROTOCOL_TYPE to http.

IDL_HTTP_PORT Required if IDL_PROTOCOL_TYPE=http. If IDL_PROTOCOL_TYPE=https, ensure this field is blank. HTTP port number.


IDL_HTPPS_PORT Required if IDL_PROTOCOL_TYPE=https. If IDL_PROTOCOL_TYPE=http, ensure this field is blank. HTTPS port number.

IDL_CUSTOM_HTTPS_ENABLED Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Indicates whether the Intelligent Data Lake Service uses the default or custom SSL certificate files.

To use the default Informatica SSL certificate files, set IDL_CUSTOM_HTTPS_ENABLED to false.

To use the custom SSL certificate files, set IDL_CUSTOM_HTTPS_ENABLED to true. Default is false.

IDL_KEYSTORE_DIR Required if IDL_CUSTOM_HTTPS_ENABLED=true. Path and the file name of keystore file that contains key and certificates required for the HTTPS communication.

IDL_KEYSTORE_PSSWD Required if IDL_CUSTOM_HTTPS_ENABLED=true. Password for the keystore file.

IDL_TRUSTSTORE_DIR Required if IDL_CUSTOM_HTTPS_ENABLED=true. Path and the file name of truststore file that contains authentication certificates for the HTTPS connection.

IDL_TRUSTSTORE_PSSWD Required if IDL_CUSTOM_HTTPS_ENABLED=true. Password for the truststore file.

LAKE_RESOURCE_NAME Hive resource for the data lake. You configure the resource in Live Data Map Administrator.

HDFS_SYSTEM_DIR HDFS directory where the Intelligent Data Lake Service copies temporary data and files necessary for the service to run.

HADOOP_DIISTRIBUTION_DIR Directory that contains Hadoop distribution files on the machine where Intelligent Data Lake Service runs. The directory must be within the Informatica directory. The default directory is <Informatica installation directory>/services/shared/hadoop/<hadoop distribution name>.

HIVE_CONNECTION_NAME Hive connection for the data lake.

If CREATE_CONNECTION=1, you must enter the same value as the SAMPLE_HIVE_CONNECTION value for this field.

If CREATE_CONNECTION=0, you must create a Hive connection and enter the value for this field.

HIVE_LOCALSTORAGE_FORMAT Data storage format for the Hive tables. Values are DefaultFormat, Parquet, ORC. Default is DefaultFormat.

ENABLE_AUDIT_OPTIONS Indicates whether you can log user activity events.

To enable event logging, set ENABLE_AUDIT_OPTIONS to true. Default is false.


ZOOKEEPER_QUORUM Required if ENABLE_AUDIT_OPTIONS=true.

List of host names and port numbers of the ZooKeeper Quorum used to log events. Specify the host names as comma-separated key value pairs. For example: <hostname1>,<hostname2>.

ZOOKEEPER_CLIENT_PORT Required if ENABLE_AUDIT_OPTIONS=true.

Port number on which the ZooKeeper server listens for client connections. Default value is 2181.

ZOOKEEPER_PARENT_ZNODE Required if ENABLE_AUDIT_OPTIONS=true.

Name of the ZooKeeper znode where the Intelligent Data Lake configuration details are stored.

SECURITY_MODE Indicates whether the security mode is NonSecure or Kerberos. Set the SECURITY_MODE to NonSecure or Kerberos. Default is NonSecure.

HDFS_KERBEROS_PRINCIPAL Required if SECURITY_MODE=Kerberos.

Service principal name (SPN) of the data lake Hadoop cluster.

IDL_KERBEROS_PRINCIPAL Required if SECURITY_MODE=Kerberos.

Service principal name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. The user account for impersonation must be set in the <Informatica installation directory>/services/shared/hadoop/<hadoop distribution name>/conf/core-site.xml file.

IDL_KERBEROS_KEYTAB_FILE Required if SECURITY_MODE=Kerberos.

Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Intelligent Data Lake Service runs.

HBASE_MASTER_PRINCIPAL Required if SECURITY_MODE=Kerberos.

Service principal name (SPN) of the HBase Master Service. Use the hbase.master.kerberos.principal key value set in this file: /etc/hbase/conf/hbase-site.xml. You must replace the _HOST parameter with the actual host name of the server where HBase Master is running.

HBASE_REGION_PRINCIPAL Required if SECURITY_MODE=Kerberos.

Service principal name (SPN) of the HBase Region Server service. Use the hbase.regionserver.kerberos.principal key value set in this file: /etc/hbase/conf/hbase-site.xml. You must replace the _HOST parameter with the actual host name of the server where HBase Region Server is running.


HBASE_USER Required if ENABLE_AUDIT_OPTIONS=true and SECURITY_MODE=Kerberos.

User name with permissions to access the HBase database.

HBASE_SCHEMA Required if ENABLE_AUDIT_OPTIONS=true.

Namespace for the HBase tables. The default value is default.

5. Save the properties file with the name SilentInput.properties.
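Many of the properties above are required only when another property has a specific value, so it can help to check the file before you run the installer. The following Python sketch is a partial, illustrative check and not an Informatica tool: the function names are hypothetical, only explicitly set values are examined (installer defaults are not modeled), and only a few of the documented rules are covered. DPS_DB_PORT is left out of the required list because the table gives it a default.

```python
import re


def parse_properties(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Conditional requirements taken from the property table.
CLOUDERA_KEYS = ["CLOUDERA_HOSTNAME", "CLOUDERA_USER_NAME",
                 "CLOUDERA_USER_PASSWD", "CLOUDERA_PORT"]
AMBARI_KEYS = ["AMBARI_HOSTNAME", "AMBARI_USER_NAME",
               "AMBARI_USER_PASSWD", "AMBARI_PORT"]
DPS_DB_KEYS = ["DPS_DB_HOST", "DPS_DB_USER", "DPS_DB_PSSWD", "DPS_DB_SCHEMA"]


def validate(props):
    """Return a list of problems found in the explicitly set properties."""
    problems = []

    def require(keys, condition):
        for key in keys:
            if not props.get(key):
                problems.append(f"{key} is required when {condition}")

    user = props.get("DOMAIN_USER", "")
    if len(user) > 128 or re.search(r"[\t\n%*+\\/'.?;<>]", user):
        problems.append("DOMAIN_USER is too long or contains a forbidden character")

    password = props.get("DOMAIN_PSSWD", "")
    if not 2 < len(password) <= 16:
        problems.append("DOMAIN_PSSWD must be 3 to 16 characters long")

    if props.get("CREATE_CONNECTION") == "1":
        if props.get("CLOUDERA_SELECTION") == "1":
            require(CLOUDERA_KEYS, "CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1")
        if props.get("HORTONWORKS_SELECTION") == "1":
            require(AMBARI_KEYS, "CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1")

    if "1" in (props.get("DATA_PREP_SERVICE"), props.get("BOTH_LAKEANDPREP_SERVICE")):
        require(DPS_DB_KEYS, "DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1")

    return problems
```

To use the sketch, read SilentInput.properties into a string, pass it to parse_properties, and review the list that validate returns. An empty list means only that these particular rules passed.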

Running the Silent Installer

After you configure the properties file, open a command prompt to start the silent installation.

1. Open a Red Hat Enterprise Linux 6 or higher shell.
2. Go to the root of the directory that contains the installation files.
3. Verify that the directory contains the file SilentInput.properties that you edited and resaved.
4. Run silentInstall.sh to start the silent installation.

The silent installer runs in the background. The process can take a while. The silent installation is complete when the Informatica_<Version>_Services_InstallLog.log file is created in the installation directory.

The silent installation fails if you incorrectly configure the properties file or if the installation directory is not accessible. View the installation log files and correct the errors. Then, run the silent installation again.
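Because the silent installer runs in the background, a script can wait for the install log to appear. The following Python sketch polls the installation directory for a file that matches Informatica_<Version>_Services_InstallLog.log. The function names, timeout, and polling interval are hypothetical choices. The presence of the log file indicates only that the installer has finished, so you still need to review the log for errors.

```python
import glob
import os
import time


def find_install_log(install_dir):
    """Return the path of the services install log, or None if it does not exist yet."""
    pattern = os.path.join(install_dir, "Informatica_*_Services_InstallLog.log")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def wait_for_install_log(install_dir, timeout_seconds=3600, poll_seconds=30):
    """Poll until the install log appears or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        log_path = find_install_log(install_dir)
        if log_path or time.monotonic() >= deadline:
            return log_path
        time.sleep(poll_seconds)
```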

Secure the Passwords in the Properties File

After you run the silent installer, ensure that passwords in the properties file are kept secure.

When you configure the properties file for a silent installation, you enter passwords in plain text. After you run the silent installer, use one of the following methods to secure the passwords:

Remove the passwords from the properties file.

Delete the properties file.

Store the properties file in a secure location.
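The first method, removing the passwords from the properties file, can be sketched as a small shell function. This is an illustration, not an Informatica utility. The function name blank_properties and the key name EXAMPLE_PASSWORD are hypothetical. Pass the names of the password properties that you actually set. The sketch assumes that each entry has the form KEY=value and that key names contain only letters, digits, and underscores.

```shell
# Blank the values of the named keys in a properties file, in place.
# Arguments: properties file, then one or more property names.
blank_properties() {
  file=$1
  shift
  for key in "$@"; do
    tmp=$(mktemp) || return 1
    # Keep "KEY=" and drop everything after the equals sign.
    sed "s/^\(${key}\)=.*/\1=/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  done
}

# Usage (hypothetical key name):
#   blank_properties SilentInput.properties EXAMPLE_PASSWORD
#   chmod 600 SilentInput.properties
```

Restricting the file permissions with chmod, as in the usage comment, is an additional precaution if you keep the file.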

Troubleshooting

When you install Intelligent Data Lake in console mode, you must enter the name of the Model Repository Service that is configured for Live Data Map. If you enter an incorrect name, the installer does not update the Model repository content and the Model Repository Service cannot be enabled. The installer displays an error message with the Ok and Continue options. Press Continue to exit the installer, and then complete the following tasks with the Administrator tool or from the command line.

Administrator tool:

To upgrade the Model Repository Service, select the service in the Navigator, and then click Actions > Repository Contents > Upgrade.


To enable the Model Repository Service, select the service in the Navigator, and then click Actions > Enable Service.

Command line:

To upgrade the Model Repository Service:

<LDM Installation>/isp/bin/infacmd.sh mrs upgradecontents -dn <Domain Name> -un <Domain Username> -pd <Domain Password> -sn <Valid MRS Name>

To enable the Model Repository Service:

<LDM Installation>/isp/bin/infacmd.sh enableService -dn <Domain Name> -un <Domain Username> -pd <Domain Password> -sn <Valid MRS Name>
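If you prefer to run both command line tasks in sequence, a small wrapper might look like the following. This is a sketch only. The function name upgrade_and_enable_mrs and the INFACMD variable are hypothetical, and the -dn option on the upgrade command is assumed by analogy with the enable command. Passing the password with -pd exposes it in the process list, so treat this as an illustration of the order of operations.

```shell
# Upgrade the Model repository contents, then enable the service.
# INFACMD must point to <LDM Installation>/isp/bin/infacmd.sh.
# Arguments: domain name, domain user name, domain password, MRS name.
upgrade_and_enable_mrs() {
  : "${INFACMD:?set INFACMD to the path of infacmd.sh}"
  domain=$1
  user=$2
  password=$3
  mrs=$4
  # Stop if the upgrade fails, so that the service is not enabled on old contents.
  "$INFACMD" mrs upgradecontents -dn "$domain" -un "$user" -pd "$password" -sn "$mrs" || return 1
  "$INFACMD" enableService -dn "$domain" -un "$user" -pd "$password" -sn "$mrs"
}

# Usage (hypothetical values):
#   INFACMD=/opt/informatica/isp/bin/infacmd.sh
#   upgrade_and_enable_mrs MyDomain Administrator 'secret' MRS_LDM
```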


CHAPTER 4

After You Install Intelligent Data Lake

This chapter includes the following topics:

After You Install Overview

Create the Application Services

Install Python

Enable Logging of User Activity Events

After You Install Overview

After you install Intelligent Data Lake, complete the post-installation tasks.

Create the Application Services

Use the Informatica Administrator tool to create the Intelligent Data Lake services.

This task is required if you installed Intelligent Data Lake in console mode, or if you installed it in silent mode without creating the services during installation. Complete all the prerequisite tasks before you create the Intelligent Data Lake services.

You must create the Intelligent Data Lake services in the following order:

Data Preparation Service

Intelligent Data Lake Service

Note: If you plan to use the operating system profiles option for the Data Integration Service, ensure that you create and associate a different Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support operating system profiles.


Install Python

Intelligent Data Lake uses the Apache Solr indexing capabilities to provide recommendations of related data assets. Apache Solr requires Python modules. You must install Python with the following modules on the node where the Data Preparation Service is configured:

argparse

sys

getopt

os

urllib

httplib2

ConfigParser
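To confirm that the listed modules are available, you could try to import each one with the Python interpreter on that node. The following is a minimal sketch. The function name check_python_modules is hypothetical, and the interpreter name in the usage comment is an assumption about your environment. Note that ConfigParser is the Python 2 name of that module and httplib2 is a third-party package that is installed separately.

```shell
# Try to import each module with the given interpreter.
# Arguments: interpreter command, then one or more module names.
# Prints the modules that fail to import and returns 1 if any are missing.
check_python_modules() {
  py=$1
  shift
  missing=''
  for mod in "$@"; do
    if ! "$py" -c "import $mod" >/dev/null 2>&1; then
      missing="$missing $mod"
    fi
  done
  if [ -n "$missing" ]; then
    printf 'missing:%s\n' "$missing"
    return 1
  fi
  printf 'all modules found\n'
}

# Usage (assumes the interpreter is on the PATH as "python"):
#   check_python_modules python argparse sys getopt os urllib httplib2 ConfigParser
```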

Enable Logging of User Activity Events

You can audit user activity on the data lake by viewing the events that the Intelligent Data Lake Service writes to HBase. The events include user activity in the Intelligent Data Lake application, such as when a user creates a project, adds data assets to a project, or publishes prepared data.

Ensure that you have created an HBase instance in the Hadoop cluster where the data lake is configured.

Follow these steps to enable logging of user activity events:

1. Log in to the Administrator tool.

2. In the Domain Navigator, select the Intelligent Data Lake Service.

If the Hadoop cluster uses Kerberos authentication, complete step 3. If the Hadoop cluster does not use Kerberos authentication, skip to step 4.

3. Edit the data lake security options. In the Edit Data Lake Security Options window, enter the following details:

Property Description

HBase Master Service Principal Name

Service principal name (SPN) of the HBase Master Service. Use the value set in this file: /etc/hbase/conf/hbase-site.xml.

HBase RegionServer Service Principal Name

Service principal name (SPN) of the HBase Region Server service. Use the value set in this file: /etc/hbase/conf/hbase-site.xml.

HBase User Name

User name with Create, Read, and Write permissions to access the HBase database.


4. Edit the event logging options. In the Edit Event Logging Options window, enter the following details:

Property Description

Log User Activity Events

Indicates whether the Intelligent Data Lake Service logs user activity events for auditing. The user activity logs are stored in an HBase instance.

HBase ZooKeeper Quorum

List of host names and port numbers of the ZooKeeper Quorum used to log events. Specify the host names as comma-separated values. For example: <hostname1>,<hostname2>.

HBase ZooKeeper Client Port

Port number on which the ZooKeeper server listens for client connections. Default value is 2181.

ZooKeeper Parent ZNode

Name of the ZooKeeper znode where the Intelligent Data Lake configuration details are stored.

HBase Namespace

Namespace for the HBase tables.

5. Click OK.


