
Cloudera Navigator Installation and User Guide


(c) 2010-2015 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service

names or slogans contained in this document are trademarks of Cloudera and its

suppliers or licensors, and may not be copied, imitated or used, in whole or in part,

without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software

Foundation. All other trademarks, registered trademarks, product names and

company names or logos mentioned in this document are the property of their

respective owners. Reference to any products, services, processes or other

information, by trade name, trademark, manufacturer, supplier or otherwise does

not constitute or imply endorsement, sponsorship or recommendation thereof by

us.

Complying with all applicable copyright laws is the responsibility of the user. Without

limiting the rights under copyright, no part of this document may be reproduced,

stored in or introduced into a retrieval system, or transmitted in any form or by any

means (electronic, mechanical, photocopying, recording, or otherwise), or for any

purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other

intellectual property rights covering subject matter in this document. Except as

expressly provided in any written license agreement from Cloudera, the furnishing

of this document does not give you any license to these patents, trademarks,

copyrights, or other intellectual property. For information about patents covering

Cloudera products, see http://tiny.cloudera.com/patents.

The information in this document is subject to change without notice. Cloudera

shall not be liable for any damages resulting from technical errors or omissions

which may be present in this document, or from use of this document.

Cloudera, Inc.


Table of Contents

Cloudera Navigator Installation and User Guide
Introducing Cloudera Navigator
Cloudera Navigator Requirements
Cloudera Manager Requirements
Cloudera Navigator Auditing Component
Cloudera Navigator Metadata Component
Installing and Upgrading Cloudera Navigator
Cloudera Navigator Auditing Component
Cloudera Navigator Audit Server
Audit Log Properties
Service Auditing Properties
Audit Events
Viewing Audit Events
Filtering Audit Events
Downloading Audit Events
Downloading HDFS Directory Access Permission Reports
Cloudera Navigator Metadata Component
Cloudera Navigator Metadata Server
Metadata
Search Properties
Accessing Metadata
Modifying Business Metadata
Lineage Diagrams
Displaying a Template Lineage Diagram
Displaying an Instance Lineage Diagram
Displaying the Template Lineage Diagram for an Instance Lineage Diagram
Downloading a Lineage File
Tables


Introducing Cloudera Navigator

Cloudera Navigator is a fully integrated data management tool for the Hadoop platform. Data management

capabilities are critical for enterprise customers that are in highly regulated industries and have stringent

compliance requirements.

Cloudera Navigator provides two categories of functionality:

• Auditing data access and verifying access privileges - Cloudera Navigator allows administrators to configure,

collect, and view audit events, to understand who accessed what data and how. Cloudera Navigator also

allows administrators to generate reports that list the HDFS access permissions granted to groups.

Cloudera Navigator tracks access permissions and actual accesses to all entities in HDFS, Hive, HBase, and

Cloudera Impala to help answer questions such as - who has access to which entities, which entities were

accessed by a user, when was an entity accessed and by whom, what entities were accessed using a service,

which device was used to access, and so on. Cloudera Navigator auditing supports tracking access to:

• HDFS data accessed through the HDFS, Hive, HBase, and Cloudera Impala services

• HBase and Impala operations

• Hive metadata

• Sentry access

• Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow

DBAs, data modelers, business analysts, and data scientists to search for, amend the properties of, and tag

data entities.


Cloudera Navigator Requirements

Cloudera Manager Requirements

This section describes the Cloudera Navigator requirements for Cloudera Manager.

Cloudera Navigator 2 is available with Cloudera Manager 5.1. For information on the requirements for installing

Cloudera Manager, see Requirements for Cloudera Manager.

Cloudera Navigator Auditing Component

This section describes the databases, service versions, and audited operations supported by the Cloudera

Navigator auditing component.

Supported Audit Databases

The Cloudera Navigator auditing component supports the following databases for storing audit events:

• MySQL - 5.0, 5.1, 5.5, and 5.6

• Oracle 11gR2

• PostgreSQL - 8.4, 9.1, and 9.2

Supported Service Versions and Audited Operations

• HDFS - Minimum supported version: CDH 4.0.0.

The captured operations are:

• Operations that access or modify a file's or directory's data or metadata

• Operations denied due to lack of privileges

• HBase - Minimum supported version: CDH 4.0.0.

Note:

• In CDH versions less than 4.2.0, for grant and revoke operations, the operation in log events is ADMIN.

• In simple authentication mode, if the HBase Secure RPC Engine property is false (the default), the username in log events is UNKNOWN. To see a meaningful user name:

1. Click the HBase service.

2. Click the Configuration tab.

3. Select Service-wide > Security.

4. Set the HBase Secure RPC Engine property to true.

5. Save the change and restart the service.

• Hive - Minimum supported versions: CDH 4.2.0, CDH 4.4.0 for operations denied due to lack of privileges.

The captured operations are:


Note:

• Actions taken against Hive via the Hive CLI are not audited. Therefore if you have enabled

auditing you should disable the Hive CLI to prevent actions against Hive that are not audited.

• In simple authentication mode, the username in log events is the username passed in the HiveServer2 connect command. If you do not pass a username in the connect command, the username in log events is anonymous.

• Hue - Minimum supported version: CDH 4.2.0.

The captured operations are:

• Operations (except grant, revoke, and metadata access only) sent to Beeswax Server

Note: You do not directly configure the Hue service for auditing. Instead, when you configure the

Hive service for auditing, operations sent to the Hive service through Beeswax appear in the Hue

service audit log.

• Cloudera Impala - Minimum supported version: Cloudera Impala 1.2.1.

The captured operations are:

• Queries denied due to lack of privileges

• Queries that pass analysis

• Sentry

The captured operations are:

• Operations sent to the HiveServer2 and the Hive Metastore Server roles

• Add and delete roles, assign roles to groups and remove roles from groups, create and delete privileges,

grant and revoke privileges

• Operations denied due to lack of privileges

Note: You do not directly configure the Sentry service for auditing. Instead, when you configure

the Hive service for auditing, grant, revoke, and metadata operations sent to the Hive Metastore

Server appear in the Hive service audit log.

Cloudera Navigator Metadata Component

This section describes the product versions, components, and browsers supported by the Cloudera Navigator

metadata component.

Supported Product Versions

CDH 4.4.0 and higher for all components except Pig. For Pig, CDH 4.6.0 and higher. The supported components

are:

• HDFS

• Hive

• MapReduce

• Oozie

• Pig


Supported Browsers for the Metadata UI


Installing and Upgrading Cloudera Navigator

Required Role:

Cloudera Navigator is implemented as two roles in the Cloudera Management Service: Navigator Audit Server

and Navigator Metadata Server. You can add Cloudera Navigator roles while installing Cloudera Manager for the

first time, into an existing Cloudera Manager installation, or while upgrading an existing Cloudera Manager

installation.

Configuring a Database for the Cloudera Navigator Auditing Component

When you install Cloudera Navigator you choose the database to store audit events. You can choose either an

embedded PostgreSQL database or an external database. For information on supported databases, see Supported Audit Databases on page 9. For information on setting up an external database, see Installing and Configuring Databases.

Adding Cloudera Navigator in a New Cloudera Manager Installation

1. Install Cloudera Manager following the instructions in Installing Cloudera Manager.

2. In the first page of the Cloudera Manager installation wizard, choose one of the license options that support

Cloudera Navigator:

• Cloudera Enterprise Data Hub Edition Trial

• Cloudera Enterprise

– Flex Edition

– Data Hub Edition

3. If you have elected Cloudera Enterprise, install a license:

a. Click Upload License.

b. Click the document icon to the left of the Select a License File text field.

c. Navigate to the location of your license file, click the file, and click Open.

d. Click Upload.

Click Continue to proceed with the installation.

4. In the first page of the Add Services procedure, check the Include Cloudera Navigator checkbox.

5. If you have chosen to use an external database, provide the Cloudera Navigator Audit Server database

properties in the Database Setup page.

Adding Cloudera Navigator in an Existing Cloudera Manager Installation

1. Add and start the Cloudera Navigator roles:

• Adding and Starting the Navigator Audit Server Role on page 15

• Adding and Starting the Navigator Metadata Server Role on page 28

Adding Cloudera Navigator While Upgrading an Existing Cloudera Manager Installation

1. Upgrade Cloudera Manager following the instructions in Upgrading Cloudera Manager.


Deleting Cloudera Navigator Roles

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management

Service link.

2. Click the Instances tab.

3. Check the checkboxes next to the Navigator Audit Server and Navigator Metadata Server roles.

4. If the role instance is running, select Actions for Selected > Stop and click Stop to confirm the action.

5. Select Actions for Selected > Delete. Click Delete to confirm the deletion.

Upgrading Cloudera Navigator

To upgrade Cloudera Navigator, upgrade Cloudera Manager following the instructions in Upgrading Cloudera Manager.

Important:

Cloudera does not provide an upgrade path from the Navigator Metadata component in Cloudera

Navigator 1.2 to the Cloudera Navigator 2.0 release. If you are upgrading from Cloudera Navigator

1.2, you must perform a clean install of Cloudera Navigator 2.0. Therefore, if you have Cloudera

Navigator roles from a previous beta release:

1. Delete the Navigator roles.

2. Remove the contents of the Navigator Metadata Server storage directory.

3. Add the Navigator roles according to the process described in Adding and Starting the Navigator Audit Server Role on page 15 and Adding and Starting the Navigator Metadata Server Role on page 28.


Cloudera Navigator Auditing Component

The Cloudera Navigator auditing component provides data auditing and access features. The architecture of the

Cloudera Navigator auditing component is illustrated below.

The Cloudera Navigator auditing component is implemented as an add-on to Cloudera Manager; all Cloudera

Navigator auditing component functions (configuration and audit log review) are accessed through the Cloudera

Manager Admin Console.

When the Cloudera Navigator auditing component is configured, plug-ins that enable collection of audit events

are added to the HDFS, HBase, and Hive (that is, the HiveServer2 and Beeswax servers) services. The plug-ins

write the audit events to an audit log on the local filesystem. Cloudera Impala records audit events directly in

an audit log file.

The Cloudera Manager Agent monitors the audit log files and sends these events to the Navigator Audit Server.

The Cloudera Manager Agent retries any event that it fails to transmit. Because no in-memory transient buffer is involved, audit events that have been written to the audit log file are guaranteed to be delivered (as long as the filesystem is available). The Cloudera Manager Agent keeps track of the offset of the last audit event in the audit log that it has successfully transmitted, so after a crash or restart it resumes from the last successfully sent position. Audit logs are rotated and the Cloudera Manager Agent follows the rotation of the log. The Agent also purges old audit logs once they have been successfully transmitted to the Navigator Audit Server. If a plug-in fails to write an audit event to the audit log file, it can either drop the event or shut down the process in which it is running (depending on the configured queue policy).

The Navigator Audit Server stores events in the Navigator Audit DB.
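
The resume-from-offset behavior described above can be illustrated with a minimal Python sketch. This is not the Cloudera Manager Agent's implementation; the function name `ship_new_events`, the JSON offset file, and the `send` callback are all hypothetical and exist only to show the idea of persisting the last successfully transmitted position so that a restart continues from there.

```python
import json
import os


def ship_new_events(log_path, offset_path, send):
    """Send audit log lines that have not been delivered yet.

    Hypothetical sketch: `send` is any callable that returns True when an
    event was delivered. The byte offset of the last delivered line is
    saved after every success, so a crash or restart resumes from there.
    """
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = json.load(f)["offset"]

    sent = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            if not send(line.decode("utf-8").rstrip("\n")):
                break  # delivery failed; retry from the same offset later
            offset = log.tell()
            with open(offset_path, "w") as f:
                json.dump({"offset": offset}, f)
            sent += 1
    return sent
```

Calling the function repeatedly (for example from a timer) gives the retry behavior: an event that failed to send is attempted again on the next call because the stored offset has not moved past it. Log rotation and purging are left out of the sketch.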

Cloudera Navigator Audit Server

Describes how to add and configure the Navigator Audit Server role.

Required Role:

Adding and Starting the Navigator Audit Server Role


1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management

Service link.

2. Click the Instances tab.

3. Click the Add Role Instances button. The Customize Role Assignments page displays.

4. Assign the Navigator role to a host.

a. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations

of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same

set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable,

but you can reassign role instances to hosts of your choosing, if desired.

Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing

multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the

pageable hosts dialog.

The following shortcuts for specifying hostname patterns are supported:

• Range of hostnames (without the domain portion)

Range Definition          Matching Hosts
10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com

• IP addresses

• Rack name

Click the View By Host button for an overview of the role assignment by hostname ranges.

5. When you are satisfied with the assignments, click Continue. The Database Setup page displays.

6. Configure database settings:

a. Choose the database type:

• Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure

required databases. Make a note of the auto-generated passwords.

• Select Use Custom Databases to specify external databases.

1. Enter the database host, database type, database name, username, and password for the database

that you created when you set up the database.

b. Click Test Connection to confirm that Cloudera Manager can communicate with the database using the

information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct

the information you have provided for the database and then try the test again. (For some servers, if you

are using the embedded database, you will see a message saying the database will be created at a later

step in the installation process.) The Review Changes page displays.

7. Click Continue. The Review Changes page displays with no configuration changes.

8. Click Finish. The Instances page displays.
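
The hostname range shortcuts listed in step 4 can be understood through a small Python sketch that expands a range definition into its matching hosts. The function `expand_host_range` is hypothetical and is not part of Cloudera Manager; it handles a single `[start-end]` range and assumes that a leading zero in the start value means zero-padded numbers, which is what the `host[07-10]` example suggests.

```python
import re

# Matches a numeric range such as [1-4] or [07-10].
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand one [start-end] range in a hostname or IP pattern."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present, the pattern is a single host
    start, end = match.group(1), match.group(2)
    # Assumption: a leading zero in the start value requests zero padding.
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        prefix + str(n).zfill(width) + suffix
        for n in range(int(start), int(end) + 1)
    ]
```

For example, `expand_host_range("host[07-10].company.com")` yields the four hosts shown in the last row of the range table.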


Configuring the Navigator Audit Server Log Directory

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management

Service link.

2. Click the Configuration tab.

3. Click the Navigator Audit Server Default Group category.

4. Set the Navigator Audit Server Log Directory property.

5. Click Save Changes.

6. Click the Instances tab.

7. Check the checkbox next to the Navigator Audit Server role.

8. Select Actions for Selected > Restart.

Configuring the Navigator Audit Server Data Expiration Period

You can configure the number of hours of audit events to keep in the Navigator Audit Server database as follows:

1.

Required Roles

:

2. Set the Navigator Audit Server Data Expiration Period property.

3. Click Save Changes.

4. Click the Instances tab.

5. Check the checkbox next to the Navigator Audit Server role.

6. Select Actions for Selected > Restart.

Configuring the Audit Server to Mask Personally Identifiable Information

This feature is available in Cloudera Navigator 2.0.1 or later.

Personally identifiable information (PII) is information that can be used on its own or with other information to

identify or locate a single person, or to identify an individual in context. The PII masking feature allows you to

specify credit card number patterns (from major credit issuers) that are masked in audit events, in the properties

of entities displayed in lineage diagrams, and in information retrieved from the Audit Server database and the

Metadata Server persistent storage.

Note:

• Masking Social Security numbers is not supported in this release.

• Masking is not applied to audit events and lineage entities that existed before the mask was

enabled.

1. Expand the Navigator Audit Server Default Group category.

2. Click the Advanced category.

3. Configure the PII Masking Regular Expression property with a regular expression that matches the credit card number formats to be masked. The default expression is:

(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})|((?:2131|1800|35\\d{3})\\d{11})

which is constructed from the following subexpressions:

• Visa - (4[0-9]{12}(?:[0-9]{3})?)

• MasterCard - (5[1-5][0-9]{14})

• American Express - (3[47][0-9]{13})

• Diners Club - (3(?:0[0-5]|[68][0-9])[0-9]{11})

• Discover - (6(?:011|5[0-9]{2})[0-9]{12})

• JCB - ((?:2131|1800|35\\d{3})\\d{11})

If the property is left blank, PII is not masked.

4. Click Save Changes.
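
To see what the default expression matches, the following Python sketch applies it to a sample string. It only illustrates the regular expression itself; how the Audit Server renders a masked value is not described here, so replacing each matched digit with `X` is an assumption, as is the function name `mask_pii`. The doubled backslashes (`\\d`) in the property value are written as `\d` for Python's `re` module.

```python
import re

# The default PII masking expression from the text, one card issuer per line.
PII_PATTERN = re.compile(
    r"(4[0-9]{12}(?:[0-9]{3})?)"            # Visa
    r"|(5[1-5][0-9]{14})"                   # MasterCard
    r"|(3[47][0-9]{13})"                    # American Express
    r"|(3(?:0[0-5]|[68][0-9])[0-9]{11})"    # Diners Club
    r"|(6(?:011|5[0-9]{2})[0-9]{12})"       # Discover
    r"|((?:2131|1800|35\d{3})\d{11})"       # JCB
)


def mask_pii(text, mask_char="X"):
    """Replace every substring matched by the expression with mask characters."""
    return PII_PATTERN.sub(lambda m: mask_char * len(m.group(0)), text)
```

A query text such as `select * from t where card = '4111111111111111'` would have the 16-digit number replaced, while shorter numbers that match none of the issuer formats are left untouched.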

Audit Log Properties

Describes auditing log properties and how to configure the log properties.

The following properties apply to the audit log file:

• Audit Log Directory - The directory in which audit event log files are written. By default, this property is not

set if Cloudera Navigator is not installed.

Note: If the value of this property is changed and the service is restarted, the Cloudera Manager Agent starts monitoring the new log directory for audit events. In this case it is possible that not all events are published from the old audit log directory. To avoid loss of audit events, perform the following steps when you change this property:

1. Stop the service.

2. Copy the audit log files and (for Impala only) the impalad_audit_wal file from the old audit log directory to the new audit log directory. This needs to be done on all the nodes where Impala daemons are running.

3. Start the service.

• Maximum Audit Log File Size - The maximum size of the audit event log file before a new file is created. The

unit of the file size is service dependent:

– HDFS, HBase, Hive - MiB

– Impala - lines (queries)

• Number of Audit Logs to Retain - Maximum number of rolled over audit logs to retain. The logs will not be

deleted if they contain audit events that have not yet been propagated to Audit Server.

Configuring Audit Logs

1. Click a supported service.

2. Click the Configuration tab.

3. Configure the log properties in the following categories:

• Impala - Impala Daemon Default Group > Logs

• HBase, HDFS, and Hive - Service-Wide > Logs

4. Edit the audit log properties.

5. Click Save Changes.

6. Restart the service.

Service Auditing Properties

Describes service auditing properties and how to configure the properties.

Each service that supports auditing configuration has the following properties:


the Audit Log Directory property is not set, the validator displays a message that says that the Audit Log

Directory property must be set to enable auditing.

• Event Filter - A set of rules that capture properties of auditable events and actions to be performed when

an event matches those properties.

• Event Tracker - A set of rules for tracking and coalescing events. This feature is used to define equivalency

between different audit events. When events match, according to a set of configurable parameters, only one

entry in the audit list is generated for all the matching events.

• Queue Policy - The action to take when the audit event queue is full. The options are Drop or Shutdown.

When a queue is full and the queue policy of the service is Shutdown, before shutting down the service, N

audits will be discarded, where N is the size of the Cloudera Navigator Audit Server queue.

Note: If the queue policy is Shutdown, the Impala service is shut down only if Impala is unable to

write to the audit log file. It is possible that an event may not appear in the audit event log due to

an error in transfer to the Cloudera Manager Agent or database. In such cases Impala will not shut

down and will keep writing to the log file. When the transfer problem is fixed the events will be

transferred to the database.
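
The difference between the two Queue Policy options can be sketched with a bounded queue in Python. This is an illustration only and not the plug-in code: the class `AuditQueue`, its `offer` method, and the use of an exception to stand in for shutting down the process are all hypothetical.

```python
import queue


class AuditQueue:
    """Bounded audit event queue with a Drop or Shutdown policy (sketch)."""

    def __init__(self, maxsize, policy="Drop"):
        if policy not in ("Drop", "Shutdown"):
            raise ValueError("policy must be 'Drop' or 'Shutdown'")
        self._queue = queue.Queue(maxsize=maxsize)
        self.policy = policy
        self.dropped = 0

    def offer(self, event):
        """Try to enqueue an event; apply the policy when the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            if self.policy == "Drop":
                self.dropped += 1  # the event is lost, the service keeps running
                return False
            # Stand-in for shutting down the audited process.
            raise RuntimeError("audit queue full: shutting down")
```

With Drop, the service stays available at the cost of missing audit events; with Shutdown, unaudited activity is prevented at the cost of availability.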

The Event Filter and Event Tracker rules for filtering and coalescing events are expressed as JSON objects. For

information on the structure of the objects, see the description on the configuration page within the Cloudera

Manager Admin Console.

The default event filter discards events generated by the internal Cloudera and Hadoop users (cloudera-scm, hdfs, hbase, hive, mapred, solr, and dr.who) and that affect files in the /tmp directory.

Configuring Service Auditing Properties

1. Click a service that supports auditing.

2. Click the Configuration tab.

3. Click the Cloudera Navigator category. The Service-Wide category displays.

4. Edit the properties.

5. Click Save Changes.

6. Restart the service.

Configuring Impala Event Logging

To control whether the Impala daemon logs events to the audit log:

1. Click the Impala service.

2. Click the Configuration tab.

3. Expand the Impala Daemon Default Group > Logs category.

4. Edit the Enable Impala Audit Event Generation checkbox setting.

5. Click Save Changes.

6. Restart the service.

Audit Logging to Syslog

The Audit Server logs all audit records into a Log4j logger called auditStream. The log messages are logged at the TRACE level, with the attributes of the audit records. By default, the auditStream logger is inactive because the logger level is set to FATAL. It is also connected to a NullAppender, and does not forward to other appenders (additivity set to false).

To record the audit stream, configure the auditStream logger with the desired appender. For example, the


The Log4j SyslogAppender supports only UDP. An example syslog configuration would be:

$ModLoad imudp
$UDPServerRun 514
# Accept everything (even DEBUG messages)
local2.* /my/audit/trail.log

It is also possible to attach other appenders to the auditStream to provide other integration behaviors.

You can audit events to syslog in two formats: JSON and RSA EnVision. To configure audit logging to syslog, do

the following:

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management

Service link.

2. Click the Configuration tab.

3. Search for Navigator Audit Server Logging Advanced Configuration Snippet.

4. Click the Value field and depending on the format type, enter:

log4j.logger.auditStream = TRACE,SYSLOG
log4j.appender.SYSLOG = org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost = hostname
log4j.appender.SYSLOG.Facility = Local2
log4j.appender.SYSLOG.FacilityPrinting = true

To configure the specific stream type, enter:

Format         Properties
JSON           log4j.additivity.auditStream = false
RSA EnVision   log4j.additivity.auditStreamEnVision = false

5. Click Save Changes to commit the changes.

Example Log Messages

Format         Log Message Example
JSON           Jul 23 11:05:15 hostname local2: {"type":"HDFS","allowed":"true","time":"1374602714758","service":"HDFS-1","user":"root","ip":"10.20.93.93","op":"mkdirs","src":"/audit/root","perms":"rwxr-xr-x"}
RSA EnVision   Cloudera|Navigator|1|type="Hive",allowed="false",time="1382551146763",service="HIVE-1",user="systest",impersonator="",ip="/10.20.190.185",op="QUERY",opText="select count(*) from sample_07",db="default",table="sample_07",path="/user/hive/warehouse/sample_07",objType="TABLE"

If a particular field is not applicable for that audit event, it is omitted from the message.
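
A consumer of the syslog trail might parse the two formats along the lines of the Python sketch below. The function names are hypothetical, and the parsing rules are inferred only from the two example messages above: the JSON payload is assumed to start at the first `{`, and the RSA EnVision payload is assumed to be a series of `key="value"` pairs whose values contain no double quotes.

```python
import json
import re

# key="value" pairs as they appear in the RSA EnVision example.
ENVISION_FIELD = re.compile(r'(\w+)="([^"]*)"')


def parse_json_audit_line(line):
    """Return the audit record from a JSON-format syslog line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    return json.loads(line[start:])


def parse_envision_audit_line(line):
    """Return the key/value fields from an RSA EnVision-format line."""
    return dict(ENVISION_FIELD.findall(line))


json_line = (
    'Jul 23 11:05:15 hostname local2: {"type":"HDFS","allowed":"true",'
    '"time":"1374602714758","service":"HDFS-1","user":"root",'
    '"ip":"10.20.93.93","op":"mkdirs","src":"/audit/root","perms":"rwxr-xr-x"}'
)
envision_line = (
    'Cloudera|Navigator|1|type="Hive",allowed="false",time="1382551146763",'
    'service="HIVE-1",user="systest",impersonator="",ip="/10.20.190.185",'
    'op="QUERY",opText="select count(*) from sample_07",db="default",'
    'table="sample_07",path="/user/hive/warehouse/sample_07",objType="TABLE"'
)

json_event = parse_json_audit_line(json_line)
envision_event = parse_envision_audit_line(envision_line)
```

Because fields that do not apply to an event are omitted, code that reads these records should use lookups with a default (for example `event.get("perms")`) rather than assume every key is present.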

Audit Events

An audit event is an event that describes an action that has been taken for a service, role, or host.

In Cloudera Manager, audit event logs display:

• Service, role, and host life cycle events recorded by Cloudera Management Service roles. For further information, see Audit Events in the Cloudera Manager Monitoring and Diagnostics Guide


Audit Event Properties

The following properties can appear in an audit event entry:

• Date - Date and time the action was performed.

• Command - The action performed.

• Source - The object affected by the service action.

• User - The name of the user that performed the action.

• Impersonator - If the action was requested by another service, the name of the user that invoked the service

action on behalf of the user.

– The Impersonator field will always show when Sentry is not enabled.

– The Impersonator field will show for other services than Hive when Sentry is enabled.

• IP Address - The IP address of the host where the service action occurred.

• Service - The name of the service that performed the service action.

• Role - The name of the role that performed the service action.

Viewing Audit Events

You can view audit events for all services or for a specific service. To view audit events, follow the appropriate

procedure:

Object         Procedure
All Services   1. Click Audits in the Cloudera Manager Admin Console top navigation bar.
Service        1. In the Cloudera Manager Admin Console, click a service that supports auditing.
               2. Click the Audits tab on the service navigation bar.

Audit event entries are ordered with the most recent at the top.

Events that represent denied access are displayed with a Denied label, red text, and a pink background.

Filtering Audit Events

You filter on generated audit events in the audit UI by selecting a time range and adding filters.

You can use the Time Range Selector or a duration link to set the time range. (See Time Line in Cloudera Manager Monitoring and Diagnostics Guide for details.) When you select the time range, the log displays all events in that range. Note that the time it takes to perform a search will typically increase for a longer time range, as the number of events to be searched will be larger.

Adding a Filter

• Click the icon that displays next to a property when you hover in one of the event entries. A filter containing

the property, operator, and its value is added to the list of filters at the left and Cloudera Manager redisplays

all events that match the filter.


1. Choose a property in the drop-down list. You can search by properties such as Username, Service, Command, or Role. The properties vary depending on the service or role.

2. If the property allows it, choose an operator in the operator drop-down list.

3. Type a property value in the value text field. For some properties, where the list of values is finite and known, you can start typing and then select from a list of potential matches. To match a substring, use the like operator and specify % around the string. For example, to see all the audit events for files created in the folder /user/joe/out specify Source like %/user/joe/out%.

4. Click Search. The log displays all events that match the filter criteria.

5. Click to add more filters and repeat steps 1 through 4.
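
The substring matching performed by the like operator in step 3 can be approximated in Python as follows. This is a sketch of the `%` wildcard semantics only: the function `like_match` is hypothetical, and details such as case sensitivity in the actual audit UI are not stated in this guide, so the sketch assumes a case-sensitive match.

```python
import re


def like_match(value, pattern):
    """Return True if value matches a pattern where % stands for any text."""
    # Escape the literal parts and join them with ".*" in place of each %.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

With this reading, `like_match("/user/joe/out/part-0", "%/user/joe/out%")` is true, while a pattern without `%` must match the whole value.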

Removing a Filter

1. Click the at the right of the filter. The filter is removed.

2. Click Search. The log displays all events that match the filter criteria.

Downloading Audit Events

You can download audit events in the Audit UI or using the Audit API. An audit event contains the following fields: service, username, command, ipAddress, resource, allowed, timestamp, operationText. The structure of the resource field depends on the type of the service:

• HDFS - A file path.

• Hive, Hue, Impala, and Sentry - database:tablename

• HBase - table family:qualifier

For Hive, Hue, Impala, and Sentry events, operationText contains the operation string.

Downloading Audit Events Using the Audit UI

1. Display the audit log.

2. Specify desired filters and time range.

3. Click the Download CSV button. A file named history.csv is downloaded.

HDFS Audit Log Example

service,username,command,ipAddress,resource,allowed,timestamp,operationText,
HDFS,cloudera,setPermission,20.10.187.242,/user/hive,false,"2014-02-09T00:59:34.430Z",
HDFS,cloudera,getfileinfo,20.10.187.242,/user/cloudera,true,"2014-02-09T00:59:22.667Z",
HDFS,cloudera,getfileinfo,20.10.187.242,/,true,"2014-02-09T00:59:22.658Z",

In this example, access was denied for the first event, and therefore its allowed field has the value false.
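
A downloaded file can be post-processed with any CSV reader. The Python sketch below, which uses the HDFS records from the example above as inline sample data, lists the events whose allowed field is false. The function name `denied_events` is hypothetical; the column names come from the header row of the example.

```python
import csv
import io

# Sample data copied from the HDFS audit log example; a real history.csv
# file would be opened with open("history.csv", newline="") instead.
SAMPLE = (
    "service,username,command,ipAddress,resource,allowed,timestamp,operationText,\n"
    'HDFS,cloudera,setPermission,20.10.187.242,/user/hive,false,"2014-02-09T00:59:34.430Z",\n'
    'HDFS,cloudera,getfileinfo,20.10.187.242,/user/cloudera,true,"2014-02-09T00:59:22.667Z",\n'
    'HDFS,cloudera,getfileinfo,20.10.187.242,/,true,"2014-02-09T00:59:22.658Z",\n'
)


def denied_events(csv_text):
    """Return the audit records whose allowed column is false."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["allowed"].strip().lower() == "false"]


denied = denied_events(SAMPLE)
```

The trailing comma on each line produces one extra unnamed column, which `csv.DictReader` tolerates; the sketch simply ignores it.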

Hive and Sentry Example - via downloaded CSV file

The following records list Hive operations to create and load a table and Sentry operations to create roles and grant privileges:

service,username,command,ipAddress,resource,allowed,timestamp,operationText

Hive,admin,LOAD,20.10.191.128,default:sample_08,true,"2014-07-08T20:21:03.510Z","LOAD DATA INPATH

'/user/admin/sample_08' OVERWRITE INTO TABLE sample_08"

Hive,admin,LOAD,20.10.191.128,:,true,"2014-07-08T20:21:03.509Z","LOAD DATA INPATH '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08"


STORED AS TextFile"

Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:26.718Z","GRANT ROLE default_admin TO GROUP sentryDefaultAdmin"

Hive,hive,GRANT_PRIVILEGE,20.10.191.128,default:,true,"2014-07-08T20:19:26.149Z","GRANT ALL ON DATABASE default TO ROLE default_admin"

Hive,hive,CREATEROLE,20.10.191.128,:,true,"2014-07-08T20:19:25.761Z","CREATE ROLE default_admin"

Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:20.515Z","GRANT ROLE global_admin TO GROUP sentryAdmin"

Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:20.063Z","GRANT ROLE global_admin TO GROUP hive"

Hive,hive,GRANT_PRIVILEGE,20.10.191.128,server1:,true,"2014-07-08T20:19:19.382Z","GRANT ALL ON SERVER server1 TO ROLE global_admin"

Hive,hive,CREATEROLE,20.10.191.128,:,true,"2014-07-08T20:19:18.281Z","CREATE ROLE global_admin"

Downloading Audit Events Using the Audit API

You can filter and download audit events using the Cloudera Manager audit API. See Audit API.

Hive and Sentry Example - via audit API

To download the same audit events using the API, issue the request http://host-1.ent.cloudera.com:7180/api/v6/audits?query=service==*HIVE*, which returns the following JSON items:

{ "items" : [ { "timestamp" : "2014-07-08T20:21:03.510Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "LOAD", "resource" : "default:sample_08",

"operationText" : "LOAD DATA INPATH\n '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08", "allowed" : true }, { "timestamp" : "2014-07-08T20:21:03.509Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "LOAD", "resource" : ":",

"operationText" : "LOAD DATA INPATH\n '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08", "allowed" : true }, { "timestamp" : "2014-07-08T20:21:01.899Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "CREATETABLE", "resource" : "default:sample_08",

"operationText" : "CREATE TABLE `sample_08` (\n `code` string ,\n `description` string ,\n `total_emp` int ,\n `salary` int )\nROW FORMAT DELIMITED\n FIELDS TERMINATED BY '\t'\nSTORED AS TextFile",

"allowed" : true }, { "timestamp" : "2014-07-08T20:19:26.718Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",

"operationText" : "GRANT ROLE default_admin TO GROUP sentryDefaultAdmin", "allowed" : true

}, {

"timestamp" : "2014-07-08T20:19:26.149Z", "service" : "Hive",


"ipAddress" : "20.10.191.128", "command" : "GRANT_PRIVILEGE", "resource" : "default:",

"operationText" : "GRANT ALL ON DATABASE default TO ROLE default_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:25.761Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "CREATEROLE", "resource" : ":",

"operationText" : "CREATE ROLE default_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:20.515Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",

"operationText" : "GRANT ROLE global_admin TO GROUP sentryAdmin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:20.063Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",

"operationText" : "GRANT ROLE global_admin TO GROUP hive", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:19.382Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_PRIVILEGE", "resource" : "server1:",

"operationText" : "GRANT ALL ON SERVER server1 TO ROLE global_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:18.281Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "CREATEROLE", "resource" : ":",

"operationText" : "CREATE ROLE global_admin", "allowed" : true

} ] }
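The request URL and the shape of the response can be handled programmatically. The sketch below is illustrative only: audit_url and commands_by_user are hypothetical helper names, the sample response is a two-item excerpt of the JSON above, and sending the request (which requires Cloudera Manager credentials) is deliberately left out.

```python
import json
from urllib.parse import quote

# Two items excerpted from the JSON response shown in this guide.
SAMPLE_RESPONSE = """
{ "items" : [
  { "timestamp" : "2014-07-08T20:21:03.510Z", "service" : "Hive",
    "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "LOAD",
    "resource" : "default:sample_08", "allowed" : true },
  { "timestamp" : "2014-07-08T20:19:26.718Z", "service" : "Hive",
    "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE",
    "resource" : ":", "allowed" : true }
] }
"""


def audit_url(host, query, port=7180, api_version="v6"):
    """Build the audit request URL in the form used by the example above."""
    # '=' and '*' are kept literal so the query reads as in the guide.
    return f"http://{host}:{port}/api/{api_version}/audits?query={quote(query, safe='=*')}"


def commands_by_user(response_text):
    """Group the audited commands by username (hypothetical helper)."""
    summary = {}
    for item in json.loads(response_text)["items"]:
        summary.setdefault(item["username"], []).append(item["command"])
    return summary


if __name__ == "__main__":
    print(audit_url("host-1.ent.cloudera.com", "service==*HIVE*"))
    print(commands_by_user(SAMPLE_RESPONSE))
```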

Downloading HDFS Directory Access Permission Reports

For each HDFS service you can download a report that details the HDFS directories a group has permission to

access.

1. In the Cloudera Manager Admin Console, navigate to the Reports page:

• (Cloudera Manager 4) Click Reports.

• (Cloudera Manager 5) Click Clusters > ClusterName > Other > Reports.


Cloudera Navigator Metadata Component

The Cloudera Navigator metadata component provides data discovery and data lineage management functions.

The architecture of the Cloudera Navigator metadata component is illustrated below.

The Navigator Metadata Server performs five main functions:

• Obtains connection information about the services whose data it manages from the Cloudera Manager Server

• Extracts entity metadata from the services at periodic intervals

• Indexes the metadata

• Implements a graphical UI for searching the metadata and displaying lineage relationships between entities

• Implements a REST API for retrieving metadata

Cloudera Navigator Metadata Server

Describes how the Cloudera Navigator Metadata Server extracts metadata from the entities managed by Cloudera

Manager, and how to add and configure the Navigator Metadata Server role.


About Metadata Extraction

The Navigator Metadata Server extracts metadata for the following resource types from the listed servers:

• HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint. However,

if you have High Availability enabled, metadata is extracted as soon as it is written to the JournalNodes.

• Hive - Extracts database and table metadata from the Hive Metastore Server.

• MapReduce - Extracts job metadata from the JobTracker. The default setting in Cloudera Manager retains a maximum of five jobs, so if you run more than five jobs between Navigator extractions, the Navigator Metadata Server extracts only the five most recent jobs.


• Pig - Extracts Pig script runs from the JobTracker or Job History Server.

• Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server.

• YARN - Extracts job metadata from the Job History Server.

If an entity is created at time t0 in the system, that entity will be extracted and linked in Navigator after the

extraction poll period (default 10 minutes) plus a service-specific interval as follows:

• HDFS: t0 + extraction poll period + HDFS checkpoint interval (default 1 hour)

• HDFS + HA: t0 + extraction poll period

• Hive: t0 + extraction poll period + Hive maximum wait time (default 60 minutes)
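The timing rules above amount to simple date arithmetic. The following sketch computes the latest time by which an entity created at t0 should appear in Navigator, using the default values quoted in this section. The function name and the service keys (HDFS, HDFS_HA, HIVE) are hypothetical labels chosen for the example.

```python
from datetime import datetime, timedelta

# Defaults quoted in this section.
EXTRACTION_POLL_PERIOD = timedelta(minutes=10)
SERVICE_INTERVALS = {
    "HDFS": timedelta(hours=1),      # HDFS checkpoint interval
    "HDFS_HA": timedelta(0),         # no additional wait with High Availability
    "HIVE": timedelta(minutes=60),   # Hive maximum wait time
}


def latest_extraction_time(created_at, service, poll_period=EXTRACTION_POLL_PERIOD):
    """Return t0 + extraction poll period + the service-specific interval."""
    return created_at + poll_period + SERVICE_INTERVALS[service]


if __name__ == "__main__":
    t0 = datetime(2014, 7, 8, 20, 0)
    for service in SERVICE_INTERVALS:
        print(service, latest_extraction_time(t0, service))
```

If you change the poll period or the checkpoint interval in your deployment, pass the configured values instead of the defaults.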

Adding and Starting the Navigator Metadata Server Role

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.

2. Click the Instances tab.

3. Click the Add Role Instances button. The Customize Role Assignments page displays.

4. Assign the Navigator role to a host.

a. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations

of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same

set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable,

but you can reassign role instances to hosts of your choosing, if desired.

Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing

multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the

pageable hosts dialog.

The following shortcuts for specifying hostname patterns are supported:

• Range of hostnames (without the domain portion)

Range Definition | Matching Hosts
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com

• IP addresses

• Rack name

Click the View By Host button for an overview of the role assignment by hostname ranges.

5. Click Finish. The Instances page displays.

6. Check the checkbox next to the Navigator Metadata Server role.

7. Select Actions for Selected > Start. Click Start to confirm the action.
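The hostname range notation used in step 4 can be illustrated with a short expansion routine. This is only a sketch of how the notation reads, not Cloudera Manager's implementation: the function name is hypothetical, and zero-padding is inferred from the host[07-10] example.

```python
import re

# Matches a single bracketed numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand one bracketed numeric range into the list of matching hosts."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # Preserve zero-padding when the lower bound is written with a leading zero.
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [prefix + str(n).zfill(width) + suffix
            for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    print(expand_host_range("10.1.1.[1-4]"))
    print(expand_host_range("host[07-10].company.com"))
```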

Configuring the Navigator Metadata Server Storage Directory

Describes how to configure where the Navigator Metadata Server stores extracted data. The default is /var/lib/cloudera-scm-navigator.

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.

2. Click the Configuration tab.

3. Click the Navigator Metadata Server Default Group.

4. Specify the directory in the Navigator Metadata Server Storage Dir property.

5. Click Save Changes.

6. Click the Instances tab.

7. Check the checkbox next to the Navigator Metadata Server role.

8. Select Actions for Selected > Restart.

Configuring the Navigator Metadata Server Port

Describes how to configure the port on which the Navigator Metadata UI is accessed. The default port is 7187.

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.

2. Click the Configuration tab.

3. Select Navigator Metadata Server Default Group > Ports and Addresses.

4. Specify the port in the Navigator Metadata Server Port property.

5. Click Save Changes.

6. Click the Instances tab.

7. Check the checkbox next to the Navigator Metadata Server role.

8. Select Actions for Selected > Restart.

Configuring the Metadata Server to Mask Personally Identifiable Information

This feature is available in Cloudera Navigator 2.0.1 or later.

Navigator Metadata Server Sizing and Performance Recommendations

Two activities determine Navigator Metadata Server resource requirements:

• Extracting metadata from the cluster and creating relationships

• Querying

The Navigator Metadata Server uses Solr to store, index, and query metadata. Indexing happens during extraction. Querying is fast and efficient because the data is indexed.

Memory and CPU requirements are based on the amount of data that is stored and indexed. With 6 GB of RAM and 8-10 cores, Solr can process 6 million entities in 25-30 minutes or 80 million entities in 8 to 9 hours. Less than 6 GB of RAM results in excessive garbage collection and possibly out-of-memory exceptions. For large clusters, Cloudera advises at least 8 GB of RAM and 8 cores. The Solr instance runs in process with Navigator, so the Java heap for the Navigator Metadata Server should be set according to the size of the cluster.

By default, during the Cloudera Manager first run installation wizard the Navigator Audit Server and Navigator

Metadata Server are assigned to the same host as the Cloudera Management Service monitoring roles. This

configuration works for a small cluster, but should be updated before the cluster grows. You can either change

the configuration at installation time or move the Navigator Metadata Server if necessary.

Moving a Navigator Metadata Server Role


Enabling Hive Metadata Extraction in a Secure Cluster

The Navigator Metadata Server uses the hue user to connect to the Hive Metastore. The hue user is able to

connect to the Hive Metastore by default. However, if the Hive service Hive Metastore Access Control and Proxy

User Groups Override property and/or the HDFS service Hive Proxy User Groups property have been changed

from their default values to settings that prevent the hue user from connecting to the Hive Metastore, Navigator

Metadata Server will be unable to extract metadata from Hive. If this is the case, modify the Hive service Hive

Metastore Access Control and Proxy User Groups Override property and/or the HDFS service Hive Proxy User

Groups property so that the hue user can connect as follows:

1. Go to the Hive or HDFS service.

2. Click the Configuration tab.

3. Expand the Service-Wide > Proxy category.

4. In the Hive service Hive Metastore Access Control and Proxy User Groups Override field or the HDFS service

Hive Proxy User Groups field, click the Value column, and click to add a new row.

5. Type hue.

6. Click Save Changes to commit the changes.

7. Restart the service.

Metadata

The Cloudera Navigator Metadata component manages metadata about the entities in a Hadoop cluster and

relationships between the entities.

The Cloudera Navigator metadata schema defines the types of metadata that are available for each entity type

it supports. The types of metadata defined by the Cloudera Navigator Metadata component include: the name

of an entity, the service that manages or uses the entity, type, path to the entity, date and time of creation,

access, and modification, size, owner, purpose, and relationships—parent-child, data flow, and instance

of—between entities.

For example, the following shows the property sheet of a file entity:

There are two classes of metadata:

• technical metadata - metadata defined when entities are extracted. You cannot modify technical metadata.

• business metadata - metadata added to extracted entities. You can add and modify business metadata.


Metadata Indexing

After metadata is extracted, it is indexed and made available for searching by an embedded Solr engine. The Solr schema indexes two types of metadata: entity properties and relationships between entities.

You can search entity metadata using the Navigator Metadata UI. Relationship metadata is implicitly visible in lineage diagrams and explicitly available in a lineage file.

Search Syntax

Search in the Cloudera Navigator Metadata component is implemented by an embedded Solr engine that supports the syntax described in LuceneQParserPlugin. You construct search strings by specifying the value of a default property, property name-value pairs, or user-defined name-value pairs using the syntax:

• Property name-value pairs - propertyName:value, where propertyName is one of the properties listed in Search Properties. value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard. In property name-value pairs you must escape special characters :, /, and * with the backslash character \. For example, fileSystemPath:\/user\/admin.

• User-defined name-value pairs - up_propertyName:value.

To construct complex strings, join multiple property-value pairs using the or and and operators.

Example Search Strings

• Filesystem path /user/admin - fileSystemPath:\/user\/admin

• Descriptions that start with the string "Banking" - description:Banking*

• Sources of type MapReduce or Hive - sourceType:MAPREDUCE or sourceType:HIVE

• Directories owned by hdfs in the path /user/hdfs/input - owner:HDFS and type:directory and fileSystemPath:\/user\/hdfs\/input

• Jobs started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]

• User-defined key-value project-customer1 - up_project:customer1
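Search strings like the examples above can also be assembled in code, which keeps the escaping rule in one place. The following is a minimal sketch under the rules stated in this section (escape :, /, and * with a backslash, prefix user-defined properties with up_, join pairs with or and and). All function names are hypothetical and are not part of any Navigator client library.

```python
# Characters this guide says must be escaped inside property values.
SPECIAL_CHARS = ":/*"


def escape_value(value):
    """Prefix each special character in a literal value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def term(name, value):
    """Build a propertyName:value pair with the value escaped."""
    return f"{name}:{escape_value(value)}"


def prefix_term(name, prefix):
    """Build a wildcard pair that matches values starting with prefix."""
    return f"{name}:{escape_value(prefix)}*"


def range_term(name, low, high):
    """Build a range pair; bounds are passed through as in the guide's example."""
    return f"{name}:[{low} TO {high}]"


def user_defined_term(name, value):
    """Build a user-defined pair using the up_ prefix."""
    return f"up_{name}:{escape_value(value)}"


def all_of(*terms):
    return " and ".join(terms)


def any_of(*terms):
    return " or ".join(terms)


if __name__ == "__main__":
    print(all_of(term("owner", "HDFS"), term("type", "directory"),
                 term("fileSystemPath", "/user/hdfs/input")))
    print(any_of(term("sourceType", "MAPREDUCE"), term("sourceType", "HIVE")))
```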

Note: When viewing MapReduce jobs in the Cloudera Manager Activities page, the string that appears in a job's Name column equates to the originalName property. Therefore, to specify a MapReduce job's name in a search, use the following string: (resType:mapreduce) and (originalName:jobName), where jobName is the value in the job's Name column.

Search Properties

A reference for the search schema properties.

Default Properties

The following properties can be searched simply by specifying a property value: type, fileSystemPath, inputs, jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.

Common Properties

Name | Type | Description
description | text | Description of the entity.
group | caseInsensitiveText | The group to which the owner of the entity belongs.
 | ngramedText | The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
operationType | ngramedText | The type of an operation:
• Pig - SCRIPT
• Sqoop - Table Export, Query Import
originalName | ngramedText | The name of the entity when it was extracted.
originalDescription | text | The description of the entity when it was extracted.
owner | caseInsensitiveText | The owner of the entity.
principal | caseInsensitiveText | For entities with type OPERATION_EXECUTION, the initiator of the entity.
tags | ngramedText | A set of tags that describe the entity.
type | ngramedText | The type of the entity. The available types depend on the entity's source type:
• HDFS - DIRECTORY, FILE
• HIVE - DATABASE, TABLE, FIELD, OPERATION, OPERATION_EXECUTION, SUB_OPERATION, PARTITION, RESOURCE, UNKNOWN, VIEW
• MAPREDUCE - OPERATION, OPERATION_EXECUTION
• OOZIE - OPERATION, OPERATION_EXECUTION
• PIG - OPERATION, OPERATION_EXECUTION
• SQOOP - OPERATION, OPERATION_EXECUTION, SUB_OPERATION
• YARN - OPERATION, OPERATION_EXECUTION

Query
queryText | string | The text of a Hive or Sqoop query.

Source
clusterName | string | The name of the cluster in which the entity is stored.
sourceId | string | The ID of the source type.
sourceType | caseInsensitiveText | The source type of the entity: HDFS, HIVE, MAPREDUCE, OOZIE, PIG, SQOOP, YARN.
sourceUrl | string | The URL of the source type.

Timestamps


- started, ended

HDFS Properties

Name | Type | Description
fileSystemPath | path | The path to the entity.
compressed | Boolean | Indicates whether the entity is compressed.
deleted | Boolean | Indicates whether the entity has been moved to the Trash folder.
deleteTime | date | The time the entity was moved to the Trash folder.
mimeType | ngramedText | The MIME type of the entity.
parentPath | string | The path to the parent entity for a child entity. For example: parent path:/default/sample_07 for the table sample_07 from the Hive database default.
permissions | string | The UNIX access permissions of the entity.
size | long | The exact size of the entity in bytes or a range of sizes. Range examples: size:[1000 TO *], size:[* TO 2000], and size:[* TO *] to find all fields with a size value.

MAPREDUCE and YARN Properties

Name | Type | Description
inputRecursive | Boolean | Indicates whether files are searched recursively under the input directories, or just files directly under the input directories are considered.
jobId | ngramedText | The ID of the job. For a job spawned by Oozie, the workflow ID.
mapper | string | The fully-qualified name of the mapper class.
outputKey | string | The fully-qualified name of the class of the output key.
outputValue | string | The fully-qualified name of the class of the output value.
reducer | string | The fully-qualified name of the reducer class.

OPERATION Properties

Name | Type | Description

Operation
inputFormat | string | The fully-qualified name of the class of the input format.
outputFormat | string | The fully-qualified name of the class of the output format.

Operation Execution
 | string | The name of the entity input to an operation execution. For entities of resource type MR, it is usually a directory. For entities of resource type Hive, it is usually a table.
outputs | string | The name of the entity output from an operation execution. For entities of resource type MR, it is usually a directory. For entities of resource type Hive, it is usually a table.

HIVE Properties

Name | Type | Description

Field
dataType | ngramedText | The type of data stored in a field (column).

Table
compressed | Boolean | Indicates whether a Hive table is compressed.
serDeLibName | string | The name of the library containing the SerDe class.
serDeName | string | The fully-qualified name of the SerDe class.

Partition
partitionColNames | string | The table columns that define the partition.
partitionColValues | string | The table column values that define the partition.

Oozie Properties

Name | Type | Description
status | string | The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.

PIG Properties

Name | Type | Description
scriptId | string | The ID of the Pig script.

SQOOP Properties

Name | Type | Description
dbURL | string | The URL of the database from or to which the data was imported or exported.
dbTable | string | The table from or to which the data was imported or exported.
dbUser | string | The database user.
dbWhere | string | The where clause that identifies which rows were imported.
dbColumnExpression | string | An expression that identifies which columns were imported.

Accessing Metadata


Navigator Metadata UI

Opening the Navigator Metadata UI

The Navigator Metadata UI allows you to search entity metadata and view entity lineage.

To open the Metadata UI, do one of the following:

• Directly open the URL of the Cloudera Navigator Metadata Server: http://hostname:port/, where hostname is the name of the host on which you are running the Cloudera Navigator Metadata Server and port is the port configured for the Cloudera Navigator Metadata Server.

Note: The default port of the Cloudera Navigator Metadata Server is 7187. To change the port, follow the instructions in Configuring the Navigator Metadata Server Port.

• Navigate to the Navigator Metadata Web UI:

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.

2. Click the Instances tab.

3. Click the Navigator Metadata Server role.

4. Click the Navigator Web UI link.

Searching Metadata

You search in the Navigator Metadata UI by typing search strings or by constructing search strings using UI controls.

1. Open the Navigator Metadata UI.

2. Provide Cloudera Manager default administrator credentials and click Login.

3. Do one of the following:

• Type a search string into the Search box that conforms to the search syntax. The Search Results page displays as soon as you start typing.

• Click the Query Builder link. The Query Builder landing page displays with the result of the wildcard search

(*). The Query Builder landing page displays Source Type and Type facets that match the search results

with the number of results that match each value of those properties. You can filter the search results

by clicking specific values for those properties or adding new properties.

The Full Query read-only box displays the search string constructed from the specified filters. Click Show

n Results to display the Search Results page.

Search Results

The Search Results page has a Search box and two panes: the Query Builder pane and the Search Results pane. The Search Results pane displays the number of matching entries in pages listing 25 entities per page. You can view the pages using the page control at the bottom of each page. Each entry in the result list contains:

• Source type

• Entity name - the name is a link to a page that displays the entity property editor and lineage diagram.

• Entity properties

Specifying Property Values in the Query Builder Pane

The Query Builder pane contains a Search box and a set of graphical controls that allow you to select property

values to filter search results. You can filter using the Search box or the graphical controls.

In the Search box, type the values of default properties.

To filter on a property value for non-default properties, specify values as follows:

• Boolean - Check the checkbox.

• Enumerated - Start typing or click the field and then select from a drop-down list.

• Timestamps - Specified in the format mm/dd/yyyy hh:mm [AM|PM] in a date control. You choose dates in your local time zone, but the string that displays in the Search box is UTC. In the date control:

– Date - Click the down arrow to display a calendar and select a date, or click a subfield and use the spinner arrows or the up and down arrow keys.

– Time - Click the hour, minute, and AM or PM fields and use the spinner arrows or the up and down arrow keys to specify the value.

– Move between fields using the right and left arrow keys.

To add a property, click Add another filter... and select a property name.
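Because dates are chosen in local time but the search string is UTC, it can help to see the conversion spelled out. The sketch below converts a value in the date control's mm/dd/yyyy hh:mm [AM|PM] format to the UTC timestamp form used in the search examples. The function names and the fixed UTC offset parameter are assumptions made for illustration; the UI performs this conversion for you.

```python
from datetime import datetime, timedelta, timezone


def to_utc_search_timestamp(local_text, utc_offset_hours):
    """Convert 'mm/dd/yyyy hh:mm AM|PM' local time to a UTC search timestamp."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(local_text, "%m/%d/%Y %I:%M %p")
    local_time = local_time.replace(tzinfo=local_zone)
    return local_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


def started_between(start_local, end_local, utc_offset_hours):
    """Build a started:[... TO ...] range from two local times."""
    return "started:[{} TO {}]".format(
        to_utc_search_timestamp(start_local, utc_offset_hours),
        to_utc_search_timestamp(end_local, utc_offset_hours),
    )


if __name__ == "__main__":
    # 1:00 PM to 2:00 PM at UTC-7 corresponds to 20:00 to 21:00 UTC.
    print(started_between("10/21/2013 01:00 PM", "10/21/2013 02:00 PM", -7))
```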

Navigator Metadata API

The Navigator Metadata API allows you to search entity metadata using a REST API. For information about the API, see:

• 2.0.1 and higher - Cloudera Navigator v2 API documentation

• 1.2 - 2.0.0 - Cloudera Navigator v1 API documentation
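A search string can be sent to the API as a URL query parameter. The sketch below only builds such a URL and sends nothing. The entities path and the query parameter name are assumptions that are not confirmed by this guide, so check them against the API documentation for your Navigator version. The host name is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode


def entities_url(host, query, port=7187, api_version="v2"):
    """Build a metadata search URL.

    Assumption: the API exposes an 'entities' resource that accepts the
    search string in a 'query' parameter. Verify against the API docs.
    """
    return f"http://{host}:{port}/api/{api_version}/entities?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Placeholder host; 7187 is the default Navigator Metadata Server port.
    print(entities_url("nav-host.example.com", "sourceType:HIVE and type:TABLE"))
```

urlencode percent-encodes the colons and encodes spaces as plus signs, so the search string can be written exactly as it would be typed in the Search box.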

Modifying Business Metadata

The Cloudera Navigator Metadata component allows you to add and modify the following business metadata

associated with entities: display name, description, tags, and user-defined name-value pairs. You can modify

business metadata using the Navigator Metadata UI, MapReduce service and job properties, Navigator metadata

files, and the Navigator Metadata API.

Modifying Business Metadata Using the Navigator Metadata UI

1. Run a

search

in the Cloudera Navigator Metadata UI.

2. Click an entity link returned in the search. The metadata pane displays on the left and the lineage page

displays on the right.


4. Edit any of the fields as instructed. Press Enter or Tab to create new tag entries. For example, a Description, the tags occupations and salaries, and the property year with value 2012 have been added to the file sample_07.csv:

Note: You can specify special characters (for example, ".", " ") in the name, but doing so makes searching for the entity more difficult because some characters collide with special characters in the search syntax.


Modifying MapReduce Business Metadata Using Properties

You can set MapReduce job metadata statically for all jobs, or dynamically for a specific job or job instance.

To statically set metadata for all MapReduce jobs:

1. Do one of the following:

• Select Clusters > Cloudera Management Service > Cloudera Management Service.

• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.

2. Click the Configuration tab.

3. Select Navigator Metadata Server Default Group > Advanced.

4. Click Navigator Metadata Server Advanced Configuration Snippet for cloudera-navigator.properties.

5. Specify values for one or more of the following properties:

nav.user_defined_properties = comma-separated list of user-defined property names

nav.tags = comma-separated list of property names that serve as tags

6. Click Save Changes.

7. Click the Instances tab.

8. Check the checkbox next to the Navigator Metadata Server role.

9. Select Actions for Selected > Restart.

To modify parameters dynamically, specify one or more of the following properties in a job configuration:

nav.job.user_defined_properties = comma-separated list of user-defined property names for a job (type:OPERATION)

nav.job.tags = comma-separated list of property names that serve as tags for a job

nav.jobexec.user_defined_properties = comma-separated list of user-defined property names for a job execution (type:OPERATION_EXECUTION)
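As a rough illustration of the dynamic settings, the sketch below assembles the three job-level properties into key-value pairs with comma-separated values, ready to be placed in a job configuration. The function name and the example property names (project, owner_team, run_reason) are hypothetical. How the pairs reach the job configuration depends on how you submit the job and is not shown.

```python
def navigator_job_configuration(job_properties=None, job_tags=None, jobexec_properties=None):
    """Return the nav.* job settings as a dict of comma-separated values.

    Settings that are not supplied are omitted.
    """
    settings = [
        ("nav.job.user_defined_properties", job_properties),
        ("nav.job.tags", job_tags),
        ("nav.jobexec.user_defined_properties", jobexec_properties),
    ]
    return {key: ",".join(names) for key, names in settings if names}


def as_property_lines(configuration):
    """Render the settings as key=value lines."""
    return [f"{key}={value}" for key, value in configuration.items()]


if __name__ == "__main__":
    # Hypothetical property names, used only to show the list format.
    config = navigator_job_configuration(
        job_properties=["project", "owner_team"],
        job_tags=["project"],
        jobexec_properties=["run_reason"],
    )
    for line in as_property_lines(config):
        print(line)
```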
