(c) 2010-2015 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service
names or slogans contained in this document are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part,
without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and
company names or logos mentioned in this document are the property of their
respective owners. Reference to any products, services, processes or other
information, by trade name, trademark, manufacturer, supplier or otherwise does
not constitute or imply endorsement, sponsorship or recommendation thereof by
us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording, or otherwise), or for any
purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Cloudera, the furnishing
of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property. For information about patents covering
Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera
shall not be liable for any damages resulting from technical errors or omissions
which may be present in this document, or from use of this document.
Cloudera, Inc.
Table of Contents
Cloudera Navigator Installation and User Guide
Introducing Cloudera Navigator
Cloudera Navigator Requirements
Cloudera Manager Requirements
Cloudera Navigator Auditing Component
Cloudera Navigator Metadata Component
Installing and Upgrading Cloudera Navigator
Cloudera Navigator Auditing Component
Cloudera Navigator Audit Server
Audit Log Properties
Service Auditing Properties
Audit Events
Viewing Audit Events
Filtering Audit Events
Downloading Audit Events
Downloading HDFS Directory Access Permission Reports
Cloudera Navigator Metadata Component
Cloudera Navigator Metadata Server
Metadata
Search Properties
Accessing Metadata
Modifying Business Metadata
Lineage Diagrams
Displaying a Template Lineage Diagram
Displaying an Instance Lineage Diagram
Displaying the Template Lineage Diagram for an Instance Lineage Diagram
Downloading a Lineage File
Tables
Introducing Cloudera Navigator
Cloudera Navigator is a fully integrated data management tool for the Hadoop platform. Data management
capabilities are critical for enterprise customers that are in highly regulated industries and have stringent
compliance requirements.
Cloudera Navigator provides two categories of functionality:
• Auditing data access and verifying access privileges - Cloudera Navigator allows administrators to configure,
collect, and view audit events, to understand who accessed what data and how. Cloudera Navigator also
allows administrators to generate reports that list the HDFS access permissions granted to groups.
Cloudera Navigator tracks access permissions and actual accesses to all entities in HDFS, Hive, HBase, and
Cloudera Impala to help answer questions such as - who has access to which entities, which entities were
accessed by a user, when was an entity accessed and by whom, what entities were accessed using a service,
which device was used to access, and so on. Cloudera Navigator auditing supports tracking access to:
• HDFS data accessed through HDFS, Hive, HBase, Cloudera Impala services
• HBase and Impala operations
• Hive metadata
• Sentry access
• Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow
DBAs, data modelers, business analysts, and data scientists to search for, amend the properties of, and tag
data entities.
Cloudera Navigator Requirements
Cloudera Manager Requirements
This section describes the Cloudera Navigator requirements for Cloudera Manager.
Cloudera Navigator 2 is available with Cloudera Manager 5.1. For information on the requirements for installing
Cloudera Manager, see Requirements for Cloudera Manager.
Cloudera Navigator Auditing Component
This section describes the databases, service versions, and audited operations supported by the Cloudera
Navigator auditing component.
Supported Audit Databases
The Cloudera Navigator auditing component supports the following databases for storing audit events:
• MySQL - 5.0, 5.1, 5.5, and 5.6
• Oracle 11gR2
• PostgreSQL - 8.4, 9.1, and 9.2
Supported Service Versions and Audited Operations
• HDFS - Minimum supported version: CDH 4.0.0.
The captured operations are:
• Operations that access or modify a file's or directory's data or metadata
• Operations denied due to lack of privileges
• HBase - Minimum supported version: CDH 4.0.0.
Note:
• In CDH versions less than 4.2.0, for grant and revoke operations, the operation in log events is ADMIN.
• In simple authentication mode, if the HBase Secure RPC Engine property is false (the default),
the username in log events is UNKNOWN. To see a meaningful user name:
1. Click the HBase service.
2. Click the Configuration tab.
3. Select Service-wide > Security.
4. Set the HBase Secure RPC Engine property to true.
5. Save the change and restart the service.
• Hive - Minimum supported versions: CDH 4.2.0, CDH 4.4.0 for operations denied due to lack of privileges.
The captured operations are:
Note:
• Actions taken against Hive via the Hive CLI are not audited. Therefore if you have enabled
auditing you should disable the Hive CLI to prevent actions against Hive that are not audited.
• In simple authentication mode, the username in log events is the username passed in the
HiveServer2 connect command. If you do not pass a username in the connect command, the
username in log events is anonymous.
• Hue - Minimum supported version: CDH 4.2.0.
The captured operations are:
• Operations (except grant, revoke, and metadata access only) sent to Beeswax Server
Note: You do not directly configure the Hue service for auditing. Instead, when you configure the
Hive service for auditing, operations sent to the Hive service through Beeswax appear in the Hue
service audit log.
• Cloudera Impala - Minimum supported version: Cloudera Impala 1.2.1.
The captured operations are:
• Queries denied due to lack of privileges
• Queries that pass analysis
• Sentry
The captured operations are:
• Operations sent to the HiveServer2 and the Hive Metastore Server roles
• Add and delete roles, assign roles to groups and remove roles from groups, create and delete privileges,
grant and revoke privileges
• Operations denied due to lack of privileges
Note: You do not directly configure the Sentry service for auditing. Instead, when you configure
the Hive service for auditing, grant, revoke, and metadata operations sent to the Hive Metastore
Server appear in the Hive service audit log.
Cloudera Navigator Metadata Component
This section describes the product versions, components, and browsers supported by the Cloudera Navigator
metadata component.
Supported Product Versions
CDH 4.4.0 and higher for all components except Pig. For Pig, CDH 4.6.0 and higher. The supported components
are:
• HDFS
• Hive
• MapReduce
• Oozie
• Pig
Supported Browsers for the Metadata UI
Installing and Upgrading Cloudera Navigator
Required Role:
Cloudera Navigator is implemented as two roles in the Cloudera Management Service: Navigator Audit Server
and Navigator Metadata Server. You can add Cloudera Navigator roles while installing Cloudera Manager for the
first time, into an existing Cloudera Manager installation, or while upgrading an existing Cloudera Manager
installation.
Configuring a Database for the Cloudera Navigator Auditing Component
When you install Cloudera Navigator you choose the database to store audit events. You can choose either an
embedded PostgreSQL database or an external database. For information on supported databases, see Supported
Audit Databases on page 9. For information on setting up an external database, see Installing and Configuring
Databases.
Adding Cloudera Navigator in a New Cloudera Manager Installation
1. Install Cloudera Manager following the instructions in Installing Cloudera Manager.
2. In the first page of the Cloudera Manager installation wizard, choose one of the license options that support
Cloudera Navigator:
• Cloudera Enterprise Data Hub Edition Trial
• Cloudera Enterprise
– Flex Edition
– Data Hub Edition
3. If you have elected Cloudera Enterprise, install a license:
a. Click Upload License.
b. Click the document icon to the left of the Select a License File text field.
c. Navigate to the location of your license file, click the file, and click Open.
d. Click Upload.
Click Continue to proceed with the installation.
4. In the first page of the Add Services procedure, check the Include Cloudera Navigator checkbox.
5. If you have chosen to use an external database, provide the Cloudera Navigator Audit Server database
properties in the Database Setup page.
Adding Cloudera Navigator in an Existing Cloudera Manager Installation
1. Add and start the Cloudera Navigator roles:
• Adding and Starting the Navigator Audit Server Role on page 15
• Adding and Starting the Navigator Metadata Server Role on page 28
Adding Cloudera Navigator While Upgrading an Existing Cloudera Manager Installation
1. Upgrade Cloudera Manager following the instructions in Upgrading Cloudera Manager.
Deleting Cloudera Navigator Roles
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Instances tab.
3. Check the checkboxes next to the Navigator Audit Server and Navigator Metadata Server roles.
4. If the role instance is running, select Actions for Selected > Stop and click Stop to confirm the action.
5. Select Actions for Selected > Delete. Click Delete to confirm the deletion.
Upgrading Cloudera Navigator
To upgrade Cloudera Navigator, upgrade Cloudera Manager following the instructions in Upgrading Cloudera
Manager.
Important:
Cloudera does not provide an upgrade path from the Navigator Metadata component in Cloudera
Navigator 1.2 to the Cloudera Navigator 2.0 release. If you are upgrading from Cloudera Navigator
1.2, you must perform a clean install of Cloudera Navigator 2.0. Therefore, if you have Cloudera
Navigator roles from a previous beta release:
1. Delete the Navigator roles.
2. Remove the contents of the Navigator Metadata Server storage directory.
3. Add the Navigator roles according to the process described in Adding and Starting the Navigator
Audit Server Role on page 15 and Adding and Starting the Navigator Metadata Server Role on
page 28.
Cloudera Navigator Auditing Component
The Cloudera Navigator auditing component provides data auditing and access features. The architecture of the
Cloudera Navigator auditing component is illustrated below.
The Cloudera Navigator auditing component is implemented as an add-on to Cloudera Manager; all Cloudera
Navigator auditing component functions (configuration and audit log review) are accessed through the Cloudera
Manager Admin Console.
When the Cloudera Navigator auditing component is configured, plug-ins that enable collection of audit events
are added to the HDFS, HBase, and Hive (that is, the HiveServer2 and Beeswax servers) services. The plug-ins
write the audit events to an audit log on the local filesystem. Cloudera Impala records audit events directly in
an audit log file.
The Cloudera Manager Agent monitors the audit log files and sends these events to the Navigator Audit Server.
The Cloudera Manager Agent retries any event that it fails to transmit. Because no transient in-memory buffer is
involved, once audit events are written to the audit log file, they are guaranteed to be delivered (as long as the
filesystem is available). The Cloudera Manager Agent keeps track of the offset in the audit log of the last audit
event that it has successfully transmitted, so after a crash or restart it resumes from the last successfully
sent position. Audit logs are rotated and the Cloudera Manager Agent follows the rotation of the
log. The Agent also purges old audit logs once they have been successfully transmitted to the
Navigator Audit Server. If a plug-in fails to write an audit event to the audit log file, it either drops the event or
shuts down the process in which it is running (depending on the configured queue policy).
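The offset-tracking behavior described above can be illustrated with a minimal Python sketch. This is not the Cloudera Manager Agent's implementation: the function name ship_new_events, the JSON offset file, and the send callback are all hypothetical, and log rotation and purging are left out. It only shows why persisting the offset after each successful send lets a restarted process resume without skipping events.

```python
import json
import os


def ship_new_events(log_path, offset_path, send):
    """Send unsent lines of log_path; persist the offset after each success.

    send is a hypothetical callback that returns True when an event was
    delivered. Returns the number of events delivered in this call.
    """
    # Resume from the last offset confirmed as sent (0 on the first run).
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = json.load(f)["offset"]

    sent = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or an event that is only partly written
            if not send(line.decode("utf-8").rstrip("\n")):
                break  # delivery failed; the same event is retried next call
            offset = log.tell()
            # Persist only after a successful send, so a crash never skips events.
            with open(offset_path, "w") as f:
                json.dump({"offset": offset}, f)
            sent += 1
    return sent
```

Calling the function again after a failure or restart starts at the stored offset, so an event that could not be delivered is the first one retried.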
The Navigator Audit Server stores events in the Navigator Audit DB.
Cloudera Navigator Audit Server
Describes how to add and configure the Navigator Audit Server role.
Required Role:
Adding and Starting the Navigator Audit Server Role
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Instances tab.
3. Click the Add Role Instances button. The Customize Role Assignments page displays.
4. Assign the Navigator role to a host.
a. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations
of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same
set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable,
but you can reassign role instances to hosts of your choosing, if desired.
Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing
multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the
pageable hosts dialog.
The following shortcuts for specifying hostname patterns are supported:
• Range of hostnames (without the domain portion)
Range Definition          Matching Hosts
10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com
• IP addresses
• Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
5. When you are satisfied with the assignments, click Continue. The Database Setup page displays.
6. Configure database settings:
a. Choose the database type:
• Leave the default setting of Use Embedded Database to have Cloudera Manager create and configure
required databases. Make a note of the auto-generated passwords.
• Select Use Custom Databases to specify external databases.
1. Enter the database host, database type, database name, username, and password for the database
that you created when you set up the database.
b. Click Test Connection to confirm that Cloudera Manager can communicate with the database using the
information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct
the information you have provided for the database and then try the test again. (For some servers, if you
are using the embedded database, you will see a message saying the database will be created at a later
step in the installation process.) The Review Changes page displays.
7. Click Continue. The Review Changes page displays with no configuration changes.
8. Click Finish. The Instances page displays.
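The hostname range shortcuts listed in step 4 can be illustrated with a short Python sketch. It is not the wizard's own parser: the function name expand_host_range is hypothetical, and the sketch assumes a single [start-end] range per pattern and that a leading zero in the start value means zero-padded output, as in the host[07-10] row.

```python
import re

# Matches one numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand a single [start-end] range in a hostname or IP pattern."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present; the pattern is a plain name
    start, end = match.group(1), match.group(2)
    # Preserve zero padding: host[07-10] yields host07 ... host10.
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(n).zfill(width)}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]
```

For example, expand_host_range("host[07-10].company.com") produces the four hosts shown in the last row of the range table.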
Configuring the Navigator Audit Server Log Directory
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Configuration tab.
3. Click the Navigator Audit Server Default Group category.
4. Set the Navigator Audit Server Log Directory property.
5. Click Save Changes.
6. Click the Instances tab.
7. Check the checkbox next to the Navigator Audit Server role.
8. Select Actions for Selected > Restart.
Configuring the Navigator Audit Server Data Expiration Period
You can configure the number of hours of audit events to keep in the Navigator Audit Server database as follows:
1.
Required Roles
:
2. Set the Navigator Audit Server Data Expiration Period property.
3. Click Save Changes.
4. Click the Instances tab.
5. Check the checkbox next to the Navigator Audit Server role.
6. Select Actions for Selected > Restart.
Configuring the Audit Server to Mask Personally Identifiable Information
This feature is available in Cloudera Navigator 2.0.1 or later.
Personally identifiable information (PII) is information that can be used on its own or with other information to
identify or locate a single person, or to identify an individual in context. The PII masking feature allows you to
specify credit card number patterns (from major credit issuers) that are masked in audit events, in the properties
of entities displayed in lineage diagrams, and in information retrieved from the Audit Server database and the
Metadata Server persistent storage.
Note:
• Masking Social Security numbers is not supported in this release.
• Masking is not applied to audit events and lineage entities that existed before the mask was
enabled.
1. Expand the Navigator Audit Server Default Group category.
2. Click the Advanced category.
3. Configure the PII Masking Regular Expression property with a regular expression that matches the credit
card number formats to be masked. The default expression is:
(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|(6(?:011|5[0-9]{2})[0-9]{12})|((?:2131|1800|35\\d{3})\\d{11})
which is constructed from the following subexpressions:
• Visa - (4[0-9]{12}(?:[0-9]{3})?)
• MasterCard - (5[1-5][0-9]{14})
• American Express - (3[47][0-9]{13})
• Discover - (6(?:011|5[0-9]{2})[0-9]{12})
• JCB - ((?:2131|1800|35\\d{3})\\d{11})
If the property is left blank, PII information is not masked.
4. Click Save Changes.
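To check what the default expression matches before saving it, the pattern can be tried outside Cloudera Manager. The Python sketch below is an illustration only: it assumes the doubled backslashes in the property value are escaping for a single \d, and the mask_pii helper and its X replacement character are hypothetical (this guide does not describe how Navigator renders a masked value).

```python
import re

# Default PII expression from this section, one alternative per line.
# Assumption: "\\d" in the property value stands for the regex token \d.
PII_PATTERN = re.compile(
    r"(4[0-9]{12}(?:[0-9]{3})?)"
    r"|(5[1-5][0-9]{14})"
    r"|(3[47][0-9]{13})"
    r"|(3(?:0[0-5]|[68][0-9])[0-9]{11})"
    r"|(6(?:011|5[0-9]{2})[0-9]{12})"
    r"|((?:2131|1800|35\d{3})\d{11})"
)


def mask_pii(text, mask_char="X"):
    """Replace every match of the pattern with mask_char, keeping its length."""
    return PII_PATTERN.sub(lambda m: mask_char * len(m.group(0)), text)


if __name__ == "__main__":
    # 4111111111111111 is a widely used Visa test number, not a real card.
    print(mask_pii("select * from orders where cc = '4111111111111111'"))
```

Text that contains no matching digit sequence is returned unchanged, which makes the helper convenient for testing a customized expression against sample queries.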
Audit Log Properties
Describes auditing log properties and how to configure the log properties.
The following properties apply to the audit log file:
• Audit Log Directory - The directory in which audit event log files are written. By default, this property is not
set if Cloudera Navigator is not installed.
Note: If the value of this property is changed and the service is restarted, the Cloudera Manager
Agent starts monitoring the new log directory for audit events. In this case it is possible that
not all events are published from the old audit log directory. To avoid loss of audit events, when
this property is changed, perform the following steps:
1. Stop the service.
2. Copy audit log files and (for Impala only) the impalad_audit_wal file from the old audit log
directory to the new audit log directory. This needs to be done on all the nodes where Impala
daemons are running.
3. Start the service.
• Maximum Audit Log File Size - The maximum size of the audit event log file before a new file is created. The
unit of the file size is service dependent:
– HDFS, HBase, Hive - MiB
– Impala - lines (queries)
• Number of Audit Logs to Retain - Maximum number of rolled over audit logs to retain. The logs will not be
deleted if they contain audit events that have not yet been propagated to Audit Server.
Configuring Audit Logs
1. Click a supported service.
2. Click the Configuration tab.
3. Configure the log properties in the following categories:
• Impala - Impala Daemon Default Group > Logs
• HBase, HDFS, and Hive - Service-Wide > Logs
4. Edit the audit log properties.
5. Click Save Changes.
6. Restart the service.
Service Auditing Properties
Describes service auditing properties and how to configure the properties.
Each service that supports auditing configuration has the following properties:
the Audit Log Directory property is not set, the validator displays a message that says that the Audit Log
Directory property must be set to enable auditing.
• Event Filter - A set of rules that capture properties of auditable events and actions to be performed when
an event matches those properties.
• Event Tracker - A set of rules for tracking and coalescing events. This feature is used to define equivalency
between different audit events. When events match, according to a set of configurable parameters, only one
entry in the audit list is generated for all the matching events.
• Queue Policy - The action to take when the audit event queue is full. The options are Drop or Shutdown.
When a queue is full and the queue policy of the service is Shutdown, before shutting down the service, N
audits will be discarded, where N is the size of the Cloudera Navigator Audit Server queue.
Note: If the queue policy is Shutdown, the Impala service is shut down only if Impala is unable to
write to the audit log file. It is possible that an event may not appear in the audit event log due to
an error in transfer to the Cloudera Manager Agent or database. In such cases Impala will not shut
down and will keep writing to the log file. When the transfer problem is fixed the events will be
transferred to the database.
The Event Filter and Event Tracker rules for filtering and coalescing events are expressed as JSON objects. For
information on the structure of the objects, see the description on the configuration page within the Cloudera
Manager Admin Console.
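The coalescing idea behind the Event Tracker can be sketched in Python as follows. This is a conceptual illustration, not the tracker's JSON rule format (which is documented on the configuration page): the function name coalesce_events and the event field names used in the test data are hypothetical. Events that agree on every chosen field are treated as equivalent, and only the first one is kept.

```python
def coalesce_events(events, fields):
    """Keep the first event for each distinct combination of the given fields.

    events is a list of dicts; fields names the keys that define equivalence.
    """
    seen = set()
    kept = []
    for event in events:
        # Two events are equivalent when they agree on every tracked field.
        key = tuple(event.get(name) for name in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept
```

Choosing fewer fields makes more events equivalent and therefore produces fewer entries in the audit list.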
The default event filter discards events generated by the internal Cloudera and Hadoop users (cloudera-scm,
hdfs, hbase, hive, mapred, solr, and dr.who) and that affect files in the /tmp directory.
Configuring Service Auditing Properties
1. Click a service that supports auditing.
2. Click the Configuration tab.
3. Click the Cloudera Navigator category. The Service-Wide category displays.
4. Edit the properties.
5. Click Save Changes.
6. Restart the service.
Configuring Impala Event Logging
To control whether the Impala daemon logs events to the audit log:
1. Click the Impala service.
2. Click the Configuration tab.
3. Expand the Impala Daemon Default Group > Logs category.
4. Edit the Enable Impala Audit Event Generation checkbox setting.
5. Click Save Changes.
6. Restart the service.
Audit Logging to Syslog
The Audit Server logs all audit records into a Log4j logger called auditStream. The log messages are logged at
the TRACE level, with the attributes of the audit records. By default, the auditStream logger is inactive because
the logger level is set to FATAL. It is also connected to a NullAppender, and does not forward to other appenders
(additivity set to false).
To record the audit stream, configure the auditStream logger with the desired appender. For example, the
The Log4j SyslogAppender supports only UDP. An example syslog configuration would be:
$ModLoad imudp
$UDPServerRun 514
# Accept everything (even DEBUG messages)
local2.* /my/audit/trail.log
It is also possible to attach other appenders to the auditStream to provide other integration behaviors.
You can audit events to syslog in two formats: JSON and RSA EnVision. To configure audit logging to syslog, do
the following:
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Configuration tab.
3. Search for Navigator Audit Server Logging Advanced Configuration Snippet.
4. Click the Value field and depending on the format type, enter:
log4j.logger.auditStream = TRACE,SYSLOG
log4j.appender.SYSLOG = org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost = hostname
log4j.appender.SYSLOG.Facility = Local2
log4j.appender.SYSLOG.FacilityPrinting = true
To configure the specific stream type, enter the property for the format:
• JSON - log4j.additivity.auditStream = false
• RSA EnVision - log4j.additivity.auditStreamEnVision = false
5. Click Save Changes to commit the changes.
Example Log Messages
JSON format:
Jul 23 11:05:15 hostname local2: {"type":"HDFS","allowed":"true","time":"1374602714758","service":"HDFS-1", "user":"root","ip":"10.20.93.93","op":"mkdirs","src":"/audit/root","perms":"rwxr-xr-x"}
RSA EnVision format:
Cloudera|Navigator|1|type="Hive",allowed="false",time="1382551146763", service="HIVE-1",user="systest",impersonator="",ip="/10.20.190.185",op="QUERY",opText="select count(*) from sample_07",db="default",table="sample_07",path="/user/hive/warehouse/sample_07",objType="TABLE"
If a particular field is not applicable for that audit event, it is omitted from the message.
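A consumer of the JSON format can separate the syslog prefix from the audit record and decode the record with a standard JSON parser. The Python sketch below assumes the layout of the JSON example above (prefix, then a single JSON object); the function name parse_json_audit_line is hypothetical. Fields are read with dict.get in the usage note because inapplicable fields are omitted from the message.

```python
import json


def parse_json_audit_line(line):
    """Split a syslog line at the first '{' and decode the JSON payload.

    Returns (syslog_prefix, event_dict).
    """
    header, brace, payload = line.partition("{")
    if not brace:
        raise ValueError("no JSON payload found in syslog line")
    return header.strip(), json.loads(brace + payload)
```

Because a field can be absent, event.get("perms") is safer than event["perms"] when processing mixed event types. Note that in the example message the allowed value is the string "true", not a JSON boolean.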
Audit Events
An audit event is an event that describes an action that has been taken for a service, role, or host.
In Cloudera Manager, audit event logs display:
• Service, role, and host life cycle events recorded by Cloudera Management Service roles. For further information,
see Audit Events in the Cloudera Manager Monitoring and Diagnostics Guide.
Audit Event Properties
The following properties can appear in an audit event entry:
• Date - Date and time the action was performed.
• Command - The action performed.
• Source - The object affected by the service action.
• User - The name of the user that performed the action.
• Impersonator - If the action was requested by another service, the name of the user that invoked the service
action on behalf of the user.
– The Impersonator field will always show when Sentry is not enabled.
– The Impersonator field will show for other services than Hive when Sentry is enabled.
• IP Address - The IP address of the host where the service action occurred.
• Service - The name of the service that performed the service action.
• Role - The name of the role that performed the service action.
Viewing Audit Events
You can view audit events for all services or for a specific service. To view audit events, follow the appropriate
procedure:
• All Services
1. Click Audits in the Cloudera Manager Admin Console top navigation bar.
• Service
1. In the Cloudera Manager Admin Console, click a service that supports auditing.
2. Click the Audits tab on the service navigation bar.
Audit event entries are ordered with the most recent at the top.
Events that represent denied access are labeled Denied and displayed in red text on a pink background.
Filtering Audit Events
You filter generated audit events in the audit UI by selecting a time range and adding filters.
You can use the Time Range Selector or a duration link to set the time range. (See Time Line in the
Cloudera Manager Monitoring and Diagnostics Guide for details.) When you select the time range, the
log displays all events in that range. Note that a search typically takes longer for a
longer time range, because more events must be searched.
Adding a Filter
• Click the icon that displays next to a property when you hover in one of the event entries. A filter containing
the property, operator, and its value is added to the list of filters at the left and Cloudera Manager redisplays
all events that match the filter.
1. Choose a property in the drop-down list. You can search by properties such as Username, Service, Command,
or Role. The properties vary depending on the service or role.
2. If the property allows it, choose an operator in the operator drop-down list.
3. Type a property value in the value text field. For some properties, where the list of values is finite and
known, you can start typing and then select from a list of potential matches. To match a substring, use
the like operator and specify % around the string. For example, to see all the audit events for files created
in the folder /user/joe/out, specify Source like %/user/joe/out%.
4. Click Search. The log displays all events that match the filter criteria.
5. Click to add more filters and repeat steps 1 through 4.
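The substring match in step 3 can be reasoned about with a small Python sketch. It assumes that % in a like filter matches any run of characters, as in SQL; this guide only states that % is placed around the string, so treat the wildcard semantics and the helper name like_to_regex as assumptions.

```python
import re


def like_to_regex(pattern):
    """Translate a like pattern, where % matches any run of characters, to a regex."""
    # Escape each literal piece so characters such as '.' are matched literally.
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)


if __name__ == "__main__":
    source_filter = like_to_regex("%/user/joe/out%")
    print(bool(source_filter.match("/user/joe/out/part-00000")))
    print(bool(source_filter.match("/user/ann/out/part-00000")))
```

Without the surrounding % characters the pattern would have to equal the whole Source value, which is why the example wraps the folder path on both sides.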
Removing a Filter
1. Click the icon at the right of the filter. The filter is removed.
2. Click Search. The log displays all events that match the filter criteria.
Downloading Audit Events
You can download audit events in the Audit UI or using the Audit API. An audit event contains the following
fields: service, username, command, ipAddress, resource, allowed, timestamp, and operationText. The structure
of the resource field depends on the type of the service:
• HDFS - A file path.
• Hive, Hue, Impala, and Sentry - database:tablename
• HBase - table family:qualifier
For Hive, Hue, Impala, and Sentry events, operationText contains the operation string.
Downloading Audit Events Using the Audit UI
1. Display the audit log.
2. Specify desired filters and time range.
3. Click the Download CSV button. A file named history.csv is downloaded.
HDFS Audit Log Example
service,username,command,ipAddress,resource,allowed,timestamp,operationText,
HDFS,cloudera,setPermission,20.10.187.242,/user/hive,false,"2014-02-09T00:59:34.430Z",
HDFS,cloudera,getfileinfo,20.10.187.242,/user/cloudera,true,"2014-02-09T00:59:22.667Z",
HDFS,cloudera,getfileinfo,20.10.187.242,/,true,"2014-02-09T00:59:22.658Z",
In this example, access was denied for the first event, and therefore its allowed field has the value false.
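A downloaded history.csv file can be post-processed with any CSV reader. As a minimal sketch, the following Python function lists the denied events; the function name denied_events is hypothetical, and the column names are taken from the header row of the HDFS example above.

```python
import csv
import io


def denied_events(csv_text):
    """Return the rows of a downloaded audit CSV whose allowed field is false."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["allowed"].strip().lower() == "false"]


if __name__ == "__main__":
    sample = (
        "service,username,command,ipAddress,resource,allowed,timestamp,operationText,\n"
        'HDFS,cloudera,setPermission,20.10.187.242,/user/hive,false,"2014-02-09T00:59:34.430Z",\n'
        'HDFS,cloudera,getfileinfo,20.10.187.242,/,true,"2014-02-09T00:59:22.658Z",\n'
    )
    for row in denied_events(sample):
        print(row["username"], row["command"], row["resource"])
```

Using a CSV parser rather than splitting on commas matters for Hive and Sentry events, where the quoted operationText field can contain commas and line breaks.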
Hive and Sentry Example - via downloaded CSV file
The following records list Hive operations to create and load a table and Sentry operations to create roles and
grant privileges:
service,username,command,ipAddress,resource,allowed,timestamp,operationText
Hive,admin,LOAD,20.10.191.128,default:sample_08,true,"2014-07-08T20:21:03.510Z","LOAD DATA INPATH
'/user/admin/sample_08' OVERWRITE INTO TABLE sample_08"
Hive,admin,LOAD,20.10.191.128,:,true,"2014-07-08T20:21:03.509Z","LOAD DATA INPATH '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08"
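Hive,admin,CREATETABLE,20.10.191.128,default:sample_08,true,"2014-07-08T20:21:01.899Z","CREATE TABLE `sample_08` (
 `code` string ,
 `description` string ,
 `total_emp` int ,
 `salary` int )
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
STORED AS TextFile"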
Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:26.718Z","GRANT ROLE default_admin TO GROUP sentryDefaultAdmin"
Hive,hive,GRANT_PRIVILEGE,20.10.191.128,default:,true,"2014-07-08T20:19:26.149Z","GRANT ALL ON DATABASE default TO ROLE default_admin"
Hive,hive,CREATEROLE,20.10.191.128,:,true,"2014-07-08T20:19:25.761Z","CREATE ROLE default_admin"
Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:20.515Z","GRANT ROLE global_admin TO GROUP sentryAdmin"
Hive,hive,GRANT_ROLE,20.10.191.128,:,true,"2014-07-08T20:19:20.063Z","GRANT ROLE global_admin TO GROUP hive"
Hive,hive,GRANT_PRIVILEGE,20.10.191.128,server1:,true,"2014-07-08T20:19:19.382Z","GRANT ALL ON SERVER server1 TO ROLE global_admin"
Hive,hive,CREATEROLE,20.10.191.128,:,true,"2014-07-08T20:19:18.281Z","CREATE ROLE global_admin"
Downloading Audit Events Using the Audit API
You can filter and download audit events using the Cloudera Manager audit API. See
Audit API
.
Hive and Sentry Example - via audit API
To download the same audit events using the API, issue the request
http://host-1.ent.cloudera.com:7180/api/v6/audits?query=service==*HIVE*, which returns the
following JSON items:
{ "items" : [ { "timestamp" : "2014-07-08T20:21:03.510Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "LOAD", "resource" : "default:sample_08",
"operationText" : "LOAD DATA INPATH\n '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08", "allowed" : true }, { "timestamp" : "2014-07-08T20:21:03.509Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "LOAD", "resource" : ":",
"operationText" : "LOAD DATA INPATH\n '/user/admin/sample_08' OVERWRITE INTO TABLE sample_08", "allowed" : true }, { "timestamp" : "2014-07-08T20:21:01.899Z", "service" : "Hive", "username" : "admin", "ipAddress" : "20.10.191.128", "command" : "CREATETABLE", "resource" : "default:sample_08",
"operationText" : "CREATE TABLE `sample_08` (\n `code` string ,\n `description` string ,\n `total_emp` int ,\n `salary` int )\nROW FORMAT DELIMITED\n FIELDS TERMINATED BY '\t'\nSTORED AS TextFile",
"allowed" : true }, { "timestamp" : "2014-07-08T20:19:26.718Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",
"operationText" : "GRANT ROLE default_admin TO GROUP sentryDefaultAdmin", "allowed" : true
}, {
"timestamp" : "2014-07-08T20:19:26.149Z", "service" : "Hive", "username" : "hive",
"ipAddress" : "20.10.191.128", "command" : "GRANT_PRIVILEGE", "resource" : "default:",
"operationText" : "GRANT ALL ON DATABASE default TO ROLE default_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:25.761Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "CREATEROLE", "resource" : ":",
"operationText" : "CREATE ROLE default_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:20.515Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",
"operationText" : "GRANT ROLE global_admin TO GROUP sentryAdmin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:20.063Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_ROLE", "resource" : ":",
"operationText" : "GRANT ROLE global_admin TO GROUP hive", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:19.382Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "GRANT_PRIVILEGE", "resource" : "server1:",
"operationText" : "GRANT ALL ON SERVER server1 TO ROLE global_admin", "allowed" : true }, { "timestamp" : "2014-07-08T20:19:18.281Z", "service" : "Hive", "username" : "hive", "ipAddress" : "20.10.191.128", "command" : "CREATEROLE", "resource" : ":",
"operationText" : "CREATE ROLE global_admin", "allowed" : true
} ] }
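The request and response above can be handled programmatically. The following is a minimal sketch, not Cloudera client code: it builds the request URL in the form shown in this example and reduces a returned JSON document to a few fields. The function names audit_url and summarize_audits are hypothetical, and fetching the URL (including authentication) is deliberately left out.

```python
import json
from urllib.parse import quote


def audit_url(base_url, query):
    """Build an audit request URL of the form shown in the example above.

    base_url is the Cloudera Manager address, for example
    http://host-1.ent.cloudera.com:7180 (hypothetical helper).
    """
    # Keep '=' and '*' unescaped so the query reads as in the example.
    return "%s/api/v6/audits?query=%s" % (base_url.rstrip("/"), quote(query, safe="=*"))


def summarize_audits(payload):
    """Return (timestamp, username, command, allowed) for each audit item."""
    items = json.loads(payload)["items"]
    # .get() tolerates an item that has no username field.
    return [(i["timestamp"], i.get("username"), i["command"], i["allowed"]) for i in items]


if __name__ == "__main__":
    print(audit_url("http://host-1.ent.cloudera.com:7180", "service==*HIVE*"))
```

Passing the response body as a string to summarize_audits yields one tuple per audit event, in the order returned by the server.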
Downloading HDFS Directory Access Permission Reports
For each HDFS service you can download a report that details the HDFS directories a group has permission to
access.
1. In the Cloudera Manager Admin Console, navigate to the Reports page:
• (Cloudera Manager 4) Click Reports.
• (Cloudera Manager 5) Click Clusters > ClusterName > Other > Reports.
Cloudera Navigator Metadata Component
The Cloudera Navigator metadata component provides data discovery and data lineage management functions.
The architecture of the Cloudera Navigator metadata component is illustrated below.
The Navigator Metadata Server performs five main functions:
• Obtains connection information about the services whose data it manages from the Cloudera Manager Server
• Extracts entity metadata from the services at periodic intervals
• Indexes the metadata
• Implements a graphical UI for searching the metadata and displaying lineage relationships between entities
• Implements a REST API for retrieving metadata
Cloudera Navigator Metadata Server
Describes how the Cloudera Navigator Metadata Server extracts metadata from the entities managed by Cloudera
Manager, and how to add and configure the Navigator Metadata Server role.
Required Role:
About Metadata Extraction
The Navigator Metadata Server extracts metadata from the following resource types from the listed servers:
• HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint. However,
if you have High Availability enabled, metadata is extracted as soon as it is written to the JournalNodes.
• Hive - Extracts database and table metadata from the Hive Metastore Server.
• MapReduce - Extracts job metadata from the JobTracker. The default setting in Cloudera Manager retains a
maximum of five jobs, which means that if you run more than five jobs between Navigator extractions, the
Navigator Metadata Server extracts only the five most recent jobs.
• Pig - Extracts Pig script runs from the JobTracker or Job History Server.
• Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server.
• YARN - Extracts job metadata from the Job History Server.
If an entity is created at time t0 in the system, that entity will be extracted and linked in Navigator after the
extraction poll period (default 10 minutes) plus a service-specific interval as follows:
• HDFS: t0 + extraction poll period + HDFS checkpoint interval (default 1 hour)
• HDFS + HA: t0 + extraction poll period
• Hive: t0 + extraction poll period + Hive maximum wait time (default 60 minutes)
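The timing rules above amount to simple addition. The following sketch computes the latest time at which an entity created at t0 should be extracted, using the default values quoted in this section. The names SERVICE_INTERVALS, max_extraction_delay, and latest_extraction_time are illustrative only and are not Navigator settings.

```python
from datetime import datetime, timedelta

# Defaults quoted in the text above.
EXTRACTION_POLL = timedelta(minutes=10)
SERVICE_INTERVALS = {
    "HDFS": timedelta(hours=1),      # HDFS checkpoint interval
    "HDFS_HA": timedelta(0),         # with High Availability, no extra wait
    "HIVE": timedelta(minutes=60),   # Hive maximum wait time
}


def max_extraction_delay(service, poll=EXTRACTION_POLL):
    """Upper bound on the delay between entity creation and extraction."""
    return poll + SERVICE_INTERVALS[service]


def latest_extraction_time(t0, service, poll=EXTRACTION_POLL):
    """Latest time at which an entity created at t0 is expected in Navigator."""
    return t0 + max_extraction_delay(service, poll)


if __name__ == "__main__":
    created = datetime(2014, 7, 8, 20, 0)
    for name in SERVICE_INTERVALS:
        print(name, latest_extraction_time(created, name))
```

If you change the poll period or a service interval in your deployment, pass or edit the corresponding value rather than relying on these defaults.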
Adding and Starting the Navigator Metadata Server Role
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Instances tab.
3. Click the Add Role Instances button. The Customize Role Assignments page displays.
4. Assign the Navigator role to a host.
a. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations
of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same
set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable,
but you can reassign role instances to hosts of your choosing, if desired.
Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing
multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the
pageable hosts dialog.
The following shortcuts for specifying hostname patterns are supported:
• Range of hostnames (without the domain portion)
Range Definition | Matching Hosts
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com
• IP addresses
• Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
5. Click Finish. The Instances page displays.
6. Check the checkbox next to the Navigator Metadata Server role.
7. Select Actions for Selected > Start. Click Start to confirm the action.
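The hostname range shortcuts described in step 4 can be illustrated with a short sketch. This is not the code Cloudera Manager uses; it only reproduces the expansions listed in the range table, assuming a single bracketed numeric range per pattern and zero-padding when the lower bound has a leading zero. The function name expand_host_range is hypothetical.

```python
import re

# Matches one numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand the first [start-end] range in pattern into a list of hosts."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present: the pattern is a single host
    start, end = match.group(1), match.group(2)
    # Preserve zero-padding (host07 ... host10) when the lower bound is padded.
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [prefix + str(n).zfill(width) + suffix
            for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for p in ("10.1.1.[1-4]", "host[1-3].company.com", "host[07-10].company.com"):
        print(p, "->", ", ".join(expand_host_range(p)))
```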
Configuring the Navigator Metadata Server Storage Directory
Describes how to configure where the Navigator Metadata Server stores extracted data. The default is /var/lib/cloudera-scm-navigator.
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Configuration tab.
3. Click the Navigator Metadata Server Default Group.
4. Specify the directory in the Navigator Metadata Server Storage Dir property.
5. Click Save Changes.
6. Click the Instances tab.
7. Check the checkbox next to the Navigator Metadata Server role.
8. Select Actions for Selected > Restart.
Configuring the Navigator Metadata Server Port
Describes how to configure the port on which the Navigator Metadata UI is accessed. The default port is 7187.
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management
Service link.
2. Click the Configuration tab.
3. Select Navigator Metadata Server Default Group > Ports and Addresses.
4. Specify the port in the Navigator Metadata Server Port property.
5. Click Save Changes.
6. Click the Instances tab.
7. Check the checkbox next to the Navigator Metadata Server role.
8. Select Actions for Selected > Restart.
Configuring the Metadata Server to Mask Personally Identifiable Information
This feature is available in Cloudera Navigator 2.0.1 or later.
Navigator Metadata Server Sizing and Performance Recommendations
Two activities determine Navigator Metadata Server resource requirements:
• Extracting metadata from the cluster and creating relationships
• Querying
The Navigator Metadata Server uses Solr to store, index, and query metadata. Indexing happens during extraction.
Querying is fast and efficient because the data is indexed.
Memory and CPU requirements are based on the amount of data that is stored and indexed. With 6 GB of RAM and
8-10 cores, Solr can process 6 million entities in 25-30 minutes or 80 million entities in 8 to 9 hours. Less than
6 GB of RAM results in excessive garbage collection and possibly out-of-memory exceptions. For large
clusters, Cloudera advises at least 8 GB of RAM and 8 cores. The Solr instance runs in process with Navigator,
so the Java heap for the Navigator Metadata Server should be set according to the size of the cluster.
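The two data points above imply different processing rates, which suggests that throughput drops as the index grows. The following sketch only restates that arithmetic. It is not a sizing tool, and the function name entities_per_minute is hypothetical.

```python
def entities_per_minute(entities, fastest_minutes, slowest_minutes):
    """Return the (lowest, highest) implied processing rate per minute."""
    return entities / slowest_minutes, entities / fastest_minutes


if __name__ == "__main__":
    # 6 million entities in 25-30 minutes.
    print(entities_per_minute(6000000, 25, 30))
    # 80 million entities in 8 to 9 hours, expressed in minutes.
    print(entities_per_minute(80000000, 8 * 60, 9 * 60))
```

Because the implied rate for 80 million entities is lower than for 6 million, extrapolating linearly from the smaller figure is likely to underestimate extraction time on a large cluster.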
By default, during the Cloudera Manager first run installation wizard the Navigator Audit Server and Navigator
Metadata Server are assigned to the same host as the Cloudera Management Service monitoring roles. This
configuration works for a small cluster, but should be updated before the cluster grows. You can either change
the configuration at installation time or move the Navigator Metadata Server if necessary.
Moving a Navigator Metadata Server Role
Enabling Hive Metadata Extraction in a Secure Cluster
The Navigator Metadata Server uses the hue user to connect to the Hive Metastore. The hue user is able to
connect to the Hive Metastore by default. However, if the Hive service Hive Metastore Access Control and Proxy
User Groups Override property and/or the HDFS service Hive Proxy User Groups property have been changed
from their default values to settings that prevent the hue user from connecting to the Hive Metastore, Navigator
Metadata Server will be unable to extract metadata from Hive. If this is the case, modify the Hive service Hive
Metastore Access Control and Proxy User Groups Override property and/or the HDFS service Hive Proxy User
Groups property so that the hue user can connect as follows:
1. Go to the Hive or HDFS service.
2. Click the Configuration tab.
3. Expand the Service-Wide > Proxy category.
4. In the Hive service Hive Metastore Access Control and Proxy User Groups Override field or the HDFS service
Hive Proxy User Groups field, click the Value column, and click to add a new row.
5. Type hue.
6. Click Save Changes to commit the changes.
7. Restart the service.
Metadata
The Cloudera Navigator Metadata component manages metadata about the entities in a Hadoop cluster and
relationships between the entities.
The Cloudera Navigator metadata schema defines the types of metadata that are available for each entity type
it supports. The types of metadata defined by the Cloudera Navigator Metadata component include: the name
of an entity, the service that manages or uses the entity, type, path to the entity, date and time of creation,
access, and modification, size, owner, purpose, and relationships—parent-child, data flow, and instance
of—between entities.
For example, the following shows the property sheet of a file entity:
There are two classes of metadata:
• technical metadata - metadata defined when entities are extracted. You cannot modify technical metadata.
• business metadata - metadata added to extracted entities. You can add and modify business metadata.
Metadata Indexing
After metadata is extracted it is indexed and made available for searching by an embedded Solr engine. The Solr
schema indexes two types of metadata: entity properties and relationships between entities.
You can search entity metadata using the Navigator Metadata UI. Relationship metadata is implicitly visible in
lineage diagrams and explicitly available in a lineage file.
Search Syntax
Search in the Cloudera Navigator Metadata component is implemented by an embedded Solr engine that supports
the syntax described in LuceneQParserPlugin. You construct search strings by specifying the value of a
default property, property name-value pairs, or user-defined name-value pairs using the syntax:
• Property name-value pairs - propertyName:value, where
– propertyName is one of the properties listed in Search Properties on page 31.
– value is a single value or range of values specified as [value1 TO value2]. In a value, * is a wildcard.
In property name-value pairs you must escape special characters :, /, and * with the backslash character \.
For example, fileSystemPath:\/user\/admin.
• User-defined name-value pairs - up_propertyName:value.
To construct complex strings, join multiple property-value pairs using the or and and operators.
Example Search Strings
• Filesystem path /user/admin - fileSystemPath:\/user\/admin
• Descriptions that start with the string "Banking" - description:Banking*
• Sources of type MapReduce or Hive - sourceType:MAPREDUCE or sourceType:HIVE
• Directories owned by hdfs in the path /user/hdfs/input - owner:HDFS and type:directory and fileSystemPath:\/user\/hdfs\/input
• Jobs started between 20:00 and 21:00 UTC - started:[2013-10-21T20:00:00.000Z TO 2013-10-21T21:00:00.000Z]
• User-defined key-value project-customer1 - up_project:customer1
Note: When viewing MapReduce jobs in the Cloudera Manager Activities page, the string that appears
in a job's Name column equates to the originalName property. Therefore, to specify a MapReduce
job's name in a search, use the following string: (resType:mapreduce) and (originalName:jobName),
where jobName is the value in the job's Name column.
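Search strings like the examples above can be assembled programmatically. The following is a minimal sketch of the escaping and joining rules described in Search Syntax. The helper names (escape_value, pair, user_pair, join_and, join_or) are hypothetical and are not part of any Navigator client library.

```python
import re

# The special characters that must be escaped in property values: ':', '/', '*'.
_SPECIAL = re.compile(r"([:/*])")


def escape_value(value):
    """Prefix each special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", value)


def pair(name, value, escape=True):
    """Build propertyName:value. Pass escape=False to keep '*' as a wildcard."""
    return "%s:%s" % (name, escape_value(value) if escape else value)


def user_pair(name, value):
    """Build a user-defined pair, up_propertyName:value."""
    return pair("up_" + name, value)


def join_and(*terms):
    return " and ".join(terms)


def join_or(*terms):
    return " or ".join(terms)


if __name__ == "__main__":
    print(pair("fileSystemPath", "/user/admin"))
    print(pair("description", "Banking*", escape=False))
    print(join_or(pair("sourceType", "MAPREDUCE"), pair("sourceType", "HIVE")))
    print(user_pair("project", "customer1"))
```

Escaping is optional per call because the same character, *, is either a literal to escape or a wildcard to keep, depending on the intent of the search.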
Search Properties
A reference for the search schema properties.
Default Properties
The following properties can be searched simply by specifying a property value: type, fileSystemPath, inputs,
jobId, mapper, mimeType, name, originalName, outputs, owner, principal, reducer, and tags.
Common Properties
Name | Type | Description
description | text | Description of the entity.
group | caseInsensitiveText | The group to which the owner of the entity belongs.
 | ngramedText | The overridden name of the entity. If the name has not been overridden, this value is empty. Names cannot contain spaces.
operationType | ngramedText | The type of an operation: Pig - SCRIPT; Sqoop - Table Export, Query Import.
originalName | ngramedText | The name of the entity when it was extracted.
originalDescription | text | The description of the entity when it was extracted.
owner | caseInsensitiveText | The owner of the entity.
principal | caseInsensitiveText | For entities with type OPERATION_EXECUTION, the initiator of the entity.
tags | ngramedText | A set of tags that describe the entity.
type | ngramedText | The type of the entity. The available types depend on the entity's source type: HDFS - DIRECTORY, FILE; HIVE - DATABASE, TABLE, FIELD, OPERATION, OPERATION_EXECUTION, SUB_OPERATION, PARTITION, RESOURCE, UNKNOWN, VIEW; MAPREDUCE - OPERATION, OPERATION_EXECUTION; OOZIE - OPERATION, OPERATION_EXECUTION; PIG - OPERATION, OPERATION_EXECUTION; SQOOP - OPERATION, OPERATION_EXECUTION, SUB_OPERATION; YARN - OPERATION, OPERATION_EXECUTION.
Query
queryText | string | The text of a Hive or Sqoop query.
Source
clusterName | string | The name of the cluster in which the entity is stored.
sourceId | string | The ID of the source type.
sourceType | caseInsensitiveText | The source type of the entity: HDFS, HIVE, MAPREDUCE, OOZIE, PIG, SQOOP, YARN.
sourceUrl | string | The URL of the source type.
Timestamps
started, ended | |
HDFS Properties
Name | Type | Description
fileSystemPath | path | The path to the entity.
compressed | Boolean | Indicates whether the entity is compressed.
deleted | Boolean | Indicates whether the entity has been moved to the Trash folder.
deleteTime | date | The time the entity was moved to the Trash folder.
mimeType | ngramedText | The MIME type of the entity.
parentPath | string | The path to the parent entity for a child entity. For example: parentpath:/default/sample_07 for the table sample_07 from the Hive database default.
permissions | string | The UNIX access permissions of the entity.
size | long | The exact size of the entity in bytes or a range of sizes. Range examples: size:[1000 TO *], size: [* TO 2000], and size:[* TO *] to find all fields with a size value.
MAPREDUCE and YARN Properties
Name | Type | Description
inputRecursive | Boolean | Indicates whether files are searched recursively under the input directories, or just files directly under the input directories are considered.
jobId | ngramedText | The ID of the job. For a job spawned by Oozie, the workflow ID.
mapper | string | The fully-qualified name of the mapper class.
outputKey | string | The fully-qualified name of the class of the output key.
outputValue | string | The fully-qualified name of the class of the output value.
reducer | string | The fully-qualified name of the reducer class.
OPERATION Properties
Name | Type | Description
Operation
inputFormat | string | The fully-qualified name of the class of the input format.
outputFormat | string | The fully-qualified name of the class of the output format.
Operation Execution
 | string | The name of the entity input to an operation execution. For entities of resource type MR, it is usually a directory. For entities of resource type Hive, it is usually a table.
outputs | string | The name of the entity output from an operation execution. For entities of resource type MR, it is usually a directory. For entities of resource type Hive, it is usually a table.
HIVE Properties
Name | Type | Description
Field
dataType | ngramedText | The type of data stored in a field (column).
Table
compressed | Boolean | Indicates whether a Hive table is compressed.
serDeLibName | string | The name of the library containing the SerDe class.
serDeName | string | The fully-qualified name of the SerDe class.
Partition
partitionColNames | string | The table columns that define the partition.
partitionColValues | string | The table column values that define the partition.
Oozie Properties
Name | Type | Description
status | string | The status of the Oozie workflow: RUNNING, SUCCEEDED, or FAILED.
PIG Properties
Name | Type | Description
scriptId | string | The ID of the Pig script.
SQOOP Properties
Name | Type | Description
dbURL | string | The URL of the database from or to which the data was imported or exported.
dbTable | string | The table from or to which the data was imported or exported.
dbUser | string | The database user.
dbWhere | string | The where clause that identifies which rows were imported.
dbColumnExpression | string | An expression that identifies which columns were imported.
Accessing Metadata
Navigator Metadata UI
Opening the Navigator Metadata UI
The Navigator Metadata UI allows you to search entity metadata and view entity lineage.
To open the Metadata UI, do one of the following:
• Directly open the URL of the Cloudera Navigator Metadata Server: http://hostname:port/, where hostname
is the name of the host on which you are running the Cloudera Navigator Metadata Server on page 27 and
port is the port configured for the Cloudera Navigator Metadata Server.
Note: The default port of the Cloudera Navigator Metadata Server is 7187. To change the port,
follow the instructions in Configuring the Navigator Metadata Server Port on page 29.
• Navigate to the Navigator Metadata Web UI:
1. Do one of the following:
• Select Clusters > Cloudera Management Service > Cloudera Management Service.
• On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera
Management Service link.
2. Click the Instances tab.
3. Click the Navigator Metadata Server role.
4. Click the Navigator Web UI link.
Searching Metadata
You search in the Navigator Metadata UI by typing search strings or by constructing search strings using
UI controls.
1. Open the Navigator Metadata UI.
2. Provide Cloudera Manager default administrator credentials and click Login.
3. Do one of the following:
• Type a search string into the Search box that conforms to the search syntax. The Search Results page
displays as soon as you start typing.
• Click the Query Builder link. The Query Builder landing page displays with the result of the wildcard search
(*). The Query Builder landing page displays Source Type and Type facets that match the search results
with the number of results that match each value of those properties. You can filter the search results
by clicking specific values for those properties or adding new properties.
The Full Query read-only box displays the search string constructed from the specified filters. Click Show
n Results to display the Search Results page.
Search Results
The Search Results page has a Search box and two panes: the Query Builder pane and the Search Results pane.
The Search Results pane displays the number of matching entries in pages listing 25 entities per page. You can
view the pages using the page control at the bottom of each page. Each entry in the result list contains:
• Source type
• Entity name - the name is a link to a page that displays the entity property editor and lineage diagram.
• Entity properties
Specifying Property Values in the Query Builder Pane
The Query Builder pane contains a Search box and a set of graphical controls that allow you to select property
values to filter search results. You can filter using the Search box or the graphical controls.
In the Search box, type the values of default properties.
To filter on a property value for non-default properties, specify values as follows:
• Boolean - Check the checkbox.
• Enumerated - Start typing or click the field and then select from a drop-down list.
• Timestamps - Specified in the format mm/dd/yyyy hh:mm [AM|PM] in a date control. You choose dates in your
local time zone, but the string that displays in the Search box is UTC. In the date control:
– Date - Click the down arrow to display a calendar and select a date, or click a subfield and click the
spinner arrows or press the up and down arrow keys.
– Time - Click the hour, minute, and AM or PM fields and click the spinner arrows or press the up and down
arrow keys to specify the value.
– Move between fields using the right and left arrow keys.
To add a property, click Add another filter... and select a property name.
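Because the date control accepts local time while the Search box shows UTC, it can help to see the conversion spelled out. The following sketch converts time zone-aware local times to the UTC timestamp format used in the started example under Example Search Strings and builds the corresponding range filter. The function names are hypothetical, and the UTC-7 offset in the usage example is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone


def to_search_timestamp(local_dt):
    """Format a time zone-aware datetime as a UTC string, e.g. 2013-10-21T20:00:00.000Z."""
    if local_dt.tzinfo is None:
        raise ValueError("expected a time zone-aware datetime")
    utc = local_dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so append milliseconds by hand.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (utc.microsecond // 1000)


def started_between(start, end):
    """Build a started:[value1 TO value2] range filter."""
    return "started:[%s TO %s]" % (to_search_timestamp(start), to_search_timestamp(end))


if __name__ == "__main__":
    local = timezone(timedelta(hours=-7))  # assumed local offset, for illustration
    start = datetime(2013, 10, 21, 13, 0, tzinfo=local)
    end = datetime(2013, 10, 21, 14, 0, tzinfo=local)
    print(started_between(start, end))
```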
Navigator Metadata API
The Navigator Metadata API allows you to search entity metadata using a REST API. For information about the
API, see:
• 2.0.1 and higher - Cloudera Navigator v2 API documentation
• 1.2 - 2.0.0 - Cloudera Navigator v1 API documentation
Modifying Business Metadata
The Cloudera Navigator Metadata component allows you to add and modify the following business metadata
associated with entities: display name, description, tags, and user-defined name-value pairs. You can modify
business metadata using the Navigator Metadata UI, MapReduce service and job properties, Navigator metadata
files, and the Navigator Metadata API.
Modifying Business Metadata Using the Navigator Metadata UI
1. Run a search in the Cloudera Navigator Metadata UI.
2. Click an entity link returned in the search. The metadata pane displays on the left and the lineage page
displays on the right.
4. Edit any of the fields as instructed. Press Enter or Tab to create new tag entries. For example, a Description,
the tags occupations and salaries, and property year with value 2012 have been added to the file
sample_07.csv