
IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop




Key requirements for detecting data breaches and addressing compliance.


In-depth look at the architecture throughout the Hadoop stack.


Best practices to consider when building out your data security plan.


Building blocks for implementing effective data monitoring.


Operationalize your processes with extra emphasis given to handling security breaches and forensic investigations.



toward enterprise‑readiness for Hadoop

Hadoop is delivering insights for many organizations that are using it. However, the security risks remain high. Although some Hadoop distributions do support various security and authentication solutions, there has not been a comprehensive data activity monitoring solution for Hadoop until now. Considering that even robust and mature enterprise relational database systems are often the target of attacks, the relative lack of controls around Hadoop makes it an attractive target, especially as more sensitive and valuable data from a wide variety of sources moves into the Hadoop cluster.

Organizations that tackle this issue head-on, sooner rather than later, position themselves to expand their use of Hadoop for enhanced business value. They can proceed with the confidence that they can address regulatory requirements and detect breaches quickly, thus reducing overall business risk for the Hadoop project. Ideally, organizations should be able to integrate big data applications and analysis into an existing data security infrastructure, rather than relying on homegrown scripts and monitors, which can be labor-intensive, error-prone and subject to misuse.

With IBM® InfoSphere® Guardium® data security solutions, much of the heavy lifting is taken care of for you. You define security policies that specify what data needs to be retained and how to react to policy violations. Data events are written directly to a hardened appliance, leaving no opportunity for even privileged users to access that data and hide their tracks. Out-of-the-box reports and policies get you up and running quickly, and those reports and policies are easily customized to align with your audit requirements.


Comprehensive monitoring for Hadoop

InfoSphere Guardium helps you make sense of what’s going on by actively monitoring activity throughout the Cloudera or IBM InfoSphere BigInsights Hadoop stack (see Figure 1), including Hue/Beeswax or BigInsights Web Console, MapReduce, Hive, HBase and HDFS.

Not only does this comprehensive monitoring help with data protection, it can also help you find and react to breaches or unauthorized access more quickly by making it easier to see what is happening. Even though much of the activity in Hadoop breaks down to MapReduce and HDFS, at that level you may not be able to tell what a user higher up in the stack was really trying to do, or even who the user was. It is similar to showing disk segment I/O operations instead of an audit trail of a database.

Figure 1. InfoSphere Guardium can capture activity as it flows through the Hadoop stack

Figure 1 labels: User Interface. Who submitted the job/query? What jobs? What queries? Is this an authorized job? Permission exceptions? What files accessed?


By providing monitoring at different levels, you are more likely to understand the activity, as well as being able to audit activities that come in directly through lower points in the stack.

For example, the Hue/Beeswax report included with InfoSphere Guardium will show you the actual Hive queries that were run, as shown in Figure 2. A report in the same time period for HDFS would show you that activity at a file-system level.

Figure 2. See commands, users, exceptions and more.



Figure 2 report columns: Type, Server IP, Hive Parsed SQL, HiveUser, HiveCommand, HiveDatabase, Hive TableName, Hive Error.

The sample rows show the users david and cloudera issuing get_table and create_table commands against the default database. The statements include SELECT * FROM “DavidTest”, DROP TABLE “sample_07”, SELECT * FROM “sample_08” and CREATE EXTERNAL TABLE demo22 (a int, b int, c int). The tables referenced are JoeD2222, demo22, DavidTest, SAMPLE_07 and SAMPLE_08, and one row records the error NoSuchObjectException(message;default.JoeD2222 table not found).


Architecture of the solution

As shown in Figure 3, InfoSphere Guardium continuously monitors data activity using lightweight software probes called S-TAPs without relying on logs. The S-TAPs also do not require any changes to the Hadoop servers or applications.

Because privileged users can delete or modify logs, InfoSphere Guardium helps ensure separation of duties by immediately intercepting and forwarding data activity to a separate hardened appliance, known as a Collector. There, the activity messages are compared to previously defined policies to detect violations that could, for example, generate an alert in real time. The relevant activity is stored in the Guardium repository from which you can also do forensic analysis and schedule regular audit reports.

The InfoSphere Guardium S-TAP was originally designed for performance with low overhead; after all, the S-TAP is also used to monitor production database environments.

Figure 3. Architecture enforces separation of duties

Figure 3 labels: Cluster Clients; MapReduce jobs; HDFS and HBase commands; InfoSphere Guardium S-TAP; InfoSphere Guardium collector; InfoSphere Guardium reporting and alerting.


Make a plan

Data activity monitoring for Hadoop is newer than Hadoop itself, but with InfoSphere Guardium, a wide variety of enterprise data sources can be monitored using the same scalable environment. If you are already monitoring relational databases, the planning concepts will be similar, even if the specifics are different.

Here are some questions to aid in planning a monitoring and auditing solution for Hadoop:

• Who needs to be involved?
• Where is the monitoring software installed?
• Where should the appliances be located?
• How should the deployment be rolled out?
• What are the business requirements for monitoring?

Who needs to be involved?


Where is software installed? Where should appliances be located?

InfoSphere Guardium consists of software components that sit on the Hadoop cluster servers (the S-TAPs and the optional installation manager agents) and separate hardware or software appliances. The appliances can be fully configured software solutions delivered on physical appliances provided by IBM or software images that you deploy on your own hardware.

InfoSphere Guardium scalable architecture

The InfoSphere Guardium distributed architecture is built to scale, from small to very large, using a graduated system of collectors and aggregators, as well as the ability to perform load balancing (see Figure 4).



Table 1. Team members for an InfoSphere Guardium deployment

Primary team members

Contributing team members

• Business Analyst: Collects and documents business requirements for auditing, monitoring and logging.

• Data Monitoring Architecture team: Responsible for defining reports, policies and audit processes. To properly observe segregation of duties requirements, members of this team should not have privileges to install policies or modify the contents of groups that are defined for use in Guardium policies and reports, such as authorized users, privileged users or sensitive data.

• Project Manager: Manages product implementations and upgrades.

• Network Engineer: Assigns IP addresses to the InfoSphere Guardium appliance, and ensures connectivity through network infrastructure including firewalls.

• Storage and Backup engineers: Ensure that retention period policies are in compliance, and proper operational procedures are in place.

• Security Escalation: Performs/activates forensic analysis if a data security breach is reported.

• Security Team: Produces standards for monitoring; stays up-to-date on industry data security requirements and government regulations.

• Technology Group: Evaluates, tests and certifies new software releases and patches; produces technical documentation.

• Application Managers: Keep InfoSphere Guardium application administrator informed of non-business-as-usual (non-BAU) activity and implementation of new modules that may impact data collection.

• Hadoop Administrator: Keeps InfoSphere Guardium application administrator informed of changes in platform environment, such as upgrades of OS and introductions of new servers.

• System Administrator: Typically installs software on operating systems. They would install


A Collector is used to collect data activity, analyze it in real time, and log it in the internal repository for further analysis and real-time alerting. Depending on how much audit data you collect (which is determined by your business requirements for auditing), you may need multiple Collectors, which should be located in the same data center as the Hadoop cluster.

The Aggregator is used to collect and merge information from multiple appliances (Collectors and other Aggregators) to produce a holistic view of the entire environment and generate enterprise-level reports. The Aggregator does not collect data itself; it just aggregates data from multiple sources. A single Aggregator can support up to ten Collectors. The Aggregator can be located anywhere, but requires network connectivity to the Collector units.

Figure 4. Scalable, distributed architecture

Figure 4 labels: Central Manager; Aggregators; Collectors. Policies, groups and users are pushed down from the Central Manager. Definitions are pushed up from Collectors and Aggregators to the Central Manager. Audit data is uploaded nightly from Collectors.
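The sizing rule stated above (a single Aggregator supports up to ten Collectors) lends itself to a quick planning calculation. The following is a minimal sketch, not an IBM sizing tool; the function name is hypothetical, and the number of Collectors you need still has to come from your own audit volume estimates.

```python
import math

# Limit taken from the text: one Aggregator supports up to ten Collectors.
MAX_COLLECTORS_PER_AGGREGATOR = 10


def aggregators_needed(collector_count: int) -> int:
    """Return the minimum number of Aggregators for the given Collector count."""
    if collector_count < 0:
        raise ValueError("collector_count must not be negative")
    # Round up so that a partially filled Aggregator is still counted.
    return math.ceil(collector_count / MAX_COLLECTORS_PER_AGGREGATOR)


if __name__ == "__main__":
    for collectors in (3, 10, 11, 25):
        print(collectors, "Collectors ->", aggregators_needed(collectors), "Aggregator(s)")
```

Whether a small deployment needs an Aggregator at all is a design decision this sketch does not make; it only applies the ten-to-one limit.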



The Central Manager is used to manage the entire InfoSphere Guardium deployment (all the collectors and aggregators) from a single console, including patch installation, software updates, and the management


InfoSphere Guardium S‑TAPs reside on Hadoop servers

Think of the S-TAP as the listener for data activity; one is installed on each Hadoop server that requires monitoring (see Figure 5). Each S-TAP must be configured with one or more inspection engines. This is how you tell InfoSphere Guardium S-TAP which ports to monitor. For example, if you have the HDFS NameNode and Hive master on the same machine, you would need one S-TAP configured with two inspection engines.

To configure the inspection engines, you will need to work with the network or Hadoop administrator to get a list of the ports, such as the JobTracker ports and HBase master.
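When collecting the port list, it can help to keep a simple inventory that shows how many inspection engines each S-TAP will need. The sketch below is a planning aid only and does not touch any Guardium configuration. The host names are invented, and the port numbers are commonly used Hadoop defaults that are assumed here; confirm the real values with your Hadoop administrator.

```python
from collections import defaultdict

# Hypothetical inventory: (host, monitored service, port).
# Host names are invented; ports are assumed defaults and may differ on your cluster.
SERVICES = [
    ("master01", "HDFS NameNode", 8020),
    ("master01", "Hive server", 10000),
    ("master02", "JobTracker", 8021),
    ("master02", "HBase master", 60000),
]


def plan_inspection_engines(services):
    """Group services by host: one S-TAP per host, one inspection engine per service."""
    plan = defaultdict(list)
    for host, service, port in services:
        plan[host].append({"service": service, "port": port})
    return dict(plan)


if __name__ == "__main__":
    for host, engines in plan_inspection_engines(SERVICES).items():
        print(f"{host}: 1 S-TAP, {len(engines)} inspection engine(s)")
        for engine in engines:
            print(f"  {engine['service']} on port {engine['port']}")
```

In this inventory, master01 hosts both the NameNode and the Hive server, so it matches the case described above: one S-TAP configured with two inspection engines.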

IBM InfoSphere Guardium provides a centralized solution for installing and updating multiple S-TAPs using the InfoSphere Guardium Installation Manager (GIM). GIM sits on a Central Manager and provides a user interface to make S-TAP management, including applying software maintenance, simpler and more automated. This requires installing an InfoSphere Guardium Installation Manager agent on each server, which you can do during any maintenance window; you can then use GIM to install the S-TAPs.



Figure 5 labels: SecondaryNN, HiveServer, JobTracker, NameNode and HBase Master, covering distributed data processing (Map/Reduce), distributed query processing and distributed data storage; HDFS clients; four worker nodes, each running a Data Node, a Task Tracker and an HBase Region. An optional S-TAP is required only for monitoring HBase Put commands.

Figure 5. Hadoop servers with Guardium S-TAPs


How should the deployment be rolled out?

As with any significant IT infrastructure enhancement, it’s a good idea to do a proof-of-concept in a sandbox environment. Not only will this help you validate the auditing solution, it will give you the opportunity to see for yourself how data activity is stored. It may also help you identify processes and procedures you need to put in place to make sure the production deployment will go smoothly, and to help support automation procedures. For example, it is possible to automatically update privileged users groups or sensitive data objects in the InfoSphere Guardium system on a scheduled basis.

For a production deployment, IBM services can help you create a project plan that will include education, planning, installation and configuration.

What are the business requirements for data monitoring?

Although InfoSphere Guardium provides a comprehensive data monitoring solution, in reality you don’t need to monitor everything. For example, Hadoop has a “chatty” protocol, so InfoSphere Guardium includes a built-in policy with rules to filter out some of the internal messages the system uses for health checks. Over time, you can add rules to ensure that you are capturing activity that is required for audit.

There are different levels of auditing to consider:

• Privileged user audit applies only to specific users or groups of users, and everything else is filtered out before even being sent to the InfoSphere Guardium appliance.

• Selective auditing means that only a subset of data activity is logged. However, in this case, everything is sent to the InfoSphere Guardium appliance, where it is determined whether the information is relevant and should be maintained.

• Comprehensive auditing means that everything is audited and logged.

If you are already using database activity monitoring for audit and compliance, someone with Hadoop expertise may be able to map between the requirements on databases and those for Hadoop. For example, permission exceptions in Hadoop are file system permission errors rather than database authorization errors.
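The three auditing levels listed above differ mainly in where filtering happens: before traffic leaves the Hadoop server, or on the appliance. The following sketch models that difference in plain Python. It is illustrative only and is not Guardium code; the group contents, the relevance test and the event fields are all hypothetical.

```python
from enum import Enum
from typing import Tuple


class AuditLevel(Enum):
    PRIVILEGED_USER = "privileged user audit"
    SELECTIVE = "selective auditing"
    COMPREHENSIVE = "comprehensive auditing"


# Hypothetical group contents, for illustration only.
PRIVILEGED_USERS = {"hdfs", "hbase"}
SENSITIVE_PATHS = ("/data/payroll",)


def is_relevant(event: dict) -> bool:
    """Hypothetical relevance test: the event touches a sensitive path."""
    return event["path"].startswith(SENSITIVE_PATHS)


def handle(event: dict, level: AuditLevel) -> Tuple[bool, bool]:
    """Return (sent_to_appliance, logged) for one event at the given level."""
    if level is AuditLevel.PRIVILEGED_USER:
        # Everything from other users is dropped before it is sent.
        sent = event["user"] in PRIVILEGED_USERS
        return sent, sent
    if level is AuditLevel.SELECTIVE:
        # Everything is sent; the appliance decides what to keep.
        return True, is_relevant(event)
    # Comprehensive: everything is sent and logged.
    return True, True


if __name__ == "__main__":
    event = {"user": "alice", "path": "/tmp/scratch"}
    for level in AuditLevel:
        print(level.value, handle(event, level))
```

The trade-off the sketch makes visible is network and appliance load: only the privileged user audit reduces what is sent to the appliance, while selective auditing reduces only what is stored.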


Implement monitoring

After you get the appliances and S-TAPs installed and connected on the network, all the planning work you did around business requirements will be beneficial when implementing monitoring.

You will start with the basic building blocks of creating groups and build upon that as follows:

1. Define and populate groups. This includes groups of users, sensitive data objects, applications, server IPs and client IPs.

2. Define a security policy.

3. Customize out-of-the-box auditing reports, or create your own.

Create and populate groups to simplify management and maintenance

Groups are central to simplifying management and control of the auditing environment. By classifying users, applications, servers, data objects and more into groups, you can fully take advantage of the flexibility and power of the InfoSphere Guardium system, while also keeping it manageable. Think about some of the following groups:

• Privileged users (administrators)
• Sensitive objects (files or HBase tables)
• Applications
• Server IPs (this will help with managing traffic coming from multiple IPs)
• Client IPs to help you manage and track back suspicious activity
• Commands (are there certain commands you want to capture and/or filter out?)


For example, Figure 6 shows how to create a group of authorized programs called MapReduce and sortlines. The new group is named “Hadoop Authorized Job List.” Use the Guardium Group Builder to populate the group with members.

Figure 6. Creating a group of authorized programs


Figure 7 shows a partial report from Cloudera (CDH4) that includes a query to show activity from any application that is NOT in the authorized job list group. The program PiEstimator has not been added to the authorized job list, and you can see its activity in this report.
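The logic behind such a report is a simple set-membership test: keep every observed job whose program name is not a member of the authorized group. The sketch below illustrates that test outside of Guardium. The record layout and the second sample record are hypothetical; the group members and the PiEstimator job come from the example above.

```python
# Members of the "Hadoop Authorized Job List" group from the example above.
AUTHORIZED_JOBS = {"MapReduce", "sortlines"}

# Observed activity. The first record mirrors the report extract;
# the second is a hypothetical authorized job added for contrast.
OBSERVED = [
    {"user": "svoruga", "job_name": "PiEstimator", "job_id": "job_201209042356_0007"},
    {"user": "svoruga", "job_name": "sortlines", "job_id": "job_201209042356_0008"},
]


def unauthorized_jobs(records, authorized):
    """Return the records whose job name is NOT in the authorized group."""
    return [record for record in records if record["job_name"] not in authorized]


if __name__ == "__main__":
    for record in unauthorized_jobs(OBSERVED, AUTHORIZED_JOBS):
        print(record["user"], record["job_name"], record["job_id"])
```

Because the check is driven by group membership, adding PiEstimator to the group would remove it from the report without any change to the query itself.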

There are several options for creating groups. You will probably use several approaches to create and automate the update of these groups, including:

• Manual entry by working with application owners to identify sensitive data objects for specific environments
• An API to script the creation of groups from your own input
• Populate from a query using observed traffic from InfoSphere Guardium
• LDAP/Active Directory integration to import users

The automation process can be scheduled to run on a periodic basis to pick up any new changes in the Hadoop system, such as new users.
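A scheduled update of this kind usually starts by comparing the current group membership with the source of truth and working out what has to be added or removed. The following is a minimal sketch of that comparison step only. How the members are read and how the changes are applied (through the API, a query or a directory import) is deliberately left out, and all names and sample users are hypothetical.

```python
def plan_group_update(current_members, source_members):
    """Compare a group's current members with the desired members.

    Returns the members to add and to remove, each sorted for stable output.
    """
    current = set(current_members)
    source = set(source_members)
    return {
        "add": sorted(source - current),
        "remove": sorted(current - source),
    }


if __name__ == "__main__":
    # Hypothetical data: members already in a privileged users group,
    # and the administrators currently defined on the Hadoop cluster.
    in_group = ["hdfs", "mapred", "olduser"]
    on_cluster = ["hdfs", "mapred", "hbase"]
    print(plan_group_update(in_group, on_cluster))
```

Computing the difference first, rather than rebuilding the group on every run, keeps each scheduled run small and makes the changes easy to review before they are applied.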

Figure 7. Extract of an unauthorized job activity report



DB User Name: SVORUGA
MapReduceUser: svoruga
MapReduceName: PiEstimator
MapReduce Job: job_201209042356_0007
SourceProgram: HADOOP PROTOBUF CLIENT PROGRAM


Define a security policy

Policies are sets of rules and actions that direct the operations and behavior of the InfoSphere Guardium system, including which traffic is ignored and which is logged, which activities require more granular logging, and when to trigger real-time alerts.

InfoSphere Guardium includes a Hadoop policy that you can customize, as shown in Figure 8. The purpose of the predefined policy rules is to filter out traffic that is not needed for auditing. The policies make use of predefined groups such as Hadoop-SkipObjects. You will likely create and modify such groups based on the observed traffic in your system.

You can then add rules, such as ignoring trusted sessions or logging the activities of privileged users in more detail. Again, this is where your predefined group of privileged users will help.
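To make the role of groups in a policy concrete, the sketch below models a policy as an ordered list of checks in which the first matching rule decides the action. This is a simplified illustration and not the Guardium policy engine. The first-match ordering is an assumption, and the group members, session identifiers and event fields are hypothetical stand-ins.

```python
# Hypothetical group contents, for illustration only.
SKIP_OBJECTS = {"getProtocolVersion", "sendHeartbeat"}  # stand-in for a skip-objects group
TRUSTED_SESSIONS = {"etl-nightly"}
PRIVILEGED_USERS = {"hdfs", "hbase"}


def evaluate(event: dict) -> str:
    """Return the action for an event; the first matching rule wins (assumed)."""
    if event["object"] in SKIP_OBJECTS:
        return "ignore"        # internal chatter not needed for audit
    if event["session"] in TRUSTED_SESSIONS:
        return "ignore"        # trusted session
    if event["user"] in PRIVILEGED_USERS:
        return "log_detailed"  # more granular logging for privileged users
    return "log"               # default: normal logging


if __name__ == "__main__":
    sample = {"user": "hdfs", "session": "s-42", "object": "/data/payroll"}
    print(evaluate(sample))
```

Because every rule refers to a group rather than to individual names, changing what is skipped or who counts as privileged only requires updating group membership.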

Figure 8. Hadoop policy built-in rules


In addition, you can use policies to define real-time alerts. For example, you can create a rule in which an alert is fired whenever a user from a particular group, such as a privileged user, attempts to access a sensitive data set that they are not authorized to access. This requires creating a group of privileged users and a group of sensitive data objects. Figure 9 is an example of how this alert will appear on the Guardium Incident Management tab. Alerts can also be sent to email addresses.
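The alert condition described above combines two group lookups with an authorization check. A minimal sketch of that condition follows. It is not Guardium rule syntax; the group members and the per-object authorization map are hypothetical, and how authorization is actually determined in your environment is an assumption here.

```python
# Hypothetical groups, for illustration only.
PRIVILEGED_USERS = {"hdfs", "hbase"}
SENSITIVE_OBJECTS = {"/data/payroll", "/data/customers"}

# Hypothetical map of which users may access which sensitive objects.
AUTHORIZED_ACCESS = {"/data/payroll": {"hdfs"}}


def should_alert(event: dict) -> bool:
    """True when a privileged user touches a sensitive object without authorization."""
    user, obj = event["user"], event["object"]
    return (
        user in PRIVILEGED_USERS
        and obj in SENSITIVE_OBJECTS
        and user not in AUTHORIZED_ACCESS.get(obj, set())
    )


if __name__ == "__main__":
    print(should_alert({"user": "hbase", "object": "/data/payroll"}))
    print(should_alert({"user": "hdfs", "object": "/data/payroll"}))
```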

Figure 9. Alert on access to sensitive files by a user who is not authorized


Customize reports and create compliance automation workflows

Because InfoSphere Guardium stores all information from all monitored sources into a common schema, many existing reports included with InfoSphere Guardium will show valid information for Hadoop, such as session information. InfoSphere Guardium also includes several reports that have already been tailored for Hadoop, including MapReduce activity, detecting unauthorized MapReduce jobs, Hue/Beeswax reports for Hive, HDFS activity, and full details reports. You can customize these reports, or build your own tailored to your own audit process requirements using the robust query building and report building capabilities in InfoSphere Guardium.

InfoSphere Guardium includes workflow capabilities to enable the distribution and signoff of audit reports. Results can be delivered to users, groups of users, or roles. (Using roles is recommended to enable more than one user to review and sign off. Roles also make it easier to manage employee absence and turnover.) Start by:

1. Identifying who should receive reports for what job function (for example, the information security manager).

2. Identifying groups of users with the same job function and grouping them into roles. You can use the predefined roles in InfoSphere Guardium, or create your own customized roles.

3. Creating users and assigning them to their appropriate roles.

4. Determining how often reports need to be generated.

5. Determining who receives the reports, whether review/signoff is required and whether the delivery should stop at any user or role until they complete the required action.
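The decisions in the steps above (who receives a report, how often, whether signoff is required, and whether delivery waits for a receiver) can be written down as a small data structure before you configure anything. The sketch below is one hypothetical way to capture them; the class names, role names and schedule value are invented and do not correspond to Guardium objects.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Receiver:
    role: str                  # deliver to a role rather than a single user
    signoff_required: bool
    blocks_next: bool = False  # delivery stops here until the action is completed


@dataclass
class AuditProcess:
    report: str
    schedule: str
    receivers: List[Receiver] = field(default_factory=list)


def next_receivers(process: AuditProcess, completed_roles: Set[str]) -> List[str]:
    """Return the roles that can receive the report now, in delivery order."""
    ready = []
    for receiver in process.receivers:
        if receiver.role in completed_roles:
            continue
        ready.append(receiver.role)
        if receiver.blocks_next:
            break  # later receivers wait for this role to finish
    return ready


if __name__ == "__main__":
    process = AuditProcess(
        report="Unauthorized MapReduce jobs",
        schedule="weekly",
        receivers=[
            Receiver("info security manager", signoff_required=True, blocks_next=True),
            Receiver("audit team", signoff_required=False),
        ],
    )
    print(next_receivers(process, set()))
    print(next_receivers(process, {"info security manager"}))
```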


Operationalize your processes

Operational procedures should be defined for each of the teams that are involved in administering the InfoSphere Guardium environment or in evaluating and acting on monitoring results. Process flows can be very useful in defining responsibilities and the sequence of steps needed to address a particular situation, such as when new users are authorized to the InfoSphere Guardium system, or when policy rules need to change.

Extra emphasis should be given to processes related to handling security breaches and forensic investigations. The support team needs to be made aware of the rules and trained on steps to be performed in case a security breach occurs.

Based on the business requirements, daily, weekly, monthly, quarterly and cyclical tasks should be defined and documented. Here is a simplified example of a plan:


The InfoSphere Guardium Administrator:

• Verifies archiving/aggregation and backup
• Follows up on self-monitoring alerts from the previous night

The Audit team:

• Performs review of the automated audit processes set up on the system
• Investigates any activity that is not business as usual
• Escalates data security breach attempts


The InfoSphere Guardium Administrator:

• Verifies space utilization on the appliance
• Verifies that data is being logged correctly
• Verifies that the InfoSphere Guardium appliance is purging and archiving correctly
• Verifies that all scheduled jobs are executed on time

The Audit team:

• Meets with the members of the Hadoop



The implementation of an InfoSphere Guardium data activity monitoring solution for Hadoop can help jump-start your organization’s use of Hadoop for enhanced business value. With the correct planning and understanding of your business requirements for monitoring and auditing, InfoSphere Guardium can help you address regulatory requirements and reduce your risk of data breaches from hackers or insiders.


For more information, please visit ibm.com/guardium.

InfoSphere Guardium Data Security v9

Deliver real-time activity monitoring and automated compliance reporting for Big Data security


Big data security and auditing with IBM InfoSphere Guardium

Monitor and audit access for IBM InfoSphere BigInsights and Cloudera Hadoop


Understanding holistic database security

8 steps to successfully securing enterprise data sources


For more information on managing database security in your organization, visit


© Copyright IBM Corporation 2012

IBM Corporation
Software Group
Route 100
Somers, NY 10589

Produced in the United States of America
December 2012

IBM, the IBM logo, ibm.com, InfoSphere, and Guardium are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.


