Five Keys to Big Data Audit and Protection WHITEPAPER


[1] http://www.gartner.com/newsroom/id/2848718

Introduction

Driven by the promise of uncovering valuable insights that enable better, fine-tuned decision making, many businesses are planning – if not already making – substantial investments in big data infrastructure and solutions. As part of these investments, organizations should pay close attention to the need not only to protect the sensitive data stored in resulting big data environments, but also to ensure these environments comply with applicable regulations for data security and privacy.

To help IT security and compliance teams select an ideal data-centric audit and protection (DCAP) solution for their big data environments, this paper identifies five key requirements for evaluating candidate offerings. It also explains how Imperva SecureSphere addresses each of these requirements to deliver a unified solution that enables enterprises to support not only a growing portfolio of big data assets, but their traditional database environments as well.

Big Data in the Enterprise

Organizations are steadily investing in big data technologies – a 2014 Gartner survey found that 73 percent had already invested in big data or planned to do so within 24 months [1] – because of the extensive and compelling opportunities they offer. A small sample of real-world use cases for big data solutions includes:

• Fusing demographic, buying habit, and real-time location data to deliver individually-tailored retail offers;

• Leveraging extensive real-time sensor data in conjunction with historical maintenance records and industry-wide failure rates to optimize maintenance, service, and lease plans for complex, high-value machinery;

• Combining countless variables – including highly accurate, real-time, micro-focused weather forecasts – to dynamically fine-tune staffing and supply chain strategies; and

• Bringing together diverse sets of otherwise disconnected data to uncover unforeseen relationships that have the potential to transform the business, for example, by progressively improving operational efficiency or identifying new, greenfield opportunities in the marketplace.

“Nearly 40,000 organizations running MongoDB, a NoSQL high performance, and cross-platform document-oriented database, are found to be unprotected and vulnerable to hackers.” – The Hacker News, February 11, 2015


[2] Best Practices for Securing Hadoop, Gartner, April 2014

Big data requires security and compliance too

Given the nature of the insights typically being sought, it is inevitable that organizations’ big data deployments will involve the creation, manipulation, and storage of sensitive information, such as corporate financials, private customer data, trade secrets, and other intellectual property. As such, they are subject to the same compliance mandates (e.g., HIPAA, PCI, and SOX) and require the same protection against breaches as is commonly put in place for traditional databases and their associated applications and infrastructure. In other words, providing coverage for big data is becoming an increasingly relevant and important consideration when it comes to an organization’s DCAP strategy.

Big Data or Big Target?

Establishing adequate audit and protection for big data is easier said than done. Besides being a high-value target – primarily due to the variety and volume of sensitive information involved – big data comes with its own set of challenges. The issue is not that big data security is fundamentally different from traditional data security. In fact, the same core capabilities and security mechanisms remain applicable, including access control, threat filtering, activity monitoring, and alerting. It is just that the environment to which these controls are being applied is different. Understanding the implications of these differences is essential to selecting, deploying, and implementing a truly effective solution.

“Through 2016, fewer than 30% of Hadoop deployments will be secured and governed in accordance with the enterprise’s information governance standards.” – Gartner [2]

Big Data Security Taking a Backseat Too

Big data environments may differ from traditional ones in many ways, but they also share one dubious similarity. As with other high-profile technology solutions in the past, it appears that many organizations are eschewing the best practice of designing security in from the outset, leaving it instead to be sorted out at a later date.

Significant differences between big data and traditional data environments center around the nature of the data itself, as well as the technology used to store, manage, and manipulate it.


[Figure 1: The Hadoop framework stack (image not reproduced). Components shown: HDFS (Hadoop distributed file system); Hadoop MapReduce (distributed processing framework); HBase (distributed table store); Hive (data warehouse); Pig (data flow); R (statistics); Sqoop (relational DB data collector); Flume (log data collector); Oozie (workflow); ZooKeeper (coordination); Chukwa (monitoring and log collection analysis); Ambari (provisioning, managing, and monitoring).]

First up are the three V’s:

• The volume of data drives the need for solution scalability that is at least an order of magnitude beyond that for traditional data environments.

• The velocity of data – or the rate at which new data is being accumulated – drives the need for greater speed of a solution. Data parsing and collection throughput, the degree of automation that is available, and the ability to deliver real-time visibility of policy violations and other events now require greater scrutiny.

• The variety of data – in particular, the ability to mix multiple sources and types of data with different access permissions – compounds classification and policy setting challenges, thereby elevating the need for robust audit capabilities.

Then there are the differences with associated infrastructure and technology. To begin with, a big data environment is not defined by a single component, like an RDBMS. Rather, it often entails a multi-layer architecture. The open source Hadoop framework is a good example, with different layers of the stack serving a variety of purposes, from distributed storage at the bottom to table and schema management, distributed programming, and querying/interface options at the middle tiers, and a wide range of management tools along the top/exterior (see Figure 1). The implication is that, with a big data environment, there is not just one logical point of entry or resource to guard, but many, each with an independent lifecycle.


Similarly, there is not just one underlying technology to account for, but many. Based on the purposes they are intended to serve, different big data environments will involve different technologies for data storage and retrieval. For example, it is not uncommon for an implementation to include relational stores and query tools to support analytical workloads, non-relational technologies – also known as NoSQL technologies – for real-time, interactive workloads, or both.

The differences are compounded, too, by multiple instances or versions of the same core building blocks being available from different vendors – such as different Hadoop distributions and different NoSQL offerings. The net result is a considerable amount of diversity and complexity that needs to be addressed, if not transparently by the associated security tools, then manually by an organization’s security staff.

Finally, it is necessary to keep in mind that big data environments, essentially by definition, entail distributed data storage and processing. Instead of a relatively simple, local cluster of data repositories, big data deployments typically have a multitude of geographically distributed data stores and, therefore, numerous physical nodes requiring protection. This situation inherently increases the potential for inconsistent security policies and practices, suggesting the need for solutions that feature strong, centralized administration capabilities.

Selecting the Right DCAP Solution for Big Data

Considering the unique challenges discussed above, the following sections describe five key requirements IT security and compliance teams can use to evaluate audit and protection solutions for their organization’s big data environment(s). Each section also includes details on how the Imperva SecureSphere solution for big data security and governance addresses the corresponding requirement.

5 Keys to big data audit and protection

#1 Superior scalability

#2 High performance

#3 A fully unified solution

#4 Big data centricity without big data complexity

#5 Enterprise-class feature set


Requirement #1: Superior scalability

With the number of data repositories at the average organization running into the hundreds or even thousands, evaluators need to pay particular attention to the scalability of any solutions they are considering. Poor scalability has the potential to negatively influence everything from time to value, performance, and adaptability of a solution to the cost of product licenses, hardware, daily operations, maintenance, and support.

SecureSphere features that enable superior scalability include:

A multi-tier architecture. SecureSphere matches the distributed, high node count nature of big data environments with a combination of per-node agents, high-capacity data collection and policy evaluation gateways, and unified management and reporting servers (see Figure 2).

The use of big data techniques for collecting and processing audit data. Compared to solutions that rely on relational database techniques for handling data, the ability of SecureSphere to use indexed flat-files and take advantage of other big data methods allows it to operate with anywhere from 2-5X fewer data collection gateways.

Native clustering for data collection gateways. With SecureSphere, organizations avoid having to purchase and deploy standalone load balancers to scale the collection and processing capacity for audit data. They also avoid having to live with solutions that employ undesirable practices to recover a measure of scalability – such as abbreviating monitoring records or otherwise reducing functionality, both of which could result in missed security or compliance events.

SecureSphere Operations Manager (SOM). A manager-of-managers, SOM helps overcome the “disconnected islands” problem that plagues some solutions. Maintaining consistent big data security policies and creating unified, enterprise-wide compliance reports remains straightforward even for very large implementations.
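To make the indexed flat-file idea mentioned above more concrete, the following is a minimal sketch of the general technique, not a description of how SecureSphere is implemented. It appends audit events as JSON lines to a flat file and keeps a byte-offset index per user so that lookups seek directly to the relevant records. The class name `IndexedAuditLog`, the event fields, and the resource names are all hypothetical.

```python
import io
import json
from collections import defaultdict


class IndexedAuditLog:
    """Append-only flat file of JSON lines plus an in-memory index by user.

    Hypothetical illustration of an indexed flat file; not a vendor API.
    """

    def __init__(self, stream=None):
        # Any seekable binary stream works; an on-disk file would be typical.
        self._stream = stream if stream is not None else io.BytesIO()
        self._offsets_by_user = defaultdict(list)

    def append(self, event):
        """Write one event at the end of the file and record its byte offset."""
        self._stream.seek(0, io.SEEK_END)
        offset = self._stream.tell()
        line = json.dumps(event, sort_keys=True).encode("utf-8") + b"\n"
        self._stream.write(line)
        self._offsets_by_user[event["user"]].append(offset)
        return offset

    def events_for(self, user):
        """Read back only one user's records by seeking to the stored offsets."""
        results = []
        for offset in self._offsets_by_user.get(user, []):
            self._stream.seek(offset)
            results.append(json.loads(self._stream.readline()))
        return results


log = IndexedAuditLog()
log.append({"user": "alice", "action": "read", "resource": "hdfs:/finance/q3.csv"})
log.append({"user": "bob", "action": "query", "resource": "hive:customers"})
log.append({"user": "alice", "action": "delete", "resource": "hbase:orders"})
alice_events = log.events_for("alice")
```

Writes are sequential appends and reads touch only the indexed records, which is the general reason this style of storage can ingest audit data with less overhead than inserting each event into a relational table.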

Figure 2: A Highly Scalable, Multi-Tier Architecture for Big Data Audit and Protection

[Figure 2 (image not reproduced) shows users; the SecureSphere Management Server (MX) for unified management and reporting; a physical or virtual SecureSphere Gateway; a MongoDB cluster with Imperva agents; an IBM InfoSphere BigInsights, Hortonworks, or Cloudera Hadoop cluster with Imperva agents; and relational databases with Imperva agents. Supported big data services: NoSQL, Hive, HBase, HDFS.]


Requirement #2: High performance

Not entirely independent from the need for high scalability is the need for high performance. With security especially, delays in obtaining information – for example, about policy violations, abuse of access rights, or suspicious activities – are never good, as they invariably increase exposure periods and, therefore, the risk to the organization.

Notable architecture features and capabilities that enable SecureSphere to keep pace with the velocity aspect of big data environments include:

• Using agents to monitor activities directly from within big data components (rather than agents that parse the protocol from outside the big data components and write to relational databases).

• Using agents that are lightweight and optimized for efficiency – for example, because they are designed not only to intelligently aggregate and selectively forward audit events, but also not to write to local disk under normal operating conditions.

• Providing real-time access to monitoring data and reports, rather than being constrained by an architecture that necessitates batching and periodic aggregation of audit data.

• Having numerous opportunities for automation, facilitated in part by a rich API that not only enables bulk configuration of SecureSphere components, but also supports rapid integration with the rest of an organization’s security, compliance, IT provisioning, identity management, and ticketing systems.
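The agent behavior described in the list above (aggregating routine audit events, forwarding selected ones immediately, and avoiding local disk writes) can be sketched as follows. This is an illustrative model only, under the assumption that "selective forwarding" means sensitive resources bypass aggregation. The class `AuditAggregator`, the prefix list, and the event fields are hypothetical and do not represent the SecureSphere agent or its API.

```python
from collections import Counter

# Hypothetical policy: accesses to these resources are forwarded immediately.
SENSITIVE_PREFIXES = ("hdfs:/finance/", "hive:customers")


class AuditAggregator:
    """Keeps routine events as in-memory counters; nothing is written to disk."""

    def __init__(self, sensitive_prefixes):
        self._sensitive = tuple(sensitive_prefixes)
        self._counts = Counter()

    def observe(self, user, action, resource):
        """Return an event to forward now, or None if it was only counted."""
        if resource.startswith(self._sensitive):
            return {"user": user, "action": action, "resource": resource}
        self._counts[(user, action, resource)] += 1
        return None

    def flush(self):
        """Emit one summary per distinct (user, action, resource), then reset."""
        summaries = [
            {"user": u, "action": a, "resource": r, "count": n}
            for (u, a, r), n in sorted(self._counts.items())
        ]
        self._counts.clear()
        return summaries


agg = AuditAggregator(SENSITIVE_PREFIXES)
routine = [agg.observe("etl", "read", "hdfs:/logs/app.log") for _ in range(3)]
urgent = agg.observe("bob", "read", "hdfs:/finance/q3.csv")
summary = agg.flush()
```

In this sketch three identical routine reads collapse into a single summary record, while the read of a sensitive file is returned at once for forwarding.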

Requirement #3: A fully unified solution

Enterprises can ill afford to purchase, deploy, operate, and maintain multiple solutions to audit and protect their data environments. Disadvantages of such an approach include not only higher TCO and lower operational efficiency, but also inconsistent policy enforcement and compliance. What enterprises need instead is a unified solution, like SecureSphere. SecureSphere provides coverage for an extensive set of both relational and big data platforms, including: Oracle (including ASO/SSL), Oracle Exadata, Microsoft SQL Server, IBM DB2 (on Linux, UNIX, Windows, z/OS and DB2/400), IBM IMS on z/OS, IBM Informix, IBM Netezza, SAP Sybase, Teradata, Oracle MySQL, PostgreSQL, Progress OpenEdge, Cloudera Enterprise (multiple components), Hortonworks (multiple components), and MongoDB. It is also the only enterprise-class data monitoring and protection solution with support for the AWS environment.

The net result is immediate, out-of-the-box applicability for the broadest range of enterprise use cases, scenarios, and data infrastructure, both now and as your organization’s needs change going forward.

In addition, SecureSphere eliminates the need to invest in separate products for security and compliance, providing everything enterprises need to meet both sets of objectives in a single, integrated solution.

Finally, robust centralized administration enabled by SecureSphere Management Server (MX) and SecureSphere Operations Manager ensures enterprise-wide policy, monitoring, and audit consistency across traditional and big data environments alike.


Requirement #4: Big data centricity, without big data complexity

An ideal audit and protection solution should deliver granular functionality across multiple big data platforms. However, configuring, running, and obtaining value from it should not depend on retaining a team of big data scientists or platform-specific experts.

SecureSphere helps enterprises overcome the diversity and complexity of big data technologies by building subject matter expertise directly into the solution. With SecureSphere, an integral abstraction layer provides administrators with a single, uniform approach for policy definition and management. Behind the scenes, it also translates this common language to transparently account for the different commands, protocols, mechanisms, and constructs used by different big data platforms and technologies. The result is a granular yet consistent set of big data audit and protection capabilities that is easy to use.
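As a rough illustration of what such an abstraction layer does, the sketch below defines one platform-neutral policy and translates its generic verbs into per-platform operation names. It is a simplified model, not SecureSphere's policy language. The `Policy` type, the `translate` function, and the contents of `OPERATION_MAP` are assumptions chosen for the example, and the operation names are deliberately abbreviated.

```python
from dataclasses import dataclass

# Illustrative (incomplete) mapping from generic verbs to the operation names
# each platform uses; a real product would cover far more operations.
OPERATION_MAP = {
    "hdfs": {"read": ["open"], "write": ["create", "append"], "delete": ["delete"]},
    "hbase": {"read": ["get", "scan"], "write": ["put"], "delete": ["delete"]},
    "mongodb": {"read": ["find"], "write": ["insert", "update"], "delete": ["delete"]},
}


@dataclass(frozen=True)
class Policy:
    """A platform-neutral audit policy (hypothetical structure)."""
    name: str
    verbs: tuple
    resource_pattern: str


def translate(policy, platform):
    """Expand a generic policy into the operations to watch on one platform."""
    operations = []
    for verb in policy.verbs:
        operations.extend(OPERATION_MAP[platform][verb])
    return {
        "policy": policy.name,
        "platform": platform,
        "operations": operations,
        "resource": policy.resource_pattern,
    }


policy = Policy(name="audit-customer-changes", verbs=("write", "delete"),
                resource_pattern="customers*")
rules = [translate(policy, platform) for platform in ("hdfs", "hbase", "mongodb")]
```

The administrator writes the policy once; the per-platform differences live in the mapping table rather than in each policy definition.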

Including pre-defined reports that directly support big data technologies/components is another way that SecureSphere delivers big data centricity (and value) without big data complexity.

Requirement #5: An enterprise-class feature set

Superior scalability, high performance, and unified support for both traditional and big data environments are essentially meaningless in the absence of a rich set of enterprise-class features and capabilities.

A common deficiency of native audit and protection tools is that they provide only limited functionality (see A Word About Native Tools). In contrast, Imperva delivers a tightly integrated solution that enables enterprise security, compliance, and big data administration teams to:

• Continuously monitor and audit all access to sensitive data.

• Uncover unauthorized access and fraudulent activity by maintaining baselines of normal usage patterns and transactions and then flagging any deviations that are observed.

• Alert and respond to attacks and unauthorized activities in real time.

• Stop targeted attacks and other advanced cyber threats through out-of-the-box integration with leading anti-malware solutions.

• Accelerate incident response and forensic investigations with advanced techniques for visualizing and analyzing detected events.


• Rapidly configure data protection policies and associated compliance reports by taking advantage of pre-defined rule sets and templates for a wide variety of applications, data repositories, and regulations.

• Automate reporting and compliance activities across both traditional and big data environments.

• Achieve non-stop, high availability coverage through a combination of native clustering and backward compatibility of all components across updates.

• Simplify solution maintenance and currency by taking advantage of automated update feeds for various types of DCAP content, including pre-defined policies, best-practice configuration guidelines, and report templates.

The result is a powerful solution that obviates the need to resort to a disparate collection of standalone tools and native utilities to address all of an organization’s big data audit and protection objectives.
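The baseline-and-deviation capability listed above can be illustrated with a deliberately simple statistical sketch. It assumes a baseline built from each user's historical daily access counts and flags a day that exceeds the mean by more than a chosen number of standard deviations. The function names, the threshold of 3.0, and the sample data are hypothetical; commercial products typically profile many more dimensions than a single count.

```python
from statistics import mean, pstdev


def build_baseline(history):
    """history maps each user to a list of past daily access counts."""
    return {user: (mean(counts), pstdev(counts)) for user, counts in history.items()}


def is_deviation(baseline, user, todays_count, threshold=3.0):
    """Flag counts well above the user's norm; unknown users are always flagged."""
    if user not in baseline:
        return True
    average, spread = baseline[user]
    # Floor the spread at 1.0 so a perfectly flat history does not flag tiny changes.
    return todays_count > average + threshold * max(spread, 1.0)


history = {
    "alice": [10, 12, 11, 9, 13],
    "bob": [100, 95, 105, 98, 102],
}
baseline = build_baseline(history)
flags = {
    "alice_normal": is_deviation(baseline, "alice", 14),
    "alice_spike": is_deviation(baseline, "alice", 40),
    "bob_normal": is_deviation(baseline, "bob", 104),
    "unknown_user": is_deviation(baseline, "mallory", 1),
}
```

Because the baseline is per user, 40 accesses is flagged for one user while more than 100 is routine for another.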

A Word About Native Tools

Similar to the situation with traditional databases, native DCAP tools for big data platforms often have a number of significant disadvantages. Common deficiencies include:

• Poor support for separation of duties, as administrators also retain access to and control over the associated audit and protection capabilities.

• Narrow focus, as coverage is only provided for a single technology/platform per tool.

• Less efficient agents/designs that have a greater performance impact on monitored nodes.

• Reduced feature sets and/or functionality that is less mature.

It is also important to keep in mind the motivations of the parties providing these tools. Are they really focused on delivering a best-of-breed data-centric audit and protection solution? Or is the goal simply to deliver a solution that is good enough to help them sell more big data infrastructure?

Conclusion

Big data promises to enable better, fine-tuned business decisions that ultimately allow companies to save or make more money. Preserving this promise depends, however, on recognizing and doing something about the fact that big data environments require the same protection and are subject to the same compliance mandates as traditional database infrastructure. A data-centric audit and protection solution that fully addresses the key requirements discussed herein – such as Imperva SecureSphere – not only overcomes the unique scalability, speed, diversity, and complexity challenges associated with big data technologies, but also delivers the unified coverage needed to simultaneously secure both traditional and big data environments.

To learn more about SecureSphere and other Imperva solutions for protecting your organization’s data, applications, and reputation, please visit imperva.com.