Securing Hadoop Data
Big Data Everywhere - Atlanta
© 2015 Voltage Security, Inc.
A History of Excellence
• Company: Founded in 2002 out of Stanford University; based in Cupertino, California
• Mission: To protect the world’s most sensitive data
• By: Providing encryption and tokenization solutions that protect the data wherever it is used or stored
• Market Leadership: over 1,100 customers including
– Certified Technology Partner with MapR and other leading distributions
– PCI solutions and data protection used by 7 of the top 9 U.S. payment processors and 7 of the top 10 U.S. banks
– Leading Retailers, Airlines, Healthcare Networks, Government Agencies
– Contributes technology to top-tier standards organizations: NIST, ANSI, IEEE, IETF, ISO
Attack Trends vs. Protection Strategy Effectiveness
Graph source: Verizon Data Breach Report 2014
Data-centric security protects data across its lifecycle against a broad range of threats. Data-at-rest solutions protect only against physical threats.
Why is Securing Hadoop Difficult?
• Multiple sources of data from multiple enterprise systems, and real-time feeds with varying (or unknown) protection requirements
• Rapid innovation in a well-funded open-source developer community
• Multiple types of data combined together in the Hadoop “data lake”
Why is Securing Hadoop Difficult?
• Reduced control if Hadoop clusters are deployed in a cloud environment
• Automatic replication of data across multiple nodes once entered into the HDFS data store
• Access by many different users with varying analytic needs
Existing Ways to Secure Hadoop
• Existing IT security:
– Network firewalls
– Logging and monitoring
– Configuration management
• Enterprise-scale security for Apache Hadoop
– Apache Knox: Perimeter security
– Kerberos: Strong authentication
– Apache Argus: Monitoring and management
Need to augment these with “data-centric” protection of data in use, in motion and at rest
Introducing “Data-Centric” Security
[Figure: security gaps in the data ecosystem]
Data ecosystem: Storage, File Systems, Databases, Middleware, Data & Applications
Traditional IT infrastructure security: Disk encryption; Database encryption; SSL/TLS/Firewalls; Authentication Management
Threats to data: Malware, Insiders; SQL Injection, Malware; Traffic Interceptors; Credential Compromise
Data security coverage: Security Gaps
Introducing “Data-Centric” Security
[Figure: the same diagram, with “Voltage Data-centric Security: End-to-end Data Protection” added]
Ideal Solution – Secure data yet Useful data
• Data secured during capture
• Data secure in storage
• Data stays secure in low-trust processes – analytics, test, development, etc.
• Selective live data elements available to trusted users or BI tools under policy control
• Data stays secure in
Format-Preserving Encryption
[Figure: sample data encrypted with FPE and with standard AES]
Tax ID: 345-753-5772
AES ciphertext of a Tax ID: 8juYE%Uks&dDFa2345^WFLERG
Original record:
First Name: Gunther
Last Name: Robertson
SSN: 934-72-2356
DOB: 20-07-1966
After FPE:
First Name: Uywjlqo
Last Name: Muwruwwbp
SSN: 253-67-2356
DOB: 18-06-1972
After AES:
Ija&3k24kQotugDF2390^32 0OWioNu2(*872weW Oiuqwriuweuwr%oIUOw1@
• Supports data of any format: name, address, dates, numbers, etc.
• Preserves referential integrity and data meaning, e.g. date ranges
• Used for production data, analytic data or test data cases
• Recognized by NIST (National Institute of Standards and Technology): NIST SP 800-38G
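To make the format-preserving idea concrete, here is a minimal sketch of a toy Feistel cipher over digit strings. It is not the FFX/FF1 algorithm from NIST SP 800-38G and not Voltage's implementation; the function names, the HMAC-SHA256 round function, the round count, and the demo key are all assumptions chosen for illustration. It shows only the property the slide describes: the output keeps the length, digit positions, and separators of the input, and the transformation can be reversed with the key.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # arbitrary choice for this toy example


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random number derived from one half of the digits."""
    msg = f"{round_no}:{len(half)}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _feistel(key: bytes, digits: str, decrypt: bool) -> str:
    """Alternating Feistel network using addition modulo a power of ten."""
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    sign = -1 if decrypt else 1
    for r in rounds:
        if r % 2 == 0:
            # Even rounds update the left half from the right half.
            mod = 10 ** len(left)
            val = (int(left) + sign * _round_value(key, r, right, mod)) % mod
            left = str(val).zfill(len(left))
        else:
            # Odd rounds update the right half from the left half.
            mod = 10 ** len(right)
            val = (int(right) + sign * _round_value(key, r, left, mod)) % mod
            right = str(val).zfill(len(right))
    return left + right


def fpe_transform(key: bytes, text: str, decrypt: bool = False) -> str:
    """Encrypt or decrypt only the digits; separators stay where they are."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    out = iter(_feistel(key, digits, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in text)


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"  # hypothetical key
    protected = fpe_transform(demo_key, "934-72-2356")
    print(protected)                                        # same NNN-NN-NNNN shape
    print(fpe_transform(demo_key, protected, decrypt=True))  # original value
```

A production system would use a vetted implementation of a standardized mode rather than a hand-rolled construction like this one.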
Data Protection with FPE (AES FFX) and SST
• Enables large amounts of sensitive data to be “de-identified” in Hadoop
• Majority of analysis, MapReduce jobs, etc. can occur on de-identified data
• Reduces insider threats and improves compliance
• Enables developers to test without exposure
• Enables Hadoop and cloud adoption
Protection method per column: Name (FPE), SS# (FPE), Credit Card # (SST*), Street Address (FPE), Customer ID (FPE)

Original data:
Name           SS#          Credit Card #        Street Address         Customer ID
James Potter   385-12-1199  37123 456789 01001   1279 Farland Avenue    G8199143
Ryan Johnson   857-64-4190  5587 0806 2212 0139  111 Grant Street       S3626248
Carrie Young   761-58-6733  5348 9261 0695 2829  4513 Cambridge Court   B0191348
Brent Warner   604-41-6687  4929 4358 7398 4379  1984 Middleville Road  G8888767
Anna Berman    416-03-4226  4556 2525 1285 1830  2893 Hamilton Drive    S9298273

De-identified data:
Name           SS#          Credit Card #        Street Address         Customer ID
Kwfdv Cqvzgk   161-82-1292  37123 48BTIR 51001   2890 Ykzbpoi Clpppn    S7202483
Veks Iounrfo   200-79-7127  5587 08MG KYUP 0139  406 Cmxto Osfalu       B0928254
Pdnme Wntob    095-52-8683  5348 92VK DEPD 2829  1498 Zejojtbbx Pqkag   G7265029
Eskfw Gzhqlv   178-17-8353  4929 43KF PPED 4379  8261 Saicbmeayqw Yotv  G3951257
Jsfk Tbluhm    525-25-2125  4556 25ZX LKRT 1830  8412 Wbbhalhs Ueyzg    B6625294
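The claim that most analysis can run on de-identified data rests on the protection being deterministic: the same input always maps to the same protected value, so joins and group-bys still line up. The sketch below illustrates that with a keyed, one-way pseudonymization used as a simple stand-in; it is not FPE or SST, it is not reversible, and collisions are possible in principle. The function name, key, field labels, and purchase amounts are hypothetical; only the two names and SS# values come from the table above.

```python
import hashlib
import hmac
import string

KEY = b"demo-key-not-for-production"  # hypothetical key


def pseudonymize(value: str, field: str, key: bytes = KEY) -> str:
    """Map digits to digits and letters to letters of the same case.

    Deterministic for a given key and field, so equal inputs stay equal.
    One-way stand-in for illustration only (not FPE, not SST).
    """
    stream = hmac.new(key, f"{field}|{value}".encode(), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = stream[i % len(stream)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        else:
            out.append(ch)  # keep spaces, hyphens, and other separators
    return "".join(out)


def totals(purchase_rows):
    """Sum purchase amounts per key (a stand-in for a group-by job)."""
    result = {}
    for row_key, amount in purchase_rows:
        result[row_key] = result.get(row_key, 0) + amount
    return result


# Two data sets that share SS# as a join key; amounts are made-up sample data.
customers = [("James Potter", "385-12-1199"), ("Ryan Johnson", "857-64-4190")]
purchases = [("385-12-1199", 40), ("857-64-4190", 15), ("385-12-1199", 60)]

# De-identify each data set independently, as it would be at ingestion.
protected_customers = [
    (pseudonymize(name, "name"), pseudonymize(ssn, "ssn")) for name, ssn in customers
]
protected_purchases = [(pseudonymize(ssn, "ssn"), amt) for ssn, amt in purchases]

clear_totals = totals(purchases)
protected_totals = totals(protected_purchases)

if __name__ == "__main__":
    # The per-customer totals match, although no real SS# was ever used.
    for name, ssn in protected_customers:
        print(name, ssn, protected_totals[ssn])
```

Because both data sets were protected with the same key and field label, the aggregate computed on protected values matches the one computed on clear values, without the job ever seeing a real SS#.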
Options for Securing Data in Hadoop
[Figure: options for protecting data as it moves into, through, and out of Hadoop]
Stages shown: Source Data & Applications; ETL & Batch; Landing Zone; Hadoop Cluster (Hadoop Jobs & Analytics); Egress Zone; BI Tools & Downstream Applications
Hadoop cluster (distribution including Apache Hadoop): enrichment, processing, and governance
MapR Data Platform: Batch (MapReduce, Spark, Hive, Pig…); Streaming (Spark Streaming, Storm); Interactive (Impala, Drill,…); MapR-DB; MapR-FS
Also shown: Storage Encryption
Legend: Application with Voltage Interface Point; Standard Application; Unprotected Data; De-Identified Data
Use Case 1: Global Telecommunication Co.
Need
Solution
Use Case 2: Health Care Insurance Company
Need
Solution
Use Case 3: Global Financial Services Company
Need
Solution
Considerations
• Multi-platform enterprises adopting a data lake architecture need a cross-platform solution for protecting sensitive data
• The open source community has invested in building enterprise-grade security for Apache Hadoop, with core capabilities for perimeter security, authentication, authorization and auditing
• Add data-centric security to protect data at rest, in use and in motion, and to maintain the value of the data for analytics
• Together these enable comprehensive security, compliance, and rapid, successful Hadoop adoption!
Questions?
http://www.voltage.com/hadoop/