Auditing Big Data for Privacy, Security and Compliance

51 

Loading....

Loading....

Loading....

Loading....

Loading....

Full text

(1)

CRISC

CGEIT

CISM

CISA

2013 Fall Conference – “Sail to Success”

Auditing Big Data for

Privacy, Security and Compliance

Davi Ottenheimer @daviottenheimer

Senior Director of Trust, EMC

(2)

Introduction

Davi Ottenheimer (@daviottenheimer)

19

th

Year InfoSec

CISM, Platinum ISACA (1997)

Ex-Big 5 Auditor, Ex-PCI QSA/PA-QSA

Co-Author “Securing the Virtual Environment”

@EMCTrustedIT

Pivotal

Pivotal

VMware

VMware

EMC

EMC

R

S

A

R

S

A

Big

Data

Cloud

Trust

(3)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Agenda

Big Data

Operations

Auditing

3

(4)

CRISC

CGEIT

CISM

(5)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

The Big Data Dilemma

“We have massive amounts of data. We know

who you are. We know what your history has

been on the airline. We can customize our

offerings.”

- Delta CEO

5 http://m.apnews.com/ap/db_289563/contentdetail.htm?contentguid=InpMBxrL

“Airlines have yet to find the right balance

between being helpful and being creepy.”

(6)

Big Data Definitions

Many Long-Standing Examples

Astronomy (“Billions and billions”)

Meteorology (Storms)

Geology (Quakes)

Anatomy (Disease)

Economics (Fraud)

Espionage (Echelon SIGINT, PRISM…)

Three V’s (Volume, Variety, Velocity)

(7)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Structured + Unstructured Data = Big

7

Internet of Things

Telemetry, Location-Based, etc.

Non-Enterprise

Structured in Relational Databases

Managed, Unmanaged & Unstructured

(8)

Global Flight Analysis

http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/

http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business

60,000 Total Routes

1 Tb/day Data Each Gas Turbine Engine

(9)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

What is So

Pinterest

-ing?

Lists

Users you follow

Boards (and related users) you follow

Your followers

People who follow your boards

Boards you follow

Boards you unfollowed after following a user

Followers and unfollowers of each board

9

http://blog.gopivotal.com/case-studies-2/using-redis-at-pinterest-for-billions-of-relationships, http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch

(10)

What Makes Data “Ready”?

(11)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Should Users Trust?

11 Credit Score Contacts Behavior Preferences Credit Score Contacts Behavior Preferences Location / Movement Location / Movement Identifier Identifier Connection Relationship Affiliation Connection Relationship Affiliation Picture / Video Picture / Video Browsing History Browsing History Finances Finances https://www.unboundid.com/blog/2013/09/05/the-value-of-identity-data-and-identity-etiquette/

C

u

st

o

m

e

r

C

o

n

ce

rn

(2) What will you do with it?

(1

)

Is

i

t

sa

fe

?

(12)

Should Users Trust?

http://money.cnn.com/2013/09/05/pf/acxiom-consumer-data/index.html

(1

)

Is

i

t

sa

fe

?

“…a 26-year-old mother of two teenagers…is just

about biologically impossible”

“…a 26-year-old mother of two teenagers…is just

about biologically impossible”

(13)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Finding Value in Data

13

“…we know the estimated numbers of people being

served by each waste water treatment plant, we can

back-calculate daily [drug] loads…”

- Dr Kasprzyk-Hordern

Chicago and suburbs = 1.5b

gal/day wastewater

Disease

Drugs

Environmental Risk

http://gizmodo.com/5844925/chicagos-stickney-wastewater-treatment-plant-is-the-crappiest-place-on-earth, http://planetearth.nerc.ac.uk/news/story.aspx?id=1185&cookieConsent=A http://www.treehugger.com/natural-sciences/fish-near-water-treatment-plants-are-harmed-by-human-drugs.html

(14)

Finding Errors in Data

http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/

of Internet users have

taken steps online to

remove or mask their

digital footprints

(15)

10/3/2013 15

CRISC

CGEIT

CISM

CISA

2013 Fall Conference – “Sail to Success”

OPERATIONS

(16)

Transformation of IT

Mobile

Virtual

Cloud

Big

Data

(17)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

New Wave of Big Data Technology

17

Business

Objectives

Insights

Analytics

SciPy Mahout MATLAB Revolution R SPSS AMPL SAS Machine Learning Behavior Analysis Sentiment Analysis Predictive Models Network Analysis Visualization Simulation

Data

Hadoop Hadoop Vertica MapReduce Esper kdb Pivotal Netezza Teradata ECL ETL Hive

(18)

(Over?) Emphasis on Performance

Nodes Distributed

Data Shared

Access Controls Open

Networks Open

Clients Unauthenticated

(19)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Hadoop Machine Roles

19

1. Clients

2. Masters

3. Slaves

1. Storage Management (Name Node -> HDFS)

2. Compute Management (Job Tracker -> MapReduce)

1. Storage (Data Node)

2. Compute (Task Tracker)

1. Load Data to Cluster

2. Submit MapReduce Jobs

3. View Results

(20)

Hadoop Machine Roles

Clients

Job Tracker

Name Node

Secondary

Name Node

Data Node &

Task Tracker

Data Node &

Task Tracker

Data Node &

Task Tracker

Data Node &

Task Tracker

Data Node &

Task Tracker

Data Node &

Task Tracker

Slaves

Masters

HDFS

MapReduce

(21)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Hadoop Cluster

21

switch

switch

switch

switch

name node

job tracker

2ndry name node

client

Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker

Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker

Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker

Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker

switch

Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker Data Node & Task Tracker

(22)

Typical Cluster Environment

Petabytes

10,000s Slots Per Cluster

Shared or Dedicated

Production

and

Non-Production

(23)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Google MapReduce: 2004

(24)

MapReduce Today

Output Files

Task/Data

Reduce

HDFS

Block

Task/Data

Reduce

HDFS

Block

Task/Data

Task/Data

Task/Data

Map

Map

Map

HDFS

Block

HDFS

Block

HDFS

Block

Output Files

Splits

Splits

Splits

Splits

Splits

Splits

Splits

Splits

Splits

Task

Job Tracker

JSON

RPC Read

Data

NameNode

(25)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

MapReduce Today

25

Node0

Node1

Local File

System

HDFS

Name and

Data Node

output

tmp

input

MapReduce

Job and

Task Tracker

reduce

map

dir owner perm

name hdfs 700

data hdfs 700

dir owner perm

HDFS hdfs 775

MAPRED mapred 775

(26)

Job Perimeter

Production

Ad-Hoc

User Accounts

Ticket at Login

Tickets Expire

Batch Accounts

Tickets from Keytab

(27)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Domain Perimeter

27

Name Node

Data Node &

Task Tracker

Data Node &

Task Tracker

AD

AD

Hadoop Authentication

IT Auth

LDAP

SSH

hdfs/nn@company

hdfs/nn@hadoop

(28)

Domain Perimeters

Job Tracker

Name Node

Data Node &

Task Tracker

Data Node &

Task Tracker

AD

AD

Bastion

Hadoop Authentication

IT Auth

(29)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Authentication Perimeters?

29

NN

WebHDFS

Client

SNN

Service Users

Users

HDFS Client

MR Client

JT

DN / TT

DN / TT

DN / TT

HTTPFS

Map

Task

Reduce

Task

usernames: hdfs, httpfs, mapred

end user

(30)

Authentication Perimeters?

Services

Clients

service user

end user

Clients

Hadoop

Zookeeper

Hue

Oozie

Flume

Impala

Hbase

Hive

Metastore

Hbase

Oozie

www

Flume

Impala

Zookeeper

WebHdfs

Pig

Crunch

Cascading

Sqoop

Hve

Map

Red

RPC

HTTP

HTTP

HTTP

Avro

RPC

Thrift

RPC

RPC

Thrift

RPC

(31)

2013 Fall Conference – “Sail to Success”

September 30 – October 2, 2013 31

(32)

CRISC

CGEIT

CISM

(33)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Big Data Security Today

33

Authentication

Authorization

Encryption

Policy: “Cluster accessible only to trusted personnel”

Caveats

All or Nothing: Nodes without security cannot communicate with secure nodes

(34)

Authentication

End User to Service

Service to Service

Service to Service, for a User

Job Task to Service, for a User

“Big Data Security Configuration is a PITA. Do only

what you really need.”

(35)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Authorization

Data

HDFS, Hbase, Hive Metastore, Zookeeper

Jobs

Hadoop, MR, Pig, Oozie, Hue…

Queries

Impala, Drill

35 https://hadoop.apache.org/docs/stable/Secure_Impersonation.html

(36)

Encryption

Stored Data (Emerging)

Data Nodes Only

Keys Stored Separate from Data Node Storage

Keys for “Secure Process” Not Users

Transmitted Data

RPC Supports Only SASL

(37)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Typical “Security” Setup

Access Controls

1. Authentication (Kerberos)

2. Role Based Authorization (ACLs)

3. Identity Management (AD or LDAP)

(38)

Strong Authentication?

user

process

hdfs

namenode, datanode, secondary namenode

mapred

jobtracker, tasktracker, child tasks

group

users

hadoop

hdfs, mapred

$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

(39)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Trying to Get to Compliance

1. Node Access (Authentication)

2. Node API Authentication

3. Store Data (Disk Encryption)

4. Transmit Data (Net Encryption)

5. RBAC (Job ACLs)

6. Logs

E

a

se

MitM?

root?

NIC control?

sudo -u hdfs hadoop fs -rmr /

39

(40)

Threat Model Thoughts

Confidentiality

Regulated Data

Intellectual Property

Integrity

Analysis

Output

Availability

Data Load

Jobs, Analysis

(41)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Future

41

Business

Objectives

Insights

Analytics

Data

Confidentiality

Integrity

Availability

(42)

Regulated Data Example

Confidentiality

Stored

Transmitted

Integrity

Backup

Restore

Availability

Cost Calculations

Error

Load Time

Drop Time

Re-Load Time

(43)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Regulated Data Example: Sqoop2

43

Bulk Data Transfer Tool – Import/Export

https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

“Sqoop2 Does Not

Support Security”

(44)

Future Directions

Building Controls for Risks

Data

Acquire

Store

Query

MapReduce

Pig (simple query language)

Hive (SQL queries)

Cascading (workflow)

Mahout (machine learning)

Hama (scientific compute)

Drill (ad-hoc query)

Hbase (column-orient DB)

Zookeeper (coordination)

(45)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Acquire

Data Validation

Aggregation

DOB

Age Group (18-24, 25-36, etc.)

Salt (Noise)

Imitation (Synthetic)

Replacement (Swapping)

Wiping (Suppression)

Data

Acquire

45

(46)

Store

Encryption

Unique Users (RBAC)

API Authentication

“Hardened Clusters”

Logs

Forensics

(47)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Query

Encryption

Unique Users (RBAC)

Load Restrictions

Add, Remove, Change Jobs

Job Alerts

Scheduling

Secure Code Practices / Review

Data

Query

(48)

CSA “Securing Big Data”

SQL Security – Old

NoSQL Security – New

Config Management

Cell-Level Access Labels

Multi-Factor Authentication

Keberos-Based Authentication

Data Classification

ACLs

Data Encryption

Consolidated Audit/Report

Database Firewall

(49)

2013 Fall Conference – “Sail to Success” September 30 – October 2, 2013

Example: Accumulo

NoSQL (HBase) perf…with security

Timeline

2006 – Google BigTable

2008 – NSA

2011 – accumulo.apache.org

Cell-Level Access – Distributed Key/Value Store

// specify which visibilities we are allowed to see

Authorizations auths =

new

Authorizations("

public

");

Scanner scan =

conn.createScanner("

table

", auths);

scan.setRange(

new

Range("

alexander

","

snowden

"));

scan.fetchFamily("

attributes

");

for

(Entry<Key,Value> entry : scan) {

String row = entry.getKey().getRow(); Value

value = entry.getValue();

}

(50)

Conclusions

Big Data

Performance Pressure

Customer Concern / Backlash

Operations Risks

Perimeter Holes

Business Logic Error

Weak Policies

Auditing

Trusted Personnel

(51)

CRISC

CGEIT

CISM

CISA

2013 Fall Conference – “Sail to Success”

Auditing Big Data for

Privacy, Security and Compliance

Davi Ottenheimer @daviottenheimer

Senior Director of Trust, EMC

In-Depth Seminars – D21

Figure

Updating...

References

Updating...

Related subjects :