Privacy through Accountability: A Computer Science Perspective

Academic year: 2021

(1)

Privacy through Accountability:

A Computer Science Perspective

Anupam Datta

Associate Professor, Computer Science, ECE, CyLab

Carnegie Mellon University

(2)
(3)

Research Challenge

Ensure organizations respect privacy expectations in the collection, use, and disclosure of personal information

(4)

Web Privacy

Example privacy policies:

• Do not use detailed location (full IP address) for advertising

(5)

Healthcare Privacy

[Figure: flows of patient information among Patient, Physician, Nurse, Hospital, Drug Company, and Auditor]

Example privacy policies:

• Use patient health info only for treatment, payment

(6)

A Research Area

• Formalize Privacy Policies
  - Precise semantics of privacy concepts (restrictions on personal information flow)
• Enforce Privacy Policies
  - Audit and Accountability
    - Detect violations
    - Blame-assignment
    - Adaptive audit resource allocation

Related ideas: Barth et al Oakland 2006; May et al CSFW 2006; Weitzner et al CACM 2008, Lampson 2004

(7)

Today: Focus on Detection

• Healthcare Privacy
  - Play in two acts
• Web Privacy

(8)

A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participating in a violent crime that the covered entity believes may have caused serious physical harm to the victim

Example from HIPAA Privacy Rule

• Concepts in privacy policies
  - Actions: send(p1, p2, m)
  - Roles: inrole(p2, law-enforcement)
  - Data attributes: attr_in(prescription, phi)
  - Temporal constraints: in-the-past(state(q, m))
  - Purposes: purp_in(u, id-criminal)
  - Beliefs: believes-crime-caused-serious-harm(p, q, m)

Black-and-white concepts

(9)

Detecting Privacy Violations

[Figure: audit pipeline. A privacy policy is encoded as a computer-readable privacy policy; an audit over the organizational audit log detects policy violations.]

• Complete formalization of HIPAA Privacy Rule, GLBA
• Automated audit for black-and-white policy concepts
• Oracles to audit for grey policy concepts

[Image caption: The Oracle, The Matrix character. Species: Computer Program. Title: A program designed to investigate the human psyche.]

(10)

Policy Auditing over Incomplete Logs

With D. Garg (CMU → MPI-SWS) and L. Jia (CMU)

2011 ACM Conference on Computer and Communications Security

(11)

Key Challenge for Auditing

Audit Logs are Incomplete

• Future: logs store only past and current events
  - Example: Timely data breach notification refers to a future event
• Subjective: no “grey” information
  - Example: Logs may not record evidence for purposes and beliefs
• Spatial: remote logs may be inaccessible
  - Example: Logs distributed across different departments of a hospital

(12)

Abstract Model of Incomplete Logs

Model all incomplete logs uniformly as 3-valued structures

Define semantics (meanings of formulas) over 3-valued structures
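
A minimal sketch of what evaluation over a 3-valued structure can look like. It assumes strong Kleene connectives and a log represented as a dictionary from ground atoms to booleans, where any atom the log does not mention is unknown. The slides do not spell out these details, so the names `TV`, `tv_and`, `tv_or`, `tv_not`, and `lookup` are illustrative only.

```python
from enum import Enum


class TV(Enum):
    """Truth values of a 3-valued structure."""
    TRUE = "T"
    FALSE = "F"
    UNKNOWN = "U"


def tv_not(a):
    # Negation swaps TRUE and FALSE and leaves UNKNOWN alone.
    if a is TV.UNKNOWN:
        return TV.UNKNOWN
    return TV.FALSE if a is TV.TRUE else TV.TRUE


def tv_and(a, b):
    # A single FALSE decides a conjunction even if the other side is UNKNOWN.
    if a is TV.FALSE or b is TV.FALSE:
        return TV.FALSE
    if a is TV.TRUE and b is TV.TRUE:
        return TV.TRUE
    return TV.UNKNOWN


def tv_or(a, b):
    # Disjunction defined through De Morgan's law.
    return tv_not(tv_and(tv_not(a), tv_not(b)))


def lookup(log, atom):
    """Assumption: the log maps some ground atoms to True/False;
    atoms it does not mention are UNKNOWN (the log is incomplete)."""
    if atom not in log:
        return TV.UNKNOWN
    return TV.TRUE if log[atom] else TV.FALSE


# Hypothetical log: it records a send event but says nothing about purposes.
log = {("send", "UPMC", "police", "M2"): True}
sent = lookup(log, ("send", "UPMC", "police", "M2"))
purpose = lookup(log, ("purp_in", "id-bank-robber", "id-criminal"))
```

Here `tv_and(sent, purpose)` stays `TV.UNKNOWN`: the recorded event is true, but the "grey" purpose fact is missing, so the audit cannot yet decide the conjunction.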

(13)

reduce: The Iterative Algorithm

reduce(L, φ) = φ'

[Figure: reduce is applied iteratively to logs and policy, taking φ0 to φ1 and then to φ2]
(14)

Syntax of Policy Logic

• First-order logic with restricted quantification over infinite domains (challenge for reduce)

• Can express timed temporal properties, “grey” predicates

(15)

Example from HIPAA Privacy Rule

∀p1, p2, m, u, q, t. (send(p1, p2, m) ∧ inrole(p2, law-enforcement) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))

⊃ (purp_in(u, id-criminal))

∧ ∃m’. (state(q, m’) ∧ is-admission-of-crime(m’) ∧ believes-crime-caused-serious-harm(p1, q, m’))

A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participating in a violent crime that the covered entity believes may have caused serious physical harm to the victim

(16)

reduce: Formal Definition

c is a formula for which finite satisfying substitutions of x can be computed

General Theorem: If the initial policy passes a syntactic mode check, then finite substitutions can be computed

Applications: The entire HIPAA and GLBA Privacy Rules pass this check

(17)

φ = ∀p1, p2, m, u, q, t. (send(p1, p2, m) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi)) ⊃ inrole(p2, law-enforcement) ∧ purp_in(u, id-criminal) ∧ ∃ m’. ( state(q, m’) ∧ is-admission-of-crime(m’) ∧ believes-crime-caused-serious-harm(p1, m’))

Example

{ p1 → UPMC, p2 → allegheny-police, m → M2, q → Bob, u → id-bank-robber, t → date-of-treatment }

∧ purp_in(id-bank-robber, id-criminal)

{ m’ → M1 } ∧ is-admission-of-crime(M1) ∧ believes-crime-caused-serious-harm(UPMC, M1)

Log
Jan 1, 2011   state(Bob, M1)
Jan 5, 2011   send(UPMC, allegheny-police, M2)
              tagged(M2, Bob, date-of-treatment, id-bank-robber)

T
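
The example can be traced with a much-simplified sketch. It handles only a ground conjunction after the substitution has been applied, not the quantifier handling of the real reduce algorithm. The function name `reduce_conjunction`, the tuple encoding of atoms, and the assumption that the role fact `inrole(allegheny-police, law-enforcement)` is recorded in the log are all illustrative, not taken from the slides.

```python
def reduce_conjunction(log, atoms):
    """Check a conjunction of ground atoms against an incomplete log.

    Returns False if some atom is recorded as false, True if every atom is
    recorded as true, and otherwise the list of atoms the log cannot decide
    (the residual policy left for a later audit or a human oracle).
    """
    residual = []
    for atom in atoms:
        value = log.get(atom)  # None means the log is silent about this atom
        if value is False:
            return False
        if value is None:
            residual.append(atom)
    return residual if residual else True


# Log entries from the example, plus an assumed role fact.
log = {
    ("state", "Bob", "M1"): True,
    ("send", "UPMC", "allegheny-police", "M2"): True,
    ("tagged", "M2", "Bob", "date-of-treatment", "id-bank-robber"): True,
    ("inrole", "allegheny-police", "law-enforcement"): True,  # assumption
}

# Right-hand side of the policy after substituting the values found in the log.
obligations = [
    ("inrole", "allegheny-police", "law-enforcement"),
    ("purp_in", "id-bank-robber", "id-criminal"),
    ("state", "Bob", "M1"),
    ("is-admission-of-crime", "M1"),
    ("believes-crime-caused-serious-harm", "UPMC", "M1"),
]

residual = reduce_conjunction(log, obligations)
```

Under these assumptions the residual contains exactly the three "grey" atoms (purpose, admission of crime, belief about harm), which matches the formula left over in the example.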

(18)

• Implementation and evaluation over simulated audit logs for compliance with all 84 disclosure-related clauses of HIPAA Privacy Rule
• Performance:
  - Average time for checking compliance of each disclosure of protected health information is 0.12s for a 15MB log
• Mechanical enforcement:
  - reduce can automatically check 80% of all the atomic predicates

(19)

Ongoing Transition Efforts

• Integration of reduce algorithm into Illinois Health Information Exchange prototype
  - Joint work with UIUC and Illinois HLN
  - Auditing logs for policy compliance

(20)

Related Work

• Distinguishing characteristics
  1. General treatment of incompleteness in audit logs
  2. Quantification over infinite domains (e.g., messages)
  3. First complete formalization of HIPAA Privacy Rule and GLBA
• Nearest neighbors
  - Basin et al 2010 (missing 1, weaker 2, cannot handle 3)
  - Lam et al 2010 (missing 1, weaker 2, cannot handle entire 3)
  - Weitzner et al (missing 1, cannot handle 3)

(21)

Formalizing and Enforcing

Purpose Restrictions

With M. C. Tschantz (CMU → Berkeley) and J. M. Wing (CMU → MSR)

(22)

Goal

• Give a semantics to
  - “Not for” purpose restrictions
  - “Only for” purpose restrictions
  that is parametric in the purpose
• Provide audit algorithm for detecting violations for that semantics

(23)

[Figure: medical record workflow. Labels: X-ray taken, Add x-ray, X-ray added, Send record, Diagnosis by specialist, No diagnosis by drug company, Medical Record. Policy caption, cut off in the source: "Med records used only for"]

(24)

[Figure: the same workflow (X-ray taken, Add x-ray, X-ray added, Send record). Diagnosis by specialist is marked "Achieve purpose"; No diagnosis by drug company is marked "Not achieve purpose".]

(25)

[Figure: the same workflow with uncertainty. After X-ray added, Send record is a choice point with the best choice marked. Outcomes: Diagnosis by specialist, or No diagnosis (by drug co. or specialist) when the specialist fails. Edge probabilities shown: 1/4 and 3/4.]

(26)

Planning

Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose

i.e., always makes the best choice for furthering the purpose
(27)

Auditing

[Figure: the audit takes three inputs (auditee’s behavior, purpose restriction, decision-making model) and returns one of three verdicts: Obeyed, Violated, Inconclusive]

(28)

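[Figure: audit example. The purpose restriction "Record only for treatment" and its policy implications go to an MDP solver, which computes the optimal actions for each state. The observed behavior [ , send record] is checked: Actions optimal? No. Verdict: Violated.]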

(29)

Summary: A Sense of Purpose

Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose

i.e., always makes the best choice for furthering the purpose

Audit algorithm detects policy violations by checking if observed behavior could have been produced by optimal plan
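
The audit idea can be illustrated with a small sketch: solve a Markov decision process whose reward measures how well the purpose is furthered, then flag any observed action that is not optimal in its state. The state and action names, the reward of 1 for a diagnosis, and the 3/4 success probability for the specialist are assumptions made for illustration; the slides show only the labels 1/4 and 3/4 without saying which outcome each belongs to.

```python
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "xray_added": {
        "send_to_specialist": [(0.75, "diagnosed", 1.0), (0.25, "no_diagnosis", 0.0)],
        "send_to_drug_company": [(1.0, "no_diagnosis", 0.0)],
    },
    "diagnosed": {},      # terminal state
    "no_diagnosis": {},   # terminal state
}


def q_value(transitions, values, state, action, gamma=1.0):
    """Expected reward of taking `action` in `state`, then acting optimally."""
    return sum(p * (r + gamma * values[nxt])
               for p, nxt, r in transitions[state][action])


def value_iteration(transitions, gamma=1.0, sweeps=50):
    """Plain value iteration; terminal states keep value 0."""
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if actions:
                values[state] = max(
                    q_value(transitions, values, state, a, gamma) for a in actions)
    return values


def optimal_actions(transitions, values, state, gamma=1.0, tol=1e-9):
    """All actions whose expected value ties for the best in `state`."""
    qs = {a: q_value(transitions, values, state, a, gamma)
          for a in transitions[state]}
    best = max(qs.values())
    return {a for a, q in qs.items() if q >= best - tol}


def audit(transitions, observed, gamma=1.0):
    """Return "violated" if some observed (state, action) pair is not optimal.

    Otherwise the behavior is consistent with an optimal plan; this sketch
    reports that as "inconclusive" rather than trying to prove obedience.
    """
    values = value_iteration(transitions, gamma)
    for state, action in observed:
        if action not in optimal_actions(transitions, values, state, gamma):
            return "violated"
    return "inconclusive"
```

With this model, `audit(transitions, [("xray_added", "send_to_drug_company")])` returns `"violated"`, because sending the record to the drug company has expected value 0 while sending it to the specialist has expected value 0.75.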

(30)

Today: Focus on Detection

• Healthcare Privacy
  - Play in two acts
• Web Privacy

(31)

Bootstrapping Privacy Compliance in a

Big Data System

With S. Sen (CMU) and S. Guha, S. Rajamani, J. Tsai, J. M. Wing (MSR)

2014 IEEE Symposium on Security & Privacy

(32)

Privacy Compliance for Bing

Setting:

(33)

Two Central Challenges

Legal Team Crafts Policy → (meetings) → Privacy Champion Interprets Policy → (meetings) → Developer Writes Code → (meetings) → Audit Team Verifies Compliance

1. Ambiguous privacy policy
   • Meaning unclear
2. Huge undocumented codebases & datasets
   • Connection to policy unclear
(34)

1. Legalease

• Clean syntax
  - Layered allow-deny information flow rules with exceptions
• Precise Semantics
  - No ambiguity
• Focus on Usability
  - User study of Legalease with Microsoft privacy champions promising
• Example:

  DENY Datatype IPAddress
  USE FOR PURPOSE Advertising
  EXCEPT
    ALLOW Datatype IPAddress:Truncated
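
A rough sketch of how a layered allow-deny rule with exceptions could be evaluated. This is not Legalease's actual implementation: the `Clause` class, the `decide` function, and the convention that `IPAddress:Truncated` refines `IPAddress` by prefix are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    effect: str                     # "ALLOW" or "DENY"
    datatype: str
    purpose: Optional[str] = None   # None means "any purpose"
    exceptions: List["Clause"] = field(default_factory=list)


def matches(clause, datatype, purpose):
    # Assumption: "IPAddress:Truncated" is a refinement of "IPAddress",
    # so a clause on the parent type also covers the refined type.
    type_ok = (datatype == clause.datatype
               or datatype.startswith(clause.datatype + ":"))
    purpose_ok = clause.purpose is None or clause.purpose == purpose
    return type_ok and purpose_ok


def decide(clause, datatype, purpose):
    """Return "ALLOW", "DENY", or None when the clause does not apply.

    An applicable exception overrides the clause that contains it.
    """
    if not matches(clause, datatype, purpose):
        return None
    for exception in clause.exceptions:
        verdict = decide(exception, datatype, purpose)
        if verdict is not None:
            return verdict
    return clause.effect


# The example policy from the slide.
policy = Clause("DENY", "IPAddress", "Advertising",
                exceptions=[Clause("ALLOW", "IPAddress:Truncated")])
```

Under these assumptions, `decide(policy, "IPAddress", "Advertising")` yields `"DENY"`, while `decide(policy, "IPAddress:Truncated", "Advertising")` yields `"ALLOW"` because the exception applies.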

(35)

2. Grok

[Figure: data flow graph linking processes (Process 1, Process 2, Process 4, Process 5) and datasets (Dataset A through Dataset J). Job labels: NewAcct, Login, Check, GeoIP, Check Fraud, Reporting. Column labels: Name, Age, IPAddress, IDX, Hash, Country, Timestamp.]

• Data Inventory
• Annotate code + data with policy data types
• Source labels propagated via data flow graph
• Different Noisy Sources
  - Variable Name Analysis
  - Developer Annotations

(36)

2. Grok

[Figure: fragment of the data flow graph (Datasets D, F, G, H, I, J; Processes 4 and 5; jobs GeoIP, Check Fraud, Reporting; columns IPAddress, IDX, Country)]

• Example Policy Violation: IPAddress is used for reporting (advertising)
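
A violation of this kind can be found by pushing source labels along the edges of the data flow graph. The sketch below is a plain worklist propagation, not Grok's actual analysis; the function name `propagate_labels` and the node names in the example graph are hypothetical.

```python
from collections import deque


def propagate_labels(edges, seed_labels):
    """Propagate policy datatype labels along directed data flow edges.

    edges: iterable of (source_node, destination_node)
    seed_labels: dict mapping some nodes to a set of labels
    Returns a dict mapping every labeled node to its set of labels.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    labels = {node: set(tags) for node, tags in seed_labels.items()}
    queue = deque(labels)
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            current = labels.setdefault(nxt, set())
            new = labels[node] - current
            if new:
                # Only revisit a node when it gained a label, so cycles terminate.
                current |= new
                queue.append(nxt)
    return labels


# Hypothetical graph: login data flows through a GeoIP job into reporting.
edges = [("Login", "GeoIP"), ("GeoIP", "Reporting"), ("NewAcct", "CheckFraud")]
labels = propagate_labels(edges, {"Login": {"IPAddress"}})
reporting_sees_ip = "IPAddress" in labels.get("Reporting", set())
```

In this toy graph `reporting_sees_ip` is true, which is the situation a policy checker would flag if reporting counts as an advertising use. The sketch does not model truncation or other transformations that change a label on the way.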

(37)

2. Grok

[Figure: the same graph fragment after the fix, with IPAddress shown in Dataset F]

• Example Fix: IPAddress is truncated before it is passed to reporting (advertising) job

(38)

Bootstrapping Works

Pick x% most frequently appearing column names, label them, then propagate labels using Grok flow

Pick the nodes which will label the most of the graph

~200 annotations label 60% of nodes

A small number of annotations is enough to get off the ground.
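
One plausible reading of "pick the nodes which will label the most of the graph" is a greedy coverage heuristic: repeatedly annotate the node whose label would reach the most still-unlabeled nodes. The sketch below assumes that reading; the function names and the example graph are hypothetical and not taken from Grok.

```python
def reachable(successors, start):
    """All nodes reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def greedy_annotation_order(edges, budget):
    """Choose up to `budget` nodes to annotate by hand, greedily maximizing
    the number of newly covered nodes at each step."""
    successors, nodes = {}, set()
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        nodes.update((src, dst))
    if not nodes:
        return [], set()

    reach = {n: reachable(successors, n) for n in nodes}
    covered, chosen = set(), []
    for _ in range(budget):
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(nodes), key=lambda n: len(reach[n] - covered))
        if not reach[best] - covered:
            break  # everything reachable is already covered
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered


# Hypothetical graph with two disconnected pipelines.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
chosen, covered = greedy_annotation_order(edges, budget=2)
```

Here two annotations (`A`, then `D`) cover all five nodes, a toy version of the observation that a small number of annotations can label a large share of the graph.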

(39)

Scale

 77,000 jobs run each day

 By 7000 entities

 300 functional groups

 1.1 million unique lines of code

 21% changes on avg,

daily

 46 million table schemas

 32 million files

 Manual audit infeasible

 Information flow

analysis takes ~30 mins

(40)

A Streamlined Audit Workflow

Legal Team Crafts Policy → Privacy Champ Interprets Policy → Developer Writes Code → Audit Team Verifies Compliance

• Legalease: A Formal Policy Specification Language (encode, refine)
• Grok: Data Inventory with Policy Datatypes (code analysis, developer annotations)
• Checker: takes annotated code and the Legalease policy, reports potential violations; then fix code, update Grok

(41)

Information Flow Experiments

With Michael Carl Tschantz (CMU → UC Berkeley) and Amit Datta (CMU)

(42)
(43)

Web Tracking

[Figure: Google shown between the User and the Ads. Other labels: Search terms, Websites, Other users, Advertisers, Confounding inputs, and a question mark.]

(44)

Experimental Design

[Figure: a scientist assigns units to an Experimental Group (Drug) and a Control Group (Placebo)]

(45)

Information Flow Experiment

[Figure: Group 1 and Group 2. Labels: Black, White; ads "Arrested?" and "Looking for?"]

(46)

[Figure: Google, with repeated labels Black and White and the ads "Arrested?" and "Looking for?"]

(47)

Information Flow Experiments as Science

Experimental Science     Information Flow
Natural process          System in question
Population of units      Subset of interactions
…                        …

(48)

Browser Instances are Not Independent

[Figure: values shown: 17, 13, 13, 13, 12, 11, 10, 10, 8, 7]
(49)

Our Idea

• Use a non-parametric test
  - Does not require model of Google
• Specifically, a permutation test
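
A minimal sketch of an exact permutation test, assuming the test statistic is the difference in group means and that units were assigned to groups at random. The function name and the sample numbers are made up for illustration and are not data from the talk.

```python
from itertools import combinations


def permutation_test(experimental, control):
    """Exact one-sided permutation test on the difference of group means.

    Enumerates every way to relabel the pooled measurements into groups of
    the original sizes and returns the fraction of relabelings whose
    statistic is at least as large as the observed one (the p-value).
    """
    pooled = list(experimental) + list(control)
    k = len(experimental)
    total = sum(pooled)

    def mean_difference(group_sum):
        # Mean of the chosen group minus mean of the remaining measurements.
        return group_sum / k - (total - group_sum) / (len(pooled) - k)

    observed = mean_difference(sum(experimental))
    count = extreme = 0
    for indices in combinations(range(len(pooled)), k):
        count += 1
        if mean_difference(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Made-up counts of relevant ads seen by each browser instance.
p_value = permutation_test(experimental=[5, 6, 7], control=[1, 2, 3])
```

With three units per group there are 20 possible relabelings, and only the observed one is this extreme, so `p_value` is 0.05. The test needs no model of how the ad system behaves; it relies only on the random assignment of units to groups. Full enumeration is practical only for small samples; larger experiments would sample random relabelings instead.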

(50)

Visiting Car Websites Impacts Ads

[Figure: values shown: 0, 0, 2, 5, 6, 19, 22, 30, 30, 31]
(51)

Conclusion

• A rigorous methodology for information flow experiments
  - Connection to causality in natural sciences
  - Experimental design for causal determination
  - Significance testing with non-parametric statistics
• Future work
  - Replicate and analyze previous experiments systematically
    - Guha et al, Wills and Tatar, Sweeney
  - Conduct new large-scale experiments systematically

(52)

A Research Area

• Formalize Privacy Policies
  - Precise semantics of privacy concepts (restrictions on personal information flow)
• Enforce Privacy Policies
  - Audit and Accountability
    - Detect violations
    - Blame-assignment
    - Adaptive audit resource allocation
• Application Domains

(53)
(54)

Information Flow Analysis

[Figure: taxonomy of information flow analysis. Question: Access to program? (Yes / No; Total, Partial, None). Categories: White box, Black box. Techniques: Analysis, Testing, Monitoring, Experimenting.]

(55)

Google Exhibits Complex Behavior

[Figure: plot of ad ID (y-axis, 0 to 45) against reload number (x-axis, 0 to 200)]
(56)

Privacy as Contextual Integrity

Context-relative information flow norms

• Example contexts: healthcare, friendship

• Example norms: confidentiality, purpose, reciprocity

[Nissenbaum 2004; Barth-D-Mitchell-Nissenbaum 2006]

(57)

Norms to Policies

• Example norm: confidentiality expectations in healthcare

• Associated policy: clauses in the HIPAA Privacy Rule

• Does policy reflect norm?

[Figure: Privacy Norms and Privacy Policies]
