• No results found

Dawn Song

N/A
N/A
Protected

Academic year: 2021

Share "Dawn Song"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Privacy and Anonymity

Dawn Song

[email protected]

Slides credit: John Mitechell & Vitaly Shmatikov

2

Anonymous web browsing

Why?

1. Discuss health issues or financial matters anonymously

2. Bypass Internet censorship in parts of the world

3. Conceal interaction with gambling sites

4. Law enforcement

Two goals:

Hide user identity from target web site: (1), (4)

Hide browsing pattern from employer or ISP: (2), (3)

Stronger goal: mutual anonymity (e.g. remailers)

3

Current state of the world I

ISPs tracking customer browsing habits:Sell information to advertisers

Embed targeted ads in web pages (1.3%)

»Example: MetroFi (free wireless)

[Web Tripwires: Reis et al. 2008]

Several technologies used for tracking at ISP:NebuAd, Phorm, Front Porch

Bring together advertisers, publishers, and ISPs »At ISP: inject targeted ads into non-SSL pages

Tracking technologies at enterprise networks:Vontu (symantec), Tablus (RSA), Vericept

4

Current state of the world II

EU directive 2006/24/EC: 3 year data retentionFor ALL traffic, requires EU ISPs to record:

»Sufficient information to identify endpoints (both legal entities and natural persons)

»Session duration

»… but not session contents

Make available to law enforcement »… but penalties for transfer or other access to data

For info on US privacy on the net:

“privacy on the line” by W. Diffie and S. Landau

5

Part 1: network-layer privacy

Goals:

Hide user’s IP address from target web site Hide browsing destinations from network

6

1

st

attempt: anonymizing proxy

HTTPS:// anonymizer.com ? URL=target User1 User2 User3 anonymizer.com Web1 Web2 Web3 SSL HTTP

(2)

7

Anonymizing proxy: security

Monitoring ONE link: eavesdropper gets nothingMonitoring TWO links:

Eavesdropper can do traffic analysisMore difficult if lots of traffic through proxy

Trust: proxy is a single point of failureCan be corrupt or subpoenaed

»Example: The Church of Scientology vs. anon.penet.fi

Protocol issues:

Long-lived cookies make connections to site linkable

8

How proxy works

Proxy rewrites all links in response from web siteUpdated links point to anonymizer.com

»Ensures all subsequent clicks are anonymized

Proxy rewrites/removes cookies and some HTTP headers

Proxy IP address:

if a single address, could be blocked by site or ISPanonymizer.com consists of >20,000 addresses

»Globally distributed, registered to multiple domains

»Note: chinese firewall blocks ALL anonymizer.com addresses

9

2

nd

Attempt: MIX nets

Goal: no single point of failure

10 Epk2( R3, Epk3( R6,

MIX nets

[C’81]

Every router has public/private key pairSender knows all public keys

To send packet:

Pick random route: R2 → R3→ R6→ srvrPrepare onion packet:

R3 R5 R4 R1 R2 R6 Epk6( srvr , msg) msg srvr packet = 11 Eavesdropper’s view at a single MIX

Eavesdropper observes incoming and outgoing traffic

Crypto prevents linking input/output pairsAssuming enough packets in incoming batch If variable length packets then must pad all to max len

Note: router is stateless user1 user2 user3 Ri batc h 12

Performance

Main benefit:

Privacy as long as at least one honest router on path

Problems:

High latency (lots of public key ops) »Inappropriate for interactive sessions

»May be OK for email

No forward security

Homework puzzle: how does server respond?hint: user includes “response onion” in forward packet

R3

R2 R6

(3)

13

Web-based user tracking

Browser provides many ways to track users: 1.3rdparty cookies ; Flash cookies 2.Tracking through the history file 3.Machine fingerprinting

14

3

rd

party cookies

What they are:

User goes to site A. com ; obtains pagePage contains <iframe src=“B.com”>Browser goes to B.com ; obtains page

HTTP response contains cookieCookie from B.com is called a 3rdparty cookieTracking: User goes to site D.com

D.com contains <iframe src=“B.com”>B.com obtains cookie set when visited A.com

B.com knows user visited A.com and D.com

15

Can we block 3

rd

party cookies?

Supported by most browsers

IE and Safari: block set/write

Ignore the “Set-Cookie” HTTP header from 3rdparties

Site sets cookie as a 1stparty; will be given cookie when contacted as a 3rdpartyEnabled by default in IE7

Firefox and Opera: block send/read

Always implement “Set-Cookie” , but never send cookies to 3rdparty

Breaks sess. mgmt. at several sites (off by default)

16

Effectiveness of 3

rd

party blocking

Ineffective for improving privacy

3rdparty can become first party and then set cookieFlash cookies not controlled by browser cookie policy

Better proposal:

Delete all browser state upon exitSupported as an option in IE7

17 1

Tracking through the history file

E.g., site checks hyper-link color for historyApplications:

Context aware phishing: » Phishing page tailored to victimMarketing

Use browsing history as 2ndfactor authentication

18 1

Context-aware Phishing

Stanford students see:

(4)

19 1

SafeHistory/SafeCache

[JBBM’06]

Define Same Origin Policy for all long term browser state

history file and web cache

Firefox extensions: SafeHistoryand SafeCache

Example: history

Color link as visited only when site can tell itself that user previously visited link:

»A same-site link, or

»A cross-site link previously visited from this site

20

Machine fingerprinting

Tracking using machine fingerptingsUser connects to site A.com

Site builds a fingerprint of user’s machineNext time user visits A.com, site knows it is

the same user

21

Machine fingerprints

[Khono et al.’05] • Content and order of HTTP headers

ƒe.g. user-agent header:

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

Javascript and JVM can interrogate machine properties:

Timezone, local time, local IP address

TCP timestamp: exploiting clock skew

TCP_timestamp option: peer embeds 32-bit time in every packet header. Accurate to ≈100ms

fingerprint = (real-time ∆ between packets) (timestamp ∆ between-packets)

22

Administravia

Office hour on Tue changed to 1-3pm due to schedule conflict

Additional office hour on Thu/Fri afternoon to help students preparing the final

In-class final: Dec 10, 306 SodaFinal review on Wed

Guest lecture on Mon: real-world experiences about breaking security systems

23

De-anonymizing data

24

Problem statement

An organization collects private user dataWishes to make data available for researchIndividual identities should be hidden

Examples:

Search queries over a 3 month period (AOL)Netflix movie rentals

Census data

(5)

25

Incorrect approach

Replace “username” or “userID” by random value

Dan → a56fd863ec

John→ 87649dce63

Same value used for all appearances of userID

Problem: often data can be de-anonymized by combining auxiliary information

Examples: AOL search data census data

26

Netflix Prize Dataset

Released in October 2006 to support research on better recommender algorithms

Real movie ratings of 500,000 Netflix subscribers »10% of all Netflix users as of late 2005

Names removedMaybe perturbed

Good target for real-world de-anonymizationWhat can you learn from someone’s movie ratings?What to use as source of external information?

27

Netflix’s Take on Privacy

Even if, for example, you knew all your own

ratings and their dates you probably couldn’t

identify them reliably in the data because only a small sample was included (less than one-tenth of our complete dataset) and that data was subject to

perturbation. Of course, since you know all your own ratings that really isn’t a privacy problem is it?

-- Netflix Prize FAQ

28

De-Anonymizing Netflix Records

Average subscriber has 214 dated ratingsHow many does the attacker need to know to

identify his victim’s record in the dataset?Twois enough to reduce to 8 candidate recordsFouris enough to identify uniquely (on average)Works even better with relatively rare ratings

»“The Astro-Zombies” rather than “Star Wars”Sparsity!Negligible information leakage is

sufficient for complete re-identification

Fat Tail effect helps here: most people watch obscure crap (really!)

Fat Tail effect helps here: most people watch obscure crap (really!)

29

Robustness

De-anonymization algorithm is robust to errors in attacker’s external knowledge

Dates and ratings may be known impreciselySome may even be completely wrong

Perturbation = noise in the data = doesn’t matter!

Why? Sparsity!!

Nearest neighbor is so far, can tolerate huge amount of noise, perturbation, imprecision

Where to find external knowledge?Cross-correlating with Internet Movie DB

30

What Did We Learn?

These ratings are not in IMDb user’s public profile… Does this tell you anything about this user?

(6)

31

Netflix Users with Distance < 0.15

Could edges reflect real-life relationships?

References

Related documents