(1)

Probabilistic Prediction of Privacy Risks in User Search Histories

Joanna Biega, Ida Mele, Gerhard Weikum

(2)

Or rather:

(8)

Traditional privacy protection scenario

        gender  age  disease
user1   male    37   cancer
user2   male    37   heart d.
user3   female  42   cancer

Adversary: Is this Molly?

k-anonymity, l-diversity, t-closeness, differential privacy

This can be orthogonal

Only part of the dataset known

Probabilistic inference

(13)

The concerns of a modern user

- I’m rich and buying online. Do I get the same price?
- I’m a female user in an online tech community. Can I earn reputation easily?
- I’m falling into depression. I don’t want my insurance company to know before I take action!
- I’m pregnant but don’t want my employer to know yet.

(16)

More privacy control on the user side!

What are the side effects of Xanax?

- anxiety
- psychotherapist NY
- Xanax side effects

        gender  age  disease
user1   male    37   cancer
user2   male    37   heart d.
user3   female  42   cancer

Privacy advisor: Watch out! How about:

- not posting this content?
- posting anonymously?
- posting obfuscating queries?

What are the side effects of Xanax?

- anxiety
- psychotherapist NY
- Xanax side effects
- anxiety
- how to become psychotherapist NY

(21)

Quantifying privacy risks

What are the side effects of Xanax?

\[ P(\text{IsDepressed} = \text{True} \mid \text{hasGender} = \text{female},\ \text{livesIn} = \text{US}) = 0.01 \]

\[ P(\text{IsDepressed} = \text{True} \mid \text{hasGender} = \text{female},\ \text{livesIn} = \text{US},\ \text{searchedForSideEffectsOf} = \text{Xanax}) = 0.3 \]

Privacy Advisor: Watch out!
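The step from 0.01 to 0.3 above is the kind of change a privacy advisor would react to. A minimal Python sketch, assuming the two probabilities have already been produced by some inference model; the function names and both thresholds are hypothetical and not taken from the talk:

```python
def risk_lift(prior: float, posterior: float) -> float:
    """Factor by which the new evidence multiplies the adversary's belief."""
    if not 0.0 < prior <= 1.0 or not 0.0 <= posterior <= 1.0:
        raise ValueError("prior must lie in (0, 1], posterior in [0, 1]")
    return posterior / prior


def should_warn(prior: float, posterior: float,
                max_lift: float = 10.0, max_posterior: float = 0.25) -> bool:
    """Warn when a query raises the risk sharply or pushes it past a cap.

    Both thresholds are illustrative assumptions.
    """
    return risk_lift(prior, posterior) >= max_lift or posterior >= max_posterior


# Numbers from the slide: 0.01 before the Xanax query, 0.3 after it.
print(round(risk_lift(0.01, 0.3), 2))  # lift of about 30
print(should_warn(0.01, 0.3))
```

Combining a relative criterion (lift) with an absolute one (cap) is only one possible design; a real advisor would need thresholds tuned per sensitive state.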

(24)

Risk prediction in search histories (proof-of-concept)

Sensitive states <= Sensitive queries

Can we predict sensitive states even though the most obvious cues are masked out?

(29)

Sensitive state vocabulary

              Indicative (5)       Suggestive         Ambiguous
Alcoholism:   alcohol dependence   liver therapy      anonymous
Depression:   antidepressant       anxiety            stress
Pregnancy:    pregnant             morning sickness   labor
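The vocabulary above can be held in a small lookup structure, and the masking of indicative terms can be sketched on top of it. This is a minimal Python illustration: the term lists contain only the examples shown on the slide, and both the case-insensitive substring matching and the choice to drop a whole query (rather than only the matching term) are assumptions, since the talk does not specify them.

```python
# Example terms copied from the slide; the real vocabulary is larger.
VOCABULARY = {
    "alcoholism": {
        "indicative": ["alcohol dependence"],
        "suggestive": ["liver therapy"],
        "ambiguous": ["anonymous"],
    },
    "depression": {
        "indicative": ["antidepressant"],
        "suggestive": ["anxiety"],
        "ambiguous": ["stress"],
    },
    "pregnancy": {
        "indicative": ["pregnant"],
        "suggestive": ["morning sickness"],
        "ambiguous": ["labor"],
    },
}


def mask_indicative(queries, state, vocabulary=VOCABULARY):
    """Drop every query that contains an indicative term for `state`.

    Matching is a case-insensitive substring test (an assumption).
    """
    indicative = [t.lower() for t in vocabulary[state]["indicative"]]
    return [q for q in queries
            if not any(term in q.lower() for term in indicative)]


history = ["antidepressant dosage", "anxiety at night", "stress at work"]
print(mask_indicative(history, "depression"))
# ['anxiety at night', 'stress at work']
```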

(32)

Framework

User Search Log:
- searchedForDepression(U)
- searchedForXanax(U)
- searchedForMedicalCourses(U)

+

Background Knowledge (co-occurrence counts):
- depression(317)
- side#effects(391)
- psychotherapy#depression(31)

Inference model

P(IsDepressed|…) = 0.3
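Turning a raw search log into evidence predicates like the ones listed above can be sketched as follows. This is a minimal Python illustration; the term-to-predicate mapping, the substring matching, and the function name `ground_evidence` are hypothetical, as the talk does not describe how predicates are extracted.

```python
import re

# Hypothetical mapping from a search term to the predicate it grounds.
TERM_TO_PREDICATE = {
    "depression": "searchedForDepression",
    "xanax": "searchedForXanax",
    "medical courses": "searchedForMedicalCourses",
}


def ground_evidence(user_id, queries, term_to_predicate=TERM_TO_PREDICATE):
    """Return the sorted list of ground atoms, e.g. 'searchedForXanax(u1)'."""
    atoms = set()
    for query in queries:
        # Normalize case and collapse whitespace before matching.
        text = re.sub(r"\s+", " ", query.lower()).strip()
        for term, predicate in term_to_predicate.items():
            if term in text:
                atoms.add(f"{predicate}({user_id})")
    return sorted(atoms)


log = ["Xanax side effects", "depression symptoms", "ny uni medical  courses"]
print(ground_evidence("u1", log))
# ['searchedForDepression(u1)', 'searchedForMedicalCourses(u1)', 'searchedForXanax(u1)']
```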

(33)

Inference Model

Variables: Xanax (X), Depression (D), Anxiety (A), Occupation (O), Bipolar disorder (B)

We want to jointly model a set of variables:

\[ P(X, D, A, O, B) \]

Or specifically:

\[ P(D \mid A = \text{True},\ O = \text{psychiatrist}) \]
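A specific query like the one above is answered from the joint distribution by summing out the variables that are neither queried nor observed. A short worked form, using only the definition of conditional probability; the occupation variable is written as O, and the lowercase symbols for variable values are notation introduced here:

```latex
\[
P(D = d \mid A = \text{True},\ O = \text{psychiatrist})
= \frac{\sum_{x}\sum_{b} P(X = x,\ D = d,\ A = \text{True},\ O = \text{psychiatrist},\ B = b)}
       {\sum_{d'}\sum_{x}\sum_{b} P(X = x,\ D = d',\ A = \text{True},\ O = \text{psychiatrist},\ B = b)}
\]
```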

(35)

Inference Model: Markov Random Field

Variables: Xanax (X), Depression (D), Anxiety (A), Occupation (O), Bipolar disorder (B)

Edges encode variable dependency

Markov property assumption:

\[ P(O \mid X, D, A, B) = P(O \mid D) \]

Joint distribution, where Z is the partition function (normalizing factor):

\[ P(X, D, A, O, B) = \frac{1}{Z} \cdots \]

Clique potential functions:

\[ \phi_i(X_1, \ldots, X_n) = w_i^{\top} \cdot f_i(X_1, \ldots, X_n) \]

(36)

First-order logic abstraction layer

\[ \phi_i(X_1, \ldots, X_n) = w_i^{\top} \cdot f_i(X_1, \ldots, X_n) \]

Variables: Xanax, Depression, Anxiety

(0.17) Anxiety ^ Xanax => Depression
(0.09) Anxiety ^ ¬Xanax => Depression
(0.13) ¬Anxiety ^ Xanax => Depression
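To make the weighted rules above concrete, here is a minimal Python sketch that scores every truth assignment for a single user and computes a conditional probability by brute-force enumeration. It assumes the standard Markov logic semantics, in which a world's unnormalized probability is the exponential of the summed weights of the formulas it satisfies; the talk does not show its inference procedure, and a real system would use an MLN engine rather than enumeration.

```python
import itertools
import math

# Weighted formulas from the slide, each written as a Python predicate
# over one world (a dict of truth values for a single user).
FORMULAS = [
    (0.17, lambda w: not (w["Anxiety"] and w["Xanax"]) or w["Depression"]),
    (0.09, lambda w: not (w["Anxiety"] and not w["Xanax"]) or w["Depression"]),
    (0.13, lambda w: not (not w["Anxiety"] and w["Xanax"]) or w["Depression"]),
]
VARIABLES = ["Xanax", "Depression", "Anxiety"]


def score(world):
    """Unnormalized probability: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(weight for weight, holds in FORMULAS if holds(world)))


def conditional(query, evidence):
    """P(query = True | evidence), enumerating all worlds (fine for 3 variables)."""
    numerator = denominator = 0.0
    for values in itertools.product([False, True], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the observed evidence
        s = score(world)
        denominator += s  # normalizer over worlds consistent with the evidence
        if world[query]:
            numerator += s
    return numerator / denominator


p = conditional("Depression", {"Anxiety": True, "Xanax": True})
print(round(p, 3))
```

With both Anxiety and Xanax observed, only the first formula distinguishes the two remaining worlds, so the result reduces to a logistic function of its weight 0.17.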

(40)

MLN rules

(0.9) IndicativeTerm(U) <=> SensitiveState(U)
- High weight
- Those will be masked out

(0.13) Term1(U) ^ Term2(U) => IndicativeTerm(U)
- All pairs of suggestive and ambiguous terms
- Weight: #users(Term1, Term2, IndicativeTerm) / #users(Term1, Term2)
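The count ratio used as the weight of the term-pair rule can be sketched directly. A minimal Python illustration, assuming a hypothetical input format in which each user id maps to the set of vocabulary terms found in that user's queries; the toy data is invented and the zero fallback for unsupported pairs is an arbitrary choice:

```python
def rule_weight(user_terms, term1, term2, indicative_term):
    """Fraction of users with both terms who also used the indicative term.

    Implements #users(Term1, Term2, IndicativeTerm) / #users(Term1, Term2).
    """
    both = [terms for terms in user_terms.values()
            if term1 in terms and term2 in terms]
    if not both:
        return 0.0  # no user has both terms; 0.0 is an arbitrary fallback
    with_indicative = sum(1 for terms in both if indicative_term in terms)
    return with_indicative / len(both)


# Toy data, invented for illustration only.
users = {
    "u1": {"anxiety", "stress", "antidepressant"},
    "u2": {"anxiety", "stress"},
    "u3": {"anxiety", "stress"},
    "u4": {"stress"},
}
print(rule_weight(users, "anxiety", "stress", "antidepressant"))
```

Here three users have both terms and one of them also has the indicative term, so the weight is one third.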

(41)

Framework

User Search Log:
- searchedForDepression(U)
- searchedForXanax(U)

+

Background Knowledge (co-occurrence counts):
- depression(317)
- side#effects(391)
- psychotherapy#depression(31)

Inference model:
- (0.13) side#effects(U) ^ depression(U) => xanax(U)
- (0.07) xanax(U) ^ depressed(U) => IsDepressed(U)

P(IsDepressed|…) = 0.3

(46)

Proof-of-concept experiment

45 users from the AOL query log: 3x3x5 (sensitive state x choice criterion x users)

Choice criteria:
- highest overlap with indicative keywords
- highest overlap with ambiguous keywords
- highest overlap with both

Indicative terms masked out for the model

Ground truth by two human assessors

             Precision  Recall
Alcoholism   0.60       0.75
Depression   0.67       0.33
Pregnancy    0.50       0.85
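For reference, precision and recall as reported above compare the users the model flags for a sensitive state against the users the assessors marked. A minimal Python sketch with invented user ids; it does not reproduce the experiment's data:

```python
def precision_recall(predicted, actual):
    """Precision and recall for sets of user ids flagged as being in a state."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example: the model flags 4 users, the assessors marked 5,
# and 2 users appear in both sets.
predicted = {"u1", "u2", "u3", "u4"}
actual = {"u1", "u2", "u7", "u8", "u9"}
print(precision_recall(predicted, actual))  # (0.5, 0.4)
```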

(49)

Model limitations

Context:
- drinking too much (water)
- blackout (lyrics)

User background:
- depression symptoms
- ny uni medical courses

Temporal dimension:
- depression symptoms
- xanax prescription
- feeling lonely
- feeling anxious
- xanax side effects

(51)

To sum up

In the modern world of adversaries using Big Data and probabilistic tools, we need user-centric privacy to enable privacy control on the user side.

Joanna Biega, Ida Mele, Gerhard Weikum: Probabilistic Prediction of Privacy Risks in User Search Histories. PSBD @ CIKM 2014
