Probabilistic Prediction of Privacy Risks in User Search Histories
Joanna Biega
Ida Mele
Gerhard Weikum
Or rather:
Traditional privacy protection scenario
        gender  age  disease
user1   male    37   cancer
user2   male    37   heart disease
user3   female  42   cancer
Is this Molly?
k-anonymity
l-diversity
t-closeness
differential privacy
Adversary
This can be orthogonal
Only part of the dataset known
Probabilistic inference
The concerns of a modern user
I’m falling into depression. I don’t want my insurance company to know before I take action!
I’m a female user in an online tech community. Can I earn reputation easily?
I’m pregnant but don’t want my employer to know yet.
I’m rich and buying online. Do I get the same price?
What are the side effects of Xanax?
anxiety
psychotherapist NY
Xanax side effects
More privacy control on the user side!
Watch out!
How about:
- not posting this content?
- posting anonymously?
- posting obfuscating queries?
Privacy advisor
anxiety
how to become psychotherapist NY
Quantifying privacy risks
P(IsDepressed = True | hasGender = female, livesIn = US) = 0.01
What are the side effects of Xanax?
P(IsDepressed = True | hasGender = female, livesIn = US, searchedForSideEffectsOf = Xanax) = 0.3
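The two probabilities on this slide can be compared directly. The following is a minimal sketch, assuming (hypothetically) that a privacy advisor warns once a query raises the inferred probability by more than a chosen factor. The function names and the threshold of 10 are illustrative and are not part of the original model.

```python
def risk_increase(prior, posterior):
    """Return how many times more likely the sensitive state becomes."""
    if prior <= 0:
        raise ValueError("prior must be positive")
    return posterior / prior


def should_warn(prior, posterior, max_factor=10.0):
    """Hypothetical advisor rule: warn when the risk grows by more than max_factor."""
    return risk_increase(prior, posterior) > max_factor


# Values from the slide: P(IsDepressed | female, US) before and after the Xanax query.
prior = 0.01
posterior = 0.3
factor = risk_increase(prior, posterior)
print(f"risk grows by a factor of {factor:.0f}")
print("Watch out!" if should_warn(prior, posterior) else "OK")
```

A ratio is only one possible risk measure; an absolute threshold on the posterior would be an equally plausible design.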
Watch out!
Privacy Advisor
Risk prediction in search histories
(proof-of-concept)
Sensitive states <= Sensitive queries
Can we predict sensitive states even though the most obvious cues are masked out?
Alcoholism
Depression
Pregnancy
Sensitive state vocabulary
             Indicative (5)       Suggestive         Ambiguous
Alcoholism   alcohol dependence   liver therapy      anonymous
Depression   antidepressant       anxiety            stress
Pregnancy    pregnant             morning sickness   labor
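The vocabulary and the masking of indicative terms could be represented as follows. This is a minimal sketch: it holds only the example terms shown on the slide, uses plain substring matching, and drops a whole query when it contains an indicative term. All three choices are assumptions, and the names `VOCABULARY`, `mask_indicative` and `count_terms` are hypothetical.

```python
# Example terms from the slide; the real vocabulary holds more terms per cell.
VOCABULARY = {
    "Alcoholism": {
        "indicative": {"alcohol dependence"},
        "suggestive": {"liver therapy"},
        "ambiguous": {"anonymous"},
    },
    "Depression": {
        "indicative": {"antidepressant"},
        "suggestive": {"anxiety"},
        "ambiguous": {"stress"},
    },
    "Pregnancy": {
        "indicative": {"pregnant"},
        "suggestive": {"morning sickness"},
        "ambiguous": {"labor"},
    },
}


def mask_indicative(queries, state):
    """Drop every query that contains an indicative term for the given state."""
    indicative = VOCABULARY[state]["indicative"]
    return [q for q in queries if not any(t in q.lower() for t in indicative)]


def count_terms(queries, state, category):
    """Count how many terms of one category occur anywhere in the queries."""
    text = " ".join(q.lower() for q in queries)
    return sum(1 for term in VOCABULARY[state][category] if term in text)


# Hypothetical search history used only for illustration.
history = ["antidepressant dosage", "anxiety at night", "stress at work"]
masked = mask_indicative(history, "Depression")
print(masked)
print(count_terms(masked, "Depression", "suggestive"))
```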
Framework
User Search Log:
searchedForDepression(U)
searchedForXanax(U)
searchedForMedicalCourses(U)
+
Background Knowledge:
Co-occurrence counts: depression(317), side#effects(391), psychotherapy#depression(31)
Inference model:
P(IsDepressed|…) = 0.3
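Turning a raw search log into evidence predicates such as searchedForXanax(U) could look like the sketch below. The slides do not say how queries are mapped to predicates, so the token matching and the helper name `to_evidence` are assumptions made for illustration.

```python
import re


def to_evidence(user_id, queries, terms):
    """Emit one ground atom per vocabulary term that occurs in the user's queries."""
    atoms = set()
    for query in queries:
        # Lowercased alphabetic tokens of the query.
        tokens = set(re.findall(r"[a-z]+", query.lower()))
        for term in terms:
            if term.lower() in tokens:
                atoms.add(f"searchedFor{term.capitalize()}({user_id})")
    return atoms


log = ["What are the side effects of Xanax?", "depression symptoms"]
evidence = to_evidence("U", log, {"xanax", "depression"})
print(sorted(evidence))
```

Single-token matching keeps the sketch short; multi-word terms would need phrase matching instead.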
Inference Model
Variables: Xanax (X), Depression (D), Anxiety (A), Occupation (O), Bipolar disorder (B)
We want to jointly model a set of variables:
P(X, D, A, O, B)
Or specifically:
P(D | A = True, O = psychiatrist)
Inference Model: Markov Random Field
Variables: Xanax (X), Depression (D), Anxiety (A), Occupation (O), Bipolar disorder (B)
Edges encode variable dependency
Markov property assumption: P(O | X, D, A, B) = P(O | D)
P(X, D, A, O, B) = \frac{1}{Z} \prod_i \phi_i(X_1, ..., X_n)
Partition function (normalizing factor): Z
Clique potential functions: \phi_i(X_1, ..., X_n) = w_i^\top \cdot f_i(X_1, ..., X_n)
First-order logic abstraction layer
Variables: Xanax, Depression, Anxiety
(0.17) Anxiety ^ Xanax => Depression
(0.09) Anxiety ^ ¬Xanax => Depression
(0.13) ¬Anxiety ^ Xanax => Depression
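To see how weighted rules like these turn into a probability, the sketch below enumerates the possible worlds for a single user. It assumes the standard log-linear Markov Logic Network semantics, in which a world's unnormalized score is the exponential of the summed weights of its satisfied formulas. Whether the original system uses exactly this form is not stated on the slides, and the function names are hypothetical.

```python
import itertools
import math

# (weight, formula) pairs copied from the slide; each formula is an implication.
RULES = [
    (0.17, lambda a, x, d: (not (a and x)) or d),        # Anxiety ^ Xanax => Depression
    (0.09, lambda a, x, d: (not (a and not x)) or d),    # Anxiety ^ ¬Xanax => Depression
    (0.13, lambda a, x, d: (not ((not a) and x)) or d),  # ¬Anxiety ^ Xanax => Depression
]


def world_score(a, x, d):
    """Unnormalized score: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(w for w, formula in RULES if formula(a, x, d)))


def prob_depression(anxiety, xanax):
    """P(Depression = True | Anxiety, Xanax) by enumerating both completions."""
    yes = world_score(anxiety, xanax, True)
    no = world_score(anxiety, xanax, False)
    return yes / (yes + no)


for a, x in itertools.product([True, False], repeat=2):
    print(f"Anxiety={a!s:5} Xanax={x!s:5} P(Depression)={prob_depression(a, x):.3f}")
```

Because all other variables are observed, the partition function cancels and only two worlds need to be scored. With such small weights each rule shifts the probability only slightly away from 0.5.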
MLN rules
(0.9) IndicativeTerm(U) <=> SensitiveState(U)
Weight: high. Those will be masked out.
(0.13) Term1(U) ^ Term2(U) => IndicativeTerm(U)
All pairs of suggestive and ambiguous terms.
Weight: #users(Term1, Term2, IndicativeTerm) / #users(Term1, Term2)
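The weight of a pair rule is the fraction of users who searched for both terms and also searched for the indicative term. A minimal sketch of that ratio follows; the toy log and the helper name `pair_rule_weights` are hypothetical and only illustrate the counting.

```python
from itertools import combinations


def pair_rule_weights(user_terms, candidate_terms, indicative_term):
    """Weight of Term1 ^ Term2 => IndicativeTerm as a ratio of user counts."""
    weights = {}
    for t1, t2 in combinations(sorted(candidate_terms), 2):
        with_pair = [terms for terms in user_terms.values()
                     if t1 in terms and t2 in terms]
        if not with_pair:
            continue  # the pair never co-occurs, so no rule is created
        with_all = sum(1 for terms in with_pair if indicative_term in terms)
        weights[(t1, t2)] = with_all / len(with_pair)
    return weights


# Hypothetical toy log: user id -> set of terms seen in that user's queries.
toy_log = {
    "u1": {"anxiety", "stress", "antidepressant"},
    "u2": {"anxiety", "stress"},
    "u3": {"anxiety", "labor"},
    "u4": {"stress", "anxiety", "antidepressant"},
}
weights = pair_rule_weights(toy_log, {"anxiety", "stress", "labor"}, "antidepressant")
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 2))
```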
Framework
User Search Log:
searchedForDepression(U)
searchedForXanax(U)
+
Background Knowledge:
Co-occurrence counts: depression(317), side#effects(391), psychotherapy#depression(31)
Inference model:
(0.13) side#effects(U) ^ depression(U) => xanax(U)
(0.07) xanax(U) ^ depressed(U) => IsDepressed(U)
…
P(IsDepressed|…) = 0.3
Proof-of-concept experiment
45 users from the AOL query log: 3x3x5 (3 sensitive states x 3 choice criteria x 5 users)
Choice criteria:
- highest overlap with indicative keywords
- highest overlap with ambiguous keywords
- highest overlap with both
indicative terms masked out for the model
ground truth by two human assessors
            Precision  Recall
Alcoholism  0.60       0.75
Depression  0.67       0.33
Pregnancy   0.50       0.85
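Precision and recall here compare the users the model flags against the users the assessors labeled as being in the sensitive state. The sketch below shows the computation on a hypothetical toy example (the user ids are invented) and then derives F1 from the reported precision and recall values. F1 is not reported on the slides and is included only as a derived summary.

```python
def precision_recall(predicted, actual):
    """Precision and recall of a predicted user set against the ground truth."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: users flagged by the model vs. labeled by assessors.
predicted = {"u1", "u2", "u3"}
actual = {"u2", "u3", "u4", "u5"}
p, r = precision_recall(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f}")

# Precision and recall as reported on the slide.
REPORTED = {
    "Alcoholism": (0.60, 0.75),
    "Depression": (0.67, 0.33),
    "Pregnancy": (0.50, 0.85),
}
for state, (prec, rec) in REPORTED.items():
    print(f"{state}: F1={f1(prec, rec):.2f}")
```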
Model limitations
Context: drinking too much (water); blackout (lyrics)
User background: depression symptoms; ny uni medical courses
Temporal dimension: depression symptoms; xanax prescription; feeling lonely; feeling anxious; xanax side effects
To sum up
In the modern world of adversaries using Big Data and probabilistic tools, we need user-centric privacy to enable privacy control on the user side
Joanna Biega, Ida Mele, Gerhard Weikum:
Probabilistic Prediction of Privacy Risks in User Search Histories. PSBD @ CIKM 2014