Adversarial Learning

(1)

Universität Potsdam

Institut für Informatik

Lehrstuhl Maschinelles Lernen

Maschinelles Lernen II

Adversarial Learning

Christoph Sawade/Niels Landwehr/Blaine Nelson Tobias Scheffer

(2)

Sawade/Landwehr/Scheffer, Maschinelles Lernen II

Overview



Adversarial Learning Introduction



Classifier Evasion

 Formalizing adversarial cost & near-optimal evasion



Adversarial-Aware Classification

 Cost-sensitive classifiers & anticipating an adversary



Game Theoretic Approaches

(3)

INTRODUCTION / MOTIVATION

Part I

(4)

Introduction & Motivation



Benefits of machine learning

 Rapid adaptability to changing trends

 Scalability to large, diverse data

 Statistically sound decision-making



We’d like to use learning in security domains



Domains of Interest

 Spam / virus / intrusion / fraud detection

 Ranking, recommendation systems, performance modeling, advertising

(5)

What can go Wrong?



Known adversarial learning settings



Other potential application domains

spam filtering

click / advertising fraud

(6)

Secure Machine Learning Applications



Can we prove that a learning algorithm is secure?



Can we prove that some algorithm is secure?

 No! Even strong cryptographic algorithms can be broken by brute force.



Security objective for machine learning:

Show that a specific algorithm under specific

conditions can be broken only with an infeasible

(7)

Security Concerns / Approach



Concern: data non-stationary

 Adversary adapts to learner

 Learning manipulated by adversary



Security principles

 Proactive defenses

 Avoid security through obscurity / constraints

[Figure: learning pipeline with data distribution, training data, learner, classifier, live data, and prediction/action]

(8)

Security Analysis of Machine Learning



Consider a spammer who attempts to evade a

spam filter…



Threat model: identify the adversary’s . . .

1. Motivation: successfully spam target email account

2. Mechanism: send probes to test accounts

3. Limitations: unable to observe behavior for targets’ accounts, does not want to lose intended spam content; i.e., advertisement.

Work functions to quantify effectiveness:

 Attacker’s: effort/impact trade-off

(9)

Security Analysis of Machine Learning



Specify the learning and the attack processes, as

well as their objective functions

 Ex: spam filtering with an attacker who wants to poison the filter.



Specify attacker’s constraints:

 Attacker crafts a limited amount of training spam



Investigate the optimal attack policy.



Investigate the attacker’s gain under the optimal

policy

(10)

Attack Taxonomy



Can Machine Learning be Secure? (2006)



The Security of Machine Learning (2010)

Axis: attack properties

Influence

 Causative: influences training and test data

 Exploratory: influences test data

Security violation

 Integrity: goal is false negatives (FNs)

 Availability: goal is false positives (FPs)

Specificity

 Targeted: influence prediction on particular test instance

 Indiscriminate: influence prediction on all test instances

(11)

Attack Taxonomy

Influence Axis

Influence axis determines game structure

Causative Game:

1. Attacker poisons data

2. Defender trains on poisoned data

Exploratory Game:

1. Defender trains on clean data

2. Attacker evades learned classifier

[Figure: learning pipeline with data distribution, training data, learner, classifier, and live data]

(12)

Attack Taxonomy

Security Violation Axis

Security violation determines attacker’s goal

Integrity goal: Malicious behavior fails to be detected

Availability goal: Utility of filter for user is severely degraded

[Figure: learner and classifier sorting messages into blocked and allowed]

(13)

Attack Taxonomy

Specificity Axis

Specificity axis determines locality of goal

[Figure: learner and classifier sorting messages into allowed and blocked; example message with scrambled words: “Dear Sir, Sincerely school was I son S. you aware that make your to Skinner wanted absent”]

Targeted goal: Attacking specific instance

Indiscriminate goal:

(14)

Attack Taxonomy

(15)

CLASSIFIER EVASION

Part II

(16)

Near-Optimal Evasion of a Classifier



Adversary would like to change behavior to evade

an already trained classifier



Adversary wants to minimally change his behavior



Adversarial Learning (2005)



Near-Optimal Evasion of Convex-Inducing

Classifiers (2010)

Allowed

Blocked

(17)

Adversarial Costs

Cost-based Motivation



Adversaries can alter behavior to avoid detection



Are there limits on the extent of their alteration?

 Adversary is not omniscient about classifier/data.

 Changes may incur a cost . . . their instances may become worth less

Subj: Cheap Online Pharmacy, Order Prescription drugs online. Low Price guaranteed, fast shipping.

FDA & CPA Approved Pharmacy site FAST DELIVERY!

Viagra from $1.82 Cialis from $2.46

Viagra soft tabs from $2.25 Cialis soft tabs from $2.52 VeriSign secured payment site We ship to all countries

Ready to boost your sex life? Positive? It’s time to do it now!

Order above pills at unbelievable low price

Subj: Cheap Online Pharmacy, Order Prescription drugs online. Low Price guaranteed, fast shipping.

FDA & CPA Approved Pharmacy site FAST DELIVERY!

V1@gra from $1.82 Cialis from $2.46

V1@gra soft tabs from $2.25 Cialis soft tabs from $2.52 VeriSign secured payment site We ship to all countries

Ready to boost your sex life? Positive? It’s time to do it now!

(18)

Adversarial Costs

Modeling Adversaries



Modeling adversarial costs is difficult:

 Value for the adversary is not easily quantified

 Mapping value onto individual changes is challenging



Weighted $\ell_p$ distances (Euclidean for $p = 2$) are convenient for math:

$C_p(\mathbf{x}) = \left( \sum_{f=1}^{m} c_f \, |x_f - x_f^*|^p \right)^{1/p}$

 $\mathbf{x}^*$ is the adversary’s ideal instance (desired spam)

 $p > 0$ adjusts the cost’s shape ($p = 1$ is the weighted change from $\mathbf{x}^*$)

 $c_f$ is a per-feature weight
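The weighted cost above is easy to compute directly. The following is a minimal sketch, assuming instances are plain numeric sequences; the function name `weighted_lp_cost` and its argument names are illustrative and do not come from the slides.

```python
from typing import Sequence


def weighted_lp_cost(x: Sequence[float], x_star: Sequence[float],
                     c: Sequence[float], p: float = 1.0) -> float:
    """Weighted l_p cost of using instance x instead of the ideal x_star.

    x_star is the adversary's desired instance, c holds the per-feature
    weights c_f, and p > 0 controls the shape of the cost.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if not (len(x) == len(x_star) == len(c)):
        raise ValueError("x, x_star and c must have the same length")
    total = sum(cf * abs(xf - sf) ** p for xf, sf, cf in zip(x, x_star, c))
    return total ** (1.0 / p)
```

With unit weights and `p = 1` this is the total absolute change from the ideal instance; raising a weight `c[f]` makes changes to feature `f` more expensive for the adversary.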



Ideally, more realistic costs (e.g., edit distances)

(19)

Near-Optimal Evasion Problem



Adversarial Cost:

 Attacker has a desired instance

 Attacker incurs a cost for changing each feature



Near-Optimal Evasion:

 Attacker adapts based on information from queries

 Attacker wants to find least-cost negative instance with fewest queries

 Evasion is considered hard if it requires many queries in terms of the feature space size

 Lowd & Meek (2005) demonstrated an efficient algorithm for near-optimal evasion of linear classifiers

 Here we further evade convex-inducing classifiers

(20)

Near-Optimal Evasion Problem

Problem Formulation



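$m$-dimensional space known by adversary

Initial negative point $\mathbf{x}^-$ & positive target $\mathbf{x}^*$

Adversary cost is a weighted $\ell_1$ cost from $\mathbf{x}^*$

Desired accuracy $\varepsilon$

 Binary search gets within a $(1 + \varepsilon)$ factor of the optimum in $L_\varepsilon$ steps

Find near-optimum with polynomially many queries

[Figure: binary search on the $\ell_1$ cost between cost 0 and $\mathbf{x}^-$ across the classifier boundary, reaching a $(1 + \varepsilon)$ factor in $L_\varepsilon$ steps; negative class and positive class regions]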
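The binary search on the cost can be sketched along a single search line as follows. This is an illustration under stated assumptions, not the authors' code: `is_negative_at_cost` is a hypothetical membership oracle for the point of a given cost on that line, and the geometric midpoint is used so that the ratio of upper to lower bound shrinks toward the multiplicative target $(1 + \varepsilon)$.

```python
import math
from typing import Callable, Tuple


def multiplicative_binary_search(is_negative_at_cost: Callable[[float], bool],
                                 c_lower: float, c_upper: float,
                                 eps: float) -> Tuple[float, float, int]:
    """Shrink [c_lower, c_upper] until c_upper / c_lower <= 1 + eps.

    c_lower > 0 must be a cost at which the probed point is still positive,
    c_upper a cost at which it is known to be negative. Each loop iteration
    issues exactly one query to the (hypothetical) oracle.
    """
    queries = 0
    while c_upper / c_lower > 1.0 + eps:
        c_mid = math.sqrt(c_lower * c_upper)  # geometric midpoint
        queries += 1
        if is_negative_at_cost(c_mid):
            c_upper = c_mid  # a cheaper negative instance was found
        else:
            c_lower = c_mid  # everything up to c_mid is still positive
    return c_lower, c_upper, queries
```

Each step takes the square root of the ratio between the bounds, so the number of queries grows only with the logarithm of the initial log-ratio.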

(21)

Near-Optimal Evasion

Linear Classifiers – Lowd & Meek (2005)



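Linear classifier: $f(\mathbf{x}) = \mathbf{w}^\mathsf{T}\mathbf{x} - b$

Idea: flip each feature of $\mathbf{x}^*$ to its value in $\mathbf{x}^-$ until we find a negative

 We have a sign witness: points $\mathbf{x}^1 \in X^-$ & $\mathbf{x}^2 \in X^+$ differ only at $f$.

 $\mathbf{x}^2$ is within $|x_f^1 - x_f^2|$ from the boundary

 Search along each direction to find where $\mathbf{x}^1$ becomes positive within $\varepsilon/4$

 Gives estimate of $\mathbf{w}$ within $(1 + \varepsilon)$

 The best feature, $f$, maximizes $w_f / c_f$

Algorithm’s complexity: $O(m \cdot L_\varepsilon)$

[Figure: linear classifier boundary with the target $\mathbf{x}^*$]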
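The first step of this idea, finding a sign witness, can be sketched as below. This covers only the feature-flipping search, not the later weight estimation, and it assumes a hypothetical black-box `classify` that returns a positive number for the positive class and a negative number otherwise.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_sign_witness(classify: Callable[[Sequence[float]], float],
                      x_star: Sequence[float],
                      x_minus: Sequence[float]
                      ) -> Optional[Tuple[List[float], List[float], int]]:
    """Flip features of x_star to their values in x_minus, one at a time.

    Returns (x1, x2, f): x1 is negative, x2 is positive, and the two points
    differ only at feature f. Returns None if no flip changes the label.
    """
    current = list(x_star)
    for f in range(len(current)):
        if current[f] == x_minus[f]:
            continue  # nothing to flip for this feature
        previous = list(current)
        current[f] = x_minus[f]
        if classify(current) < 0:
            return list(current), previous, f
    return None
```

Because the walk starts at a positive point and ends at a negative one, some single flip must change the label, so one query per feature is enough to locate the witness pair.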

(22)

Near-Optimal Evasion

Convex-Inducing Classifiers



Convex-Inducing Classifier: separates space into 2 sets; 1 is convex

[Figure: two examples, one with a convex positive class surrounded by the negative class and one with a convex negative class]

(23)

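Properties of $\ell_1$ Costs

$\ell_1$ cost-ball is a hyper-octahedron

 Balls are convex

 Formed by hull of $2 \cdot m$ axis-oriented vertices

All queries positive ⇒ lower bound

Any query negative ⇒ upper bound

[Figure: nested $\ell_1$ cost balls centered at cost 0 with vertices $\mathbf{v}_1, \dots, \mathbf{v}_4$, showing the lower and upper cost bounds within a $(1 + \varepsilon)$ factor]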

(24)

Convex Positive Class: Line Search



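Convex Positive Set

 Simultaneous line search on each axis

 Algorithm is also $O(m \cdot L_\varepsilon)$ queries

[Figure: $\ell_1$ cost ball with vertices $\mathbf{v}_1, \dots, \mathbf{v}_4$ and line searches along each axis from cost 0]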

(25)

Convex Positive Class: Pruning



Pruning: use convexity

 Prune directions that violate the upper bound

 Worst case is still $O(m \cdot L_\varepsilon)$ queries
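The simultaneous line search with pruning can be sketched as follows for a convex positive class. This is a simplified illustration, not the published algorithm verbatim: `is_negative` is a hypothetical query oracle, the search directions are the $2m$ axis directions scaled by the per-feature weights, and the initial bounds are assumed to be valid.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def multi_line_search(is_negative: Callable[[Sequence[float]], bool],
                      x_star: Sequence[float], c_weights: Sequence[float],
                      c_lower: float, c_upper: float, eps: float
                      ) -> Tuple[Optional[List[float]], float, float, int]:
    """Search all axis directions from x_star for a low-cost negative point."""
    m = len(x_star)
    directions = [(f, s) for f in range(m) for s in (1.0, -1.0)]
    best, queries = None, 0
    while c_upper / c_lower > 1.0 + eps:
        c_mid = math.sqrt(c_lower * c_upper)
        positive_dirs, found = [], None
        for f, s in directions:
            x = list(x_star)
            x[f] += s * c_mid / c_weights[f]  # point of cost c_mid on this axis
            queries += 1
            if is_negative(x):
                found = x
                break
            positive_dirs.append((f, s))
        if found is not None:
            # Any negative query gives a new upper bound. Directions that were
            # positive at this cost cannot beat it (convexity), so prune them.
            best, c_upper = found, c_mid
            directions = [d for d in directions if d not in positive_dirs]
        else:
            # All queries positive: the whole cost ball is positive.
            c_lower = c_mid
    return best, c_lower, c_upper, queries
```

The direction that produced the negative point is never pruned, so the set of directions cannot become empty.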

(26)

Convex Positive Class: Better Algorithm



$K$-step Line Search

 Motivation: breadth-first and depth-first are both $O(m \cdot L_\varepsilon)$

 Tradeoff between bound progress & pruning

 Take $K$ steps in one direction & query the remaining directions

For $K = \sqrt{L_\varepsilon}$: $O(L_\varepsilon + m \cdot \sqrt{L_\varepsilon})$ queries

Lower bound: $\max(L_\varepsilon, m)$ queries

(27)

Near-Optimal Evasion

Convex Negative Class



Negative class & $\ell_1$-ball intersection is convex

Objective: Determine if the intersection is empty

[Figure: positive class surrounding a convex negative class]

(28)

Convex Negative Class



Probabilistic ellipsoid algorithm for query-based

optimization [Bertsimas & Vempala (2004)]

 Hit-and-run sampling for any convex body

(29)

EVASION-RESISTANT CLASSIFIERS

Part III

(30)

Evasion-Resistant Classifiers

Anticipating Evasion — Dalvi et al. (2004)



We want to design classifiers to be robust against

evasion



Let’s suppose the classifier has a utility (loss) function $L_{-1}(\hat{y}, y)$ & the adversary has a utility function $L_{+1}(\hat{y}, y)$, where $\hat{y}$ is the predicted label and $y$ the true label

Classifier $f$ is trained on clean data (Exploratory)

After learning, the adversary attempts to evade $f$ by transforming data with an adversarial transform $A: \mathbf{x} \to \mathbf{x}'$

(31)

Cost-Sensitive Naïve Bayes



Naïve Bayes log-odds estimate is

$\log \frac{P(+|\mathbf{x})}{P(-|\mathbf{x})} = \log \frac{P(+)}{P(-)} + \sum_{f} \log \frac{P(x_f|+)}{P(x_f|-)}$

$P(+)$, $P(-)$, $P(x_f|+)$, $P(x_f|-)$ are estimated from data

Cost-sensitive predictor maximizes expected utility; i.e., $f(\mathbf{x}) = +$ if

$\frac{P(+|\mathbf{x})}{P(-|\mathbf{x})} > \frac{L_{-1}(-,-) - L_{-1}(+,-)}{L_{-1}(+,+) - L_{-1}(-,+)}$

Adversary can only manipulate feature probabilities
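The log-odds estimate and the cost-sensitive decision rule can be sketched as follows. The sketch assumes binary features with Bernoulli likelihoods, which the slides do not specify, and it stores the learner's utility in a dictionary keyed by `(predicted, true)`; all names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def nb_log_odds(x: Sequence[int], prior_pos: float,
                p_feat_pos: Sequence[float],
                p_feat_neg: Sequence[float]) -> float:
    """log P(+|x)/P(-|x) for binary features.

    p_feat_pos[f] is the estimate of P(x_f = 1 | +), likewise p_feat_neg.
    """
    log_odds = math.log(prior_pos / (1.0 - prior_pos))
    for xf, pp, pn in zip(x, p_feat_pos, p_feat_neg):
        num = pp if xf else 1.0 - pp
        den = pn if xf else 1.0 - pn
        log_odds += math.log(num / den)
    return log_odds


def cost_sensitive_predict(log_odds: float,
                           utility: Dict[Tuple[str, str], float]) -> str:
    """Predict '+' when the odds exceed the utility-derived threshold."""
    threshold = ((utility[("-", "-")] - utility[("+", "-")])
                 / (utility[("+", "+")] - utility[("-", "+")]))
    return "+" if log_odds > math.log(threshold) else "-"
```

Making a false positive more costly, that is lowering `utility[("+", "-")]`, raises the threshold, so the same evidence may no longer be enough to block a message.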

(32)

Adversarial Strategy



Adversary wants a minimum cost camouflage (MCC); i.e., the evasion point with smallest cost

Adversary incurs cost $C(\mathbf{x}, \mathbf{x}')$ for changing $\mathbf{x}$ to $\mathbf{x}'$

The function $MCC(\mathbf{x})$ is a search for the best evasion instance given cost function $C$; e.g., a search over feature changes with memoization

Adversarial behavior:

 If $f(\mathbf{x}) = +$ & $C(MCC(\mathbf{x}), \mathbf{x}) < L_{+1}(-,+) - L_{+1}(+,+)$ ⇒ $A(\mathbf{x}) = MCC(\mathbf{x})$

 Otherwise, evasion is not worthwhile ⇒ $A(\mathbf{x}) = \mathbf{x}$
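The adversary's decision rule can be sketched with a brute-force search standing in for $MCC$. This is only feasible for a handful of binary features and replaces the memoized search mentioned above with plain enumeration; `classify`, `cost`, and `gain` (the utility difference $L_{+1}(-,+) - L_{+1}(+,+)$) are hypothetical inputs.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Instance = Tuple[int, ...]


def mcc(x: Sequence[int], classify: Callable[[Instance], str],
        cost: Callable[[Sequence[int], Instance], float]
        ) -> Tuple[Optional[Instance], float]:
    """Cheapest instance classified as '-' (brute force over binary vectors)."""
    best, best_cost = None, float("inf")
    for cand in product((0, 1), repeat=len(x)):
        if classify(cand) == "-":
            c = cost(x, cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost


def adversary_transform(x: Sequence[int], classify: Callable[[Instance], str],
                        cost: Callable[[Sequence[int], Instance], float],
                        gain: float) -> Instance:
    """A(x): evade only if x is blocked and the camouflage costs less than gain."""
    x = tuple(x)
    if classify(x) != "+":
        return x
    cand, c = mcc(x, classify, cost)
    if cand is not None and c < gain:
        return cand
    return x
```

For example, with a Hamming-distance cost the adversary changes as few features as possible, and leaves the instance alone when even the cheapest evasion costs at least as much as it gains.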

(33)

Adversary-Aware Naïve Bayes



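To counter, we adjust $P(x_f|+)$

 We need to account for all $P(\mathbf{x}'|+)$ for which $A(\mathbf{x}') = \mathbf{x}$; i.e., all data that should be transformed by a rational adversary to $\mathbf{x}$

 We anticipate whether the adversary should change $\mathbf{x}$

This gives the following new probability calculation:

$P_A(\mathbf{x}|+) = \sum_{\mathbf{x}' \in X_A(\mathbf{x})} P(\mathbf{x}'|+) + I(\mathbf{x}) \, P(\mathbf{x}|+)$

 $X_A(\mathbf{x}) = \{\mathbf{x}' \neq \mathbf{x} \mid A(\mathbf{x}') = \mathbf{x}\}$: points to be transformed

 $I(\mathbf{x}) = 0$ if $f(\mathbf{x}) = +$ & $C(MCC(\mathbf{x}), \mathbf{x}) < L_{+1}(-,+) - L_{+1}(+,+)$; otherwise it is 1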
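The adjusted likelihood can be sketched by enumerating a small binary feature space. The sketch assumes the adversary's transform is available as a function `transform` (hypothetical here) and treats the indicator $I(\mathbf{x})$ as "the adversary leaves $\mathbf{x}$ unchanged".

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Instance = Tuple[int, ...]


def adversary_aware_likelihood(x: Sequence[int],
                               p_pos: Callable[[Instance], float],
                               transform: Callable[[Instance], Instance]
                               ) -> float:
    """P_A(x|+): mass moved onto x by the adversary plus mass that stays at x."""
    x = tuple(x)
    # Sum over X_A(x): all other points that the adversary maps onto x.
    moved_in = sum(p_pos(xp) for xp in product((0, 1), repeat=len(x))
                   if xp != x and transform(xp) == x)
    # I(x) is 1 when the adversary does not move x away, 0 otherwise.
    stays = 1.0 if transform(x) == x else 0.0
    return moved_in + stays * p_pos(x)
```

Because the transform only relocates probability mass, the adjusted likelihoods still sum to one over the whole feature space.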

(34)

Adversary-Aware Naïve Bayes

Empirical Results



Tests on 3 types of alterations:

 Adding Words (AW): added words incur unit cost

 Add Length (AL): cost is proportional to word length

 Synonym (SYN): cost proportional to its occurrence; synonyms given unit cost.

Models:

 naïve Bayes (NB)

 adversarial-aware (AC)

FN / FP for three values of $L_{-1}(+,-)$:

Classifier    10          100         1000
NB-clean      94 / 2      124 / 1     165 / 1
NB-AW         481 / 2     481 / 1     481 / 1
AC-AW         93 / 0      123 / 0     164 / 0
NB-AL         477 / 2     477 / 1     477 / 1
AC-AL         94 / 0      124 / 0     165 / 0
NB-SYN        408 / 2     413 / 1     414 / 1
AC-SYN        164 / 1     196 / 0     229 / 0

(35)

Drawbacks of Adversary-Aware Approach



Problem 1: We assumed adversary doesn’t know

we change model



Problem 2: We assumed adversary plays rationally

 An instance 𝐱 that would be correctly classified by naïve Bayes, now has P_𝐴 𝐱| + = 0 & will probably be misclassified

(36)

GAME-THEORETIC APPROACH

Part IV

(37)

Game Theory Introduction

The Prisoner’s Dilemma



Two prisoners are suspected of murder.

 If neither confesses, each serves 5 years in prison.

 If both confess, each serves 10 years.

 If one confesses while the other does not, the 1st goes free while the other gets 25 years.

Non-cooperative players

Loss table (players: Prisoner A and Prisoner B; actions: stay silent or confess; loss: years in prison, given as A / B):

                   B: Stay silent    B: Confess
A: Stay silent     5 / 5             25 / 0
A: Confess         0 / 25            10 / 10

(38)

Game Theory Introduction

The Prisoner’s Dilemma



Minimax Strategy

 Setting:

 Prisoner A does not know the loss of B, or

 Prisoner B wants A to receive the greatest loss

 Prisoner A should choose a safe strategy

 “Confessing” yields 10 years in the worst case whereas “staying silent” has 25 years

Loss table (years in prison, given as A / B; B’s losses are unknown):

                   B: Stay silent    B: Confess
A: Stay silent     5 / ??            25 / ??
A: Confess         0 / ??            10 / ??
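The safe (minimax) choice for Prisoner A can be computed directly from A's losses in the table. The dictionary layout and function name below are illustrative.

```python
from typing import Dict, Sequence, Tuple

# Prisoner A's loss in years, keyed by (A's action, B's action).
LOSS_A: Dict[Tuple[str, str], int] = {
    ("silent", "silent"): 5,
    ("silent", "confess"): 25,
    ("confess", "silent"): 0,
    ("confess", "confess"): 10,
}


def minimax_action(loss: Dict[Tuple[str, str], int],
                   my_actions: Sequence[str],
                   other_actions: Sequence[str]) -> str:
    """Pick the action whose worst-case loss is smallest."""
    return min(my_actions,
               key=lambda a: max(loss[(a, b)] for b in other_actions))
```

Calling `minimax_action(LOSS_A, ["silent", "confess"], ["silent", "confess"])` compares the worst cases of 25 years (silent) and 10 years (confess) and selects confessing.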

(39)

Game Theory Introduction

The Prisoner’s Dilemma



Equilibrium Strategies

 What joint actions are “optimal” for both players?

 Nash equilibrium: joint actions where neither player can benefit by unilaterally changing their action

 Question of existence/uniqueness of Nash equilibria

 Equilibrium of PD: Both prisoners should confess

Loss table (years in prison, given as A / B):

                   B: Stay silent    B: Confess
A: Stay silent     5 / 5             25 / 0
A: Confess         0 / 25            10 / 10
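For a game this small, the pure-strategy Nash equilibria can be found by checking every joint action for profitable unilateral deviations. The sketch below uses the losses from the table; the data layout and names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[str, str]  # (A's action, B's action)

LOSS_A: Dict[Joint, int] = {("silent", "silent"): 5, ("silent", "confess"): 25,
                            ("confess", "silent"): 0, ("confess", "confess"): 10}
LOSS_B: Dict[Joint, int] = {("silent", "silent"): 5, ("silent", "confess"): 0,
                            ("confess", "silent"): 25, ("confess", "confess"): 10}


def pure_nash_equilibria(loss_a: Dict[Joint, int], loss_b: Dict[Joint, int],
                         actions: Sequence[str]) -> List[Joint]:
    """Joint actions where neither player can lower their own loss alone."""
    equilibria = []
    for a in actions:
        for b in actions:
            a_cannot_improve = all(loss_a[(a, b)] <= loss_a[(alt, b)]
                                   for alt in actions)
            b_cannot_improve = all(loss_b[(a, b)] <= loss_b[(a, alt)]
                                   for alt in actions)
            if a_cannot_improve and b_cannot_improve:
                equilibria.append((a, b))
    return equilibria
```

Mutual silence is not an equilibrium because either prisoner can cut their own sentence from 5 years to 0 by confessing.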

(40)

Adversarial Games in Machine Learning



Player 1 (Learner):

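 Model is linear: $f_\mathbf{w}(\mathbf{x}) = \operatorname{sgn}(\mathbf{w}^\mathsf{T}\mathbf{x})$

 Learner chooses model $\mathbf{w}$ to minimize loss

$\theta_{-1}(\mathbf{w}, \dot{D}) = \sum_{(\dot{\mathbf{x}}_i, y_i) \in \dot{D}} c_{-1,i} \, L_{-1}(f_\mathbf{w}(\dot{\mathbf{x}}_i), y_i) + \Omega_{-1}(\mathbf{w})$

with per-instance costs $c_{-1,i}$

Player 2 (Attacker):

 Adversarial transform $A$ changes test data: $D \to \dot{D}$

 The transform is limited by regularizer $\Omega_{+1}$

$\theta_{+1}(\mathbf{w}, \dot{D}) = \sum_{(\dot{\mathbf{x}}_i, y_i) \in \dot{D}} c_{+1,i} \, L_{+1}(f_\mathbf{w}(\dot{\mathbf{x}}_i), y_i) + \Omega_{+1}(D, \dot{D})$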

(41)

Strategies for Adversarial Learning



If $\theta_{-1}$ & $\theta_{+1}$ are antagonistic, minimax is optimal



Often, this is not the case:

 Spammer does not seek to misclassify usual messages, only to have his spam read



A Nash equilibrium exists if (roughly):

 𝐿₋₁ & 𝐿₊₁ are convex & twice continuously differentiable

 Regularizers Ω₋₁ & Ω₊₁ are uniformly strongly convex & twice continuously differentiable



The equilibrium is unique under some conditions

 See, “Static Prediction Games for Adversarial Learning Problems” (2012)

(42)

Applications to Classification



Logistic Regression: Nash equilibrium exists under mild conditions on regularization constants.

SVM:

 For hinge loss, Nash equilibrium not provable since the loss is non-differentiable

 For trigonometric loss (for $s > 0$),

$L_{-1}(z, y) = \begin{cases} -yz & \text{if } yz < -s \\ \frac{s - yz}{2} - \frac{s}{\pi} \cos\left(\frac{\pi y z}{2s}\right) & \text{if } |yz| \le s \\ 0 & \text{if } yz > s \end{cases}$

& squared regularizers, the equilibrium exists & is unique if sufficiently regularized.
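The trigonometric loss is a smoothed stand-in for the hinge loss, which a short sketch makes easy to check numerically. The middle case is assumed here to apply for $|yz| \le s$, which makes the three pieces join continuously; the function name and default `s` are illustrative.

```python
import math


def trigonometric_loss(z: float, y: int, s: float = 1.0) -> float:
    """Smooth hinge-like loss of decision value z for label y in {-1, +1}."""
    if s <= 0:
        raise ValueError("s must be positive")
    yz = y * z
    if yz < -s:
        return -yz  # linear part for badly misclassified points
    if yz <= s:
        # smooth transition between the linear part and zero
        return (s - yz) / 2.0 - (s / math.pi) * math.cos(math.pi * yz / (2.0 * s))
    return 0.0  # confidently correct points incur no loss
```

At $yz = -s$ the middle piece evaluates to $s$ and at $yz = s$ it evaluates to $0$, so the loss has no jumps where the cases meet.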

(43)

Finding Nash Equilibrium Solutions



If a single equilibrium exists…

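 Learner seeks to minimize $\theta_{-1}$ w.r.t. $\mathbf{w}$

 Attacker seeks to minimize $\theta_{+1}$ w.r.t. $\mathbf{x}_i$

It can be shown that $(\mathbf{w}^*, \mathbf{x}^*)$ is a solution if

$\mathbf{g}_\mathbf{r}(\mathbf{w}^*, \mathbf{x}^*)^\mathsf{T} \left( \begin{bmatrix} \mathbf{w} \\ \mathbf{x} \end{bmatrix} - \begin{bmatrix} \mathbf{w}^* \\ \mathbf{x}^* \end{bmatrix} \right) \ge 0 \quad \forall \, \mathbf{w}, \mathbf{x}$

where $\mathbf{g}_\mathbf{r}$ is a pseudo-gradient for $\theta_{-1}$ & $\theta_{+1}$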



Using an extragradient descent method based on
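The basic extragradient step can be illustrated on a toy two-player game. Everything in this sketch is an assumption made for illustration: the scalar objectives $\theta_{-1}(w, x) = \tfrac{1}{2} w^2 + wx$ and $\theta_{+1}(w, x) = \tfrac{1}{2} x^2 - wx$, the step size, and the unconstrained setting are not taken from the slides.

```python
from typing import Tuple


def pseudo_gradient(w: float, x: float) -> Tuple[float, float]:
    """Each player's gradient of its own objective w.r.t. its own variable.

    Toy objectives (assumed): learner 0.5*w**2 + w*x, attacker 0.5*x**2 - w*x.
    """
    return w + x, x - w


def extragradient(w: float, x: float, step: float = 0.2,
                  iterations: int = 200) -> Tuple[float, float]:
    """Extragradient iteration: look ahead, then step with the look-ahead gradient."""
    for _ in range(iterations):
        gw, gx = pseudo_gradient(w, x)
        w_half, x_half = w - step * gw, x - step * gx  # extrapolation step
        gw, gx = pseudo_gradient(w_half, x_half)
        w, x = w - step * gw, x - step * gx            # update step
    return w, x
```

In this toy game the unique equilibrium is $w = x = 0$, where both components of the pseudo-gradient vanish, and the iteration contracts toward it from any starting point.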

(44)

(45)

Summary



We looked at exploratory adversarial learning problems

 We overviewed a taxonomy of adversary problems

 We investigated near-optimal evasion

 We saw how adversary-aware classifiers can counter an evasive adversary

 We finally investigated a game-theoretic approach, which arrived at a Nash-equilibrium-based solution

This talk did not cover causative attacks in which

(46)

Exploiting Machine Learning to

Subvert Your Spam Filter



Exploiting Machine Learning to Subvert Your Spam

Filter (2008)



Misleading learners: Co-opting your spam filter.

(2009)

Blocked

Allowed

(47)

Poisoning the Training Set

[Figure: poisoning pipeline with attacker, attacker’s information, contamination, email distribution, attack corpus, spam and ham training data, learner, filter, INBOX, and spam folder]

(48)



SpamBayes

SpamBayes statistical spam filter

 Unigram word frequencies

 Token scores are independent spam test

 Build message score from token scores

 Threshold: ham, unsure, or spam

[Figure: message score axis from 0 to 1 split by two thresholds into ham, unsure, and spam; example token scores between 0 and 1 for the tokens “Viagra”, “Rolex”, “funny”, and “uni-potsdam.de”]

(49)



Outline of Attacks

Training on attack msg. changes scores

 Design attacks to increase scores of ham

Message score increases with token scores

[Figure: message score axis from 0 to 1 split by two thresholds into ham, unsure, and spam; token scores between 0 and 1 for each token of the message “We went to the grocery store”]

(50)

Dictionary Attack



Make spam filter unusable

 misclassify ham as spam

(51)

Dictionary Attack



Initial Inbox: 10K

messages



Attacks

 Black: Optimal

 Red: English dictionary

 Blue: 90K most common words in Usenet

(52)

Focused Attack

[Figure: focused attack on the target ham message “Dear Sir, I wanted to make you aware that your son was absent from school today. Sincerely, S. Skinner”. The attacker’s spam (“Rolex Breitling Cartier Porsche Dior Gucci Cheap quality watches now!!!”) is accompanied by attack messages built from the target’s tokens (absent, aware, dear, from, I, make, sincerely, sir, school, Skinner, son, that, to, today, wanted, was, you, your) in scrambled order]

(53)

Focused Attack



Initial Inbox: 5K

messages



200 targeted attacks



50% guessing rate

(54)

Focused Attack



Experiment varies the percent of tokens correctly guessed

The percentage of tokens correctly guessed impacts the effect of attack

[Figure: percentage of attack success vs. probability of guessing target tokens]

(55)

Defenses

Reject on Negative Impact (RONI)



Method

 Assess impact of query message on training

 Exclude messages with large negative impact

[Figure: SpamBayes learner and SpamBayes filter routing mail to the INBOX or the spam folder]
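The RONI method can be sketched generically: train with and without each candidate message and reject it if validation accuracy drops too much. The sketch below is a simplification under loud assumptions: it uses a single training and validation set, and the tiny token-count learner is only a stand-in and is not SpamBayes.

```python
from collections import Counter
from typing import List, Tuple

Message = Tuple[List[str], str]  # (tokens, label), label is "spam" or "ham"


def train(data: List[Message]) -> Tuple[Counter, Counter]:
    """Toy learner (stand-in, not SpamBayes): per-class token counts."""
    spam, ham = Counter(), Counter()
    for tokens, label in data:
        (spam if label == "spam" else ham).update(set(tokens))
    return spam, ham


def predict(model: Tuple[Counter, Counter], tokens: List[str]) -> str:
    spam, ham = model
    score = sum(spam[t] - ham[t] for t in set(tokens))
    return "spam" if score > 0 else "ham"


def accuracy(model: Tuple[Counter, Counter], data: List[Message]) -> float:
    return sum(predict(model, t) == y for t, y in data) / len(data)


def roni_filter(candidates: List[Message], base_train: List[Message],
                validation: List[Message], max_drop: float = 0.0
                ) -> List[Message]:
    """Keep only candidates whose inclusion costs at most max_drop accuracy."""
    baseline = accuracy(train(base_train), validation)
    accepted = []
    for msg in candidates:
        with_msg = accuracy(train(base_train + [msg]), validation)
        if baseline - with_msg <= max_drop:
            accepted.append(msg)
    return accepted
```

A message stuffed with common ham words but labeled spam lowers validation accuracy on ham and is rejected, while ordinary spam passes through to training.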

(56)

Defenses

Empirical RONI Results



Perfectly identifies all dictionary attacks



Unable to differentiate focused attacks



Defense impact on filter:

Performance on Normal Mail Before RONI (rows: truth; columns: predicted label)

Truth    ham      spam     unsure
ham      97.5%    0.0%     2.5%
spam     2.6%     80%      18%

Performance on Normal Mail After RONI (rows: truth; columns: predicted label)

Truth    ham      spam     unsure
ham      95%      0.3%     4.6%
spam     2.0%     87%      11%

(57)

References

Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can Machine Learning be Secure? In Proceedings of the ACM Symposium on InformAtion, Computer, and Communications Security (ASIACCS), March 2006.

Dimitris Bertsimas and Santosh Vempala. Solving Convex Programs by Random Walks. Journal of the ACM, 51(4):540–556, 2004.

Michael Brückner, Christian Kanzow, and Tobias Scheffer. Static Prediction Games for Adversarial Learning Problems. Journal of Machine Learning Research 13:2617-2654, 2012

Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial Classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 99-108, Seattle, WA, 2004. ACM Press.

Daniel Lowd and Christopher Meek. Adversarial Learning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 641-647, 2005.

Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, and Kai Xia. Misleading Learners: Co-opting your Spam Filter. Book chapter in Jeffrey J. P. Tsai and Philip S. Yu (eds.) Machine Learning in Cyber Trust: Security, Privacy, and Reliability, pg. 17-51, 2009.

Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, and Kai Xia. Exploiting Machine Learning to Subvert Your Spam Filter. In Proceedings of the First USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET'08), 2008.

Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Shing-hon Lau, Steven Lee, Satish Rao, Anthony Tran, and J. D. Tygar. Near-Optimal Evasion of Convex-Inducing Classifiers. To appear in the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.