• No results found

ICML, July 2008

N/A
N/A
Protected

Academic year: 2022

Share "ICML, July 2008"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Sham M. Kakade

Shai Shalev-Shwartz Ambuj Tewari

Efficient Bandit Algorithms for Online Multiclass Prediction

ICML, July 2008

(2)

Motivation

Online web advertisement systems User submits a query

System (the learner) places an ad User either “clicks” or ignores

Goal: Maximize number of “clicks”

Modeling ?

Not the common online learning setting --

If user ignores, we donʼt get the “correct” ad Not the common multi-armed bandit --

We are also provided with a query

(3)

Outline

Online Bandit Multi-class Categorization Background: The Multi-class Perceptron The Banditron

Analysis

Experiments

The Separable Case

Extensions and Open Problems

(4)

Online Bandit Multiclass Categorization

For t = 1, 2, . . . , T Receive x ∈ R

d

Predict ˆy

t

∈ {1, . . . , k}

Pay 1[y

t

"= ˆy

t

]

y

t

is not revealed

(query) (ad)

(click feedback)

(5)

Linear Hypotheses

A hypothesis is a mapping h : R

d

→ {1, . . . , k}

Linear hypothesis: Exists k × d matrix W s.t.

h(x) = argmax

r

(W x)

r

(6)

The Multiclass Perceptron

U

t

=

 

 

 

 

 

 

 

 

 

0 . . . 0

...

0 . . . 0

. . . xt . . .

0 . . . 0

...

0 . . . 0

. . . −xt . . .

0 . . . 0

...

0 . . . 0

 

 

 

 

 

 

 

 

 

row ˆyt row yt

For t = 1, 2, . . . , T Receive x

t

∈ R

d

Predict ˆy

t

= argmax

r

(W

t

x

t

)

r

Receive y

t

Update: W

t+1

= W

t

+ U

t

where

(7)

Perceptron in the Bandit Setting

row ˆyt row yt

Ut =

0 . . . 0

...

0 . . . 0

. . . xt . . .

0 . . . 0

...

0 . . . 0

. . . −xt . . .

0 . . . 0

...

0 . . . 0

Problem: We’re blind to value of y

t

Solution: Randomization can help !

(8)

Exploration

Explore: instead of predicting ˆy

t

guess some ˜y

t

Suppose we get the feedback ’correct’, i.e. ˜y

t

= y

t

Then, we know that

ˆy

t

!= y

t

y

t

= ˜y

t

So, we can update W using the matrix U

t

(9)

Exploration vs. Exploitation

But, if our current model is correct, i.e. ˆy

t

= y

t

And, we guess some other ˜y

t

Then, we both suffer loss and do not know how to update W

In this case, it’s better to Exploit the quality of current model

We control the exploration-exploitation tradeoff

using randomization

(10)

For t = 1, 2, . . . , T Receive x

t

∈ R

d

Set ˆy

t

= argmax

r

(W

t

x

t

)

r

Define: P (r) = (1 − γ)1[r = ˆy

t

] +

γk

Randomly sample ˜y

t

according to P

Predict ˜y

t

and receive feedback 1[˜y

t

= y

t

] Update: W

t+1

= W

t

+ ˜ U

t

The Banditron

(11)

For t = 1, 2, . . . , T Receive x

t

∈ R

d

Set ˆy

t

= argmax

r

(W

t

x

t

)

r

Define: P (r) = (1 − γ)1[r = ˆy

t

] +

γk

Randomly sample ˜y

t

according to P

Predict ˜y

t

and receive feedback 1[˜y

t

= y

t

] Update: W

t+1

= W

t

+ ˜ U

t

The Banditron

0 . . . 0

...

0 . . . 0

. . . 1[yP (ytyt]

t) xt . . .

0 . . . 0

...

0 . . . 0

. . . −xt . . .

0 . . . 0

...

0 . . . 0

row ˆyt row yt

(12)

The Banditron Expected Update

E[ ˜Ut] = !

r

P (r)





















0 . . . 0

0 . . .... 0 . . . 1[yP (yt=r]

t) xt . . .

0 . . . 0

0 . . .... 0 . . . −xt . . .

0 . . . 0

0 . . .... 0





















= Ut

(13)

Analysis: The Hinge-Loss

5 !

≥ 1

(W xt)yt (W xt)r

≥ 1[y

t

"= ˆy

t

]

!

t

(W ) = max

r!=yt

1 − (W x

t

)

yt

+ (W x

t

)

r

(14)

Analysis: The Hinge-Loss

5 !

≥ 1

(W xt)yt (W xt)r

≥ 1[y

t

"= ˆy

t

]

The Separable Case:

!

t

(W ) = max

r!=yt

1 − (W x

t

)

yt

+ (W x

t

)

r

(15)

Mistake Bounds

M ≤ L + D +

L D

E[M] ≤ L + γ T + 3 max !

k D

γ , "

D γ T #

+ $

k D L γ

Perceptron:

Banditron:

Symbol Meaning

M # mistakes

L competitor loss !t !t(W !) D competitor margin !W !!2F

k # classes T # rounds

γ Exploration-Exploitation parameter

(16)

Mistake Bounds (cont.)

Symbol Meaning

L competitor loss !t !t(W !) D competitor margin !W !!2F

k # classes T # rounds

Perceptron Banditron No noise:

L = 0 D

k D T Low noise:

L = O(√

k D T )

√k D T

k D T Noisy: L + T 1/2 L + T 2/3

(17)

Experiments

Reuters RCV1

~700k documents

Bag-of-words (d ~ 350k)

4 labels {CCAT, ECAT, GCAT, MCAT}

Synthetic separable data set

9 classes, d=400, million instances

A simple simulation of generating text documents Synthetic non-separable data set

separable + 5% label noise

(18)

Experimental Results -- Reuters

102 103 104 105 106

10−2 10−1

100 γ = 0.050

number of examples

error rate

Perceptron Banditron

Similar Slope

(19)

Experimental Results -- Separable Data

102 103 104 105 106

10−5 10−4 10−3 10−2 10−1

100 γ = 0.014

number of examples

error rate

Perceptron Banditron

Slope -1

Slope -0.55

(20)

Experimental Results -- 5% label noise

102 103 104 105 106

10−0.9 10−0.7 10−0.5 10−0.3

γ = 0.006

number of examples

error rate

Perceptron Banditron

13%

10%

(21)

Exploration-Exploitation Parameter

100 −3 10−2 10−1 100

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

γ

error rate

Perceptron Banditron

100 −3 10−2 10−1 100

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

γ

error rate

Perceptron Banditron

Reuters

5% label noise

(22)

The Separable Case

Halving

Discretized hypothesis space Predict by majority vote

Remove ’wrong’ hypotheses

Note: can be applied in Bandit setting

Mistake Bound O(k2 d log(D d)) Using JL lemma we can also ob- tain O(k2 D log(T +kδ ) log(D))

(23)

The Separable Case

Halving

Discretized hypothesis space Predict by majority vote

Remove ’wrong’ hypotheses

Note: can be applied in Bandit setting

Mistake Bound O(k2 d log(D d)) Using JL lemma we can also ob- tain O(k2 D log(T +kδ ) log(D))

(24)

The Separable Case

Halving

Discretized hypothesis space Predict by majority vote

Remove ’wrong’ hypotheses

Note: can be applied in Bandit setting

Mistake Bound O(k2 d log(D d)) Using JL lemma we can also ob- tain O(k2 D log(T +kδ ) log(D))

(25)

Extensions and Open Problems

Label Ranking

Predicting a “label ranking”

How to interpret feedback ?

Multiplicative and Margin-based updates

Bandit versions of “Winnow” and “Passive-Aggressive”

Deterministic vs. Randomized strategies Achievable rates ?

Efficient algorithms for the separable case ?

References

Related documents

These test data, and those from many other researchers, highlight the benefits to be obtained in terms of reducing sulphate attack and chloride ion penetration from incorporating

Peace and other dance form and genre south spain wall art of light source in the protestant reformation ultimately broke the bullfighting culture and work of velázquez.. Empire

Looking at this from the Brazilian context we can see the manifestations of LGBT Jewish groups in Brazil, such as JGBR, Gay Jews in Brazil, Homoshabbat, Keshet Ga’avah in

The authors argue for a discipline- differentiated approach to weeding academic library collections that can employ quantitative criteria for disciplines, such as in the sciences,

å se nærmere på hva som kjennetegner de strategiske endringene som AS Vinmonopolet har gjennomført de siste årene. Slik vi ser det er offentlige virksomheter avhengig av legitimitet

We do not charge a packaging fee and we cover commercial, semi- commercial, HMO, complex buy-to-lets, bridging and refurbishment products.. This is a new area for us but it is one

Such action may include, but is not limited to: (a) screening, intercepting and investigating any instruction, communication, drawdown request, application for Services, or any

Phulner can then replace that function so that the taint is retained and a vulnerability has been injected, because user input which has not been sanitized is outputted on the page..