• No results found

Multiobjective Prediction with Expert Advice

N/A
N/A
Protected

Academic year: 2021

Share "Multiobjective Prediction with Expert Advice"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Multiobjective Prediction with Expert Advice

Alexey Chernov

Computer Learning Research Centre and Department of Computer Science Royal Holloway University of London

(2)

Example: Prediction of Sport Match Outcome

V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML’08 Bookmakers data:

4 bookmakers, odds for∼10000 tennis matches (2 outcomes)

8 bookmakers, odds for∼9000 football matches (3 outcomes)

Oddsai can be transformed to probabilitiesProb[i]:

Prob[i] = P1/ai

j1/aj

The loss is measured by the square (Brier) loss function. Learner’s strategy is the Aggregating Algorithm.

(3)

Tennis Prediction, Square Loss

0 2000 4000 6000 8000 10000 12000 −4 −2 0 2 4 6 8 10 12 14 16 Theoretical bounds Learner Bookmakers

Graph of the negative regretLossEk(T)−Loss(T), 4 Experts

(4)

Tennis Prediction, Log Loss

0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 20 25 Theoretical bounds Learner Bookmakers

(5)

Tennis Predictions: “Wrong” Losses

Graphs of the negative regretLossEk(T)−Loss(T)

0 2000 4000 6000 8000 10000 12000 −10 −5 0 5 10 15 20 25 Theoretical bounds Learner Bookmakers 0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 Theoretical bounds Learner Bookmakers

log loss square loss

the AA for the square loss the AA for the log loss

(6)

Aggregating Algorithm with Wrong Losses

Fact

For the game with2outcomes, one can construct a sequence of predictions of2Experts and a sequence of outcomes with the following property. If Learner’s predictions are generated by the Aggregating Algorithm for the log lossthen for almost all T

Loss(T)≥LossE1(T) +T/10,

whereLoss(T)andLossE1(T)are thesquare lossesof Learner and Expert 1.

A similar statement holds for the Aggregating Algorithm for the square loss evaluated by the log loss.

(7)

New Settings

Experts:γt(1), . . . , γt(k) Learner:γt

Reality:ωt

Many loss functions

Loss(Em) k (T) = T X t=1 λ(m)(γt(k), ωt) Loss(m)(T) = T X t=1 λ(m)(γt, ωt)

Expert Evaluator’s advice

Loss(Ek) k (T) = T X t=1 λ(k)(γt(k), ωt) Loss(k)(T) = T X t=1 λ(k)(γt, ωt)

(8)

New Settings

Experts:γt(1), . . . , γt(k) Learner:γt

Reality:ωt

Many loss functions

Loss(Em) k (T) = T X t=1 λ(m)(γt(k), ωt) Loss(m)(T) = T X t=1 λ(m)(γt, ωt)

Expert Evaluator’s advice

Loss(Ek) k (T) = T X t=1 λ(k)(γt(k), ωt) Loss(k)(T) = T X t=1 λ(k)(γt, ωt)

(9)

Bound for New Settings

Theorem

Ifλ(k)areη(k)-mixable proper loss functions, k =1, . . . ,K , Learner has a strategy (e. g. the Defensive Forecasting algorithm) that guarantees, for all T and for all k , that

Loss(k)(T)≤Loss(Ek)

k (T) +

1 η(k)lnK.

Corollary

Ifλ(m)areη(m)-mixable proper loss functions, m=1, . . . ,M, Learner has a strategy that guarantees, for all T , for all k and for all m, that

Loss(m)(T)≤Loss(Em)

k (T) +

1

(10)

Tennis Predictions, Two Losses

Graphs of the negative regretLoss(Em)

k (T)−Loss (m)(T) 0 2000 4000 6000 8000 10000 12000 −4 −2 0 2 4 6 8 10 12 14 16 Theoretical bounds Learner Bookmakers 0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 20 25 Theoretical bounds Learner Bookmakers

square loss log loss

(11)

Defensive Forecasting Algorithm

∃π∀ω K X k=1 p(tk)1eη(λ(π,ω)−λ(π(tk),ω))≤1,

wherep(tk)1=p0(k)eη(Loss(t−1)−LossEk(t−1))

To get this (from Levin’s Lemma) we need thatλ(π, ω)is continuous

and for allπ, π0

Eπeη(λ(π,·)−λ(π

0))

= X

ω∈Ω

(12)

Defensive Forecasting Algorithm

∃π∀ω K X k=1 p(tk)1eη(k)(λ(k)(π,ω)−λ(k)(πt(k),ω))≤1,

wherep(tk)1=p0(k)eη(k)(Loss(k)(t−1)−Loss

(k)

Ek(t−1))

To get this (from Levin’s Lemma) we need thatλ(k)(π, ω)is continuous

and for allπ, π0

Eπeη

(k)(λ(k)(π,·)−λ(k)(π0))

= X

ω∈Ω

(13)

The DFA and the AA

λis continuous and∀π, π0Eπeη(λ(π,·)−λ(π 0)) ≤1 ⇒ λisη-mixable ∀π, π0Eπeη(λ(π,·)−λ(π 0)) ≤1 ⇒ λis proper λisη-mixable and ? ⇒ ∀π, π0Eπeη(λ(π,·)−λ(π 0)) ≤1

(14)

The DFA and the AA

λis continuous and∀π, π0Eπeη(λ(π,·)−λ(π 0)) ≤1 ⇒ λisη-mixable ∀π, π0Eπeη(λ(π,·)−λ(π 0)) ≤1 ⇒ λis proper

λisη-mixable and proper ⇒ ∀π, π0Eπeη(λ(π,·)−λ(π

0))

(15)

Proper Loss Functions

λis proper if for anyπ, π0 ∈ P(Ω)

Eπλ(π,·)≤Eπλ(π0,·)

Ifω∼πthenEπλ(π0, ω)is the expected loss for predictionπ0.

The expected loss is minimal for the true distribution

⇒the forecaster is encouraged to give the true probabilities

(16)

Example: Hellinger Loss

λHellinger(γ, ω) = 1 2 r X j=1 p γ(j)−qI=j}2

The Hellinger loss is√2-mixable

(17)

Proper Mixable Loss Functions

Each mixable loss functionλ(γ, ω)has a proper analogueλproper(π, ω)

such that

1 ∀πγω λproper(π, ω) =λ(γ, ω) 2 ∀πγ Eπλproper(π,·)Eπλ(γ,·)

For the Hellinger loss, the proper analogue is the spherical loss λspherical(π, ω) =1−q π(ω)

Pr

j=1(π(j))2 λspherical(π, ω) =λHellinger(γ, ω)forγ(ω) = P(rπ(ω))2

(18)

Proper Mixable Loss Functions

Each mixable loss functionλ(γ, ω)has a proper analogueλproper(π, ω)

such that

1 ∀πγω λproper(π, ω) =λ(γ, ω) 2 ∀πγ Eπλproper(π,·)Eπλ(γ,·)

For the Hellinger loss, the proper analogue is the spherical loss λspherical(π, ω) =1−q π(ω)

Pr

j=1(π(j))2 λspherical(π, ω) =λHellinger(γ, ω)forγ(ω) = P(rπ(ω))2

(19)

Example: Mixable and Non-Mixable Losses

Experts 1, . . . ,K predictπ(k)∈ P({0,1}). Experts 1, . . . ,Npredictγ(n)∈ {0,1}.

Learner predicts(π,π˜)∈ P({0,1})× P({0,1})such that ifπ(0)>1/2 thenπ˜(0) =1 and ifπ(1)>1/2 then˜π(1) =1.

There exists a strategy for Learner that guarantees for anyk

T X t=1 λsquare(πt, ωt)≤ T X t=1 λsquare(π(tk), ωt) +ln(K +N)

and for anym

T X t=1 λabs(˜πt, ωt)≤ T X t=1 λsimple(γt(n), ωt) +O( p Tln(K +N) +Tln lnT)

(20)

Tennis Predictions, Square and Absolute Losses

Graphs of the negative regretLoss(Em)

k (T)−Loss (m)(T) 0 50 100 150 200 250 300 350 400 −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 theoretical bounds DF experts 0 50 100 150 200 250 300 350 400 −2 −1.5 −1 −0.5 0 0.5 DF experts

square loss absolute loss

Learner optimizes for both loss functions, using the DF algorithm with “mixability” and Hoeffding supermartingales.

(21)

The Mixed Supermartingale

1 K +N K X k=1 e2PTt=−11((pt−ωt)2−(p (k) t −ωt)2)×e2((p−ω)2−(p (k) T −ω) 2) + 1 K +N N X n=1 Z 1/e 0 dη η ln1η2 eηPTt=−11(|p˜t−ωt|−[γ(tn)6=ωt])−η2/2 ×eη(|˜p−ω|−[γT(n)6=ω])−η 2/2 wherept =πt(1),p(tk)=πt(k)(1),˜pt = ˜πt(1). [x 6=y] =1 ifx 6=y and[x 6=y] =0 ifx =y.

(22)

References

V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game.

ICML 2008. http://arxiv.org/abs/0710.0485

A. Chernov, Y. Kalnishkan, F. Zhdanov, V. Vovk. Supermartingales in Prediction with Expert Advice. ALT 2008 and TCS.

http://arxiv.org/abs/1003.2218

A. Chernov, V. Vovk. Prediction with expert evaluators’ advice.

ALT 2009. http://arxiv.org/abs/0902.4127

A. Chernov, V. Vovk. Prediction with Advice of Unknown Number of

Experts. UAI 2010. http://arxiv.org/abs/1006.0475

References

Related documents

The Postgraduate Diploma may be awarded if a student achieves an overall weighted average of at least 50.00%, with no mark in any taught course which counts towards

The Postgraduate Diploma may be awarded if a student achieves an overall weighted average of at least 50.00%, with no mark in any taught element which counts towards the

The Postgraduate Diploma may be awarded if a student achieves an overall weighted average of at least 50.00%, with no mark in any taught course which counts towards the

The Postgraduate Diploma may be awarded if a student achieves an overall weighted average of at least 50.00%, with no mark in any taught course which counts towards

Our objectives were to develop and test an evidence-informed process for assessing health research management and support systems (RMSS) in four African universities and for

Our proposed amendment retains “appropriate equitable relief” as the proper remedy consistent with the Supreme Court’s prior prece- dents, but makes targeted modifications to

2°4 Succinctly, because the typically equitable definition of § 502(a)(3)(B) would not permit the insurer to assert a cause of action for reimbursement under the

In the event of a fatality, major accident, case of disease or dangerous occurrence, or an incident arising having potentially major implications, the College Health &