Multiobjective Prediction with Expert Advice
Alexey Chernov
Computer Learning Research Centre and Department of Computer Science Royal Holloway University of London
Example: Prediction of Sport Match Outcome
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game. ICML’08 Bookmakers data:
4 bookmakers, odds for∼10000 tennis matches (2 outcomes)
8 bookmakers, odds for∼9000 football matches (3 outcomes)
Oddsai can be transformed to probabilitiesProb[i]:
Prob[i] = P1/ai
j1/aj
The loss is measured by the square (Brier) loss function. Learner’s strategy is the Aggregating Algorithm.
Tennis Prediction, Square Loss
0 2000 4000 6000 8000 10000 12000 −4 −2 0 2 4 6 8 10 12 14 16 Theoretical bounds Learner BookmakersGraph of the negative regretLossEk(T)−Loss(T), 4 Experts
Tennis Prediction, Log Loss
0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 20 25 Theoretical bounds Learner BookmakersTennis Predictions: “Wrong” Losses
Graphs of the negative regretLossEk(T)−Loss(T)
0 2000 4000 6000 8000 10000 12000 −10 −5 0 5 10 15 20 25 Theoretical bounds Learner Bookmakers 0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 Theoretical bounds Learner Bookmakers
log loss square loss
the AA for the square loss the AA for the log loss
Aggregating Algorithm with Wrong Losses
Fact
For the game with2outcomes, one can construct a sequence of predictions of2Experts and a sequence of outcomes with the following property. If Learner’s predictions are generated by the Aggregating Algorithm for the log lossthen for almost all T
Loss(T)≥LossE1(T) +T/10,
whereLoss(T)andLossE1(T)are thesquare lossesof Learner and Expert 1.
A similar statement holds for the Aggregating Algorithm for the square loss evaluated by the log loss.
New Settings
Experts:γt(1), . . . , γt(k) Learner:γt
Reality:ωt
Many loss functions
Loss(Em) k (T) = T X t=1 λ(m)(γt(k), ωt) Loss(m)(T) = T X t=1 λ(m)(γt, ωt)
Expert Evaluator’s advice
Loss(Ek) k (T) = T X t=1 λ(k)(γt(k), ωt) Loss(k)(T) = T X t=1 λ(k)(γt, ωt)
New Settings
Experts:γt(1), . . . , γt(k) Learner:γt
Reality:ωt
Many loss functions
Loss(Em) k (T) = T X t=1 λ(m)(γt(k), ωt) Loss(m)(T) = T X t=1 λ(m)(γt, ωt)
Expert Evaluator’s advice
Loss(Ek) k (T) = T X t=1 λ(k)(γt(k), ωt) Loss(k)(T) = T X t=1 λ(k)(γt, ωt)
Bound for New Settings
Theorem
Ifλ(k)areη(k)-mixable proper loss functions, k =1, . . . ,K , Learner has a strategy (e. g. the Defensive Forecasting algorithm) that guarantees, for all T and for all k , that
Loss(k)(T)≤Loss(Ek)
k (T) +
1 η(k)lnK.
Corollary
Ifλ(m)areη(m)-mixable proper loss functions, m=1, . . . ,M, Learner has a strategy that guarantees, for all T , for all k and for all m, that
Loss(m)(T)≤Loss(Em)
k (T) +
1
Tennis Predictions, Two Losses
Graphs of the negative regretLoss(Em)
k (T)−Loss (m)(T) 0 2000 4000 6000 8000 10000 12000 −4 −2 0 2 4 6 8 10 12 14 16 Theoretical bounds Learner Bookmakers 0 2000 4000 6000 8000 10000 12000 −5 0 5 10 15 20 25 Theoretical bounds Learner Bookmakers
square loss log loss
Defensive Forecasting Algorithm
∃π∀ω K X k=1 p(t−k)1eη(λ(π,ω)−λ(π(tk),ω))≤1,wherep(t−k)1=p0(k)eη(Loss(t−1)−LossEk(t−1))
To get this (from Levin’s Lemma) we need thatλ(π, ω)is continuous
and for allπ, π0
Eπeη(λ(π,·)−λ(π
0,·))
= X
ω∈Ω
Defensive Forecasting Algorithm
∃π∀ω K X k=1 p(t−k)1eη(k)(λ(k)(π,ω)−λ(k)(πt(k),ω))≤1,wherep(t−k)1=p0(k)eη(k)(Loss(k)(t−1)−Loss
(k)
Ek(t−1))
To get this (from Levin’s Lemma) we need thatλ(k)(π, ω)is continuous
and for allπ, π0
Eπeη
(k)(λ(k)(π,·)−λ(k)(π0,·))
= X
ω∈Ω
The DFA and the AA
λis continuous and∀π, π0Eπeη(λ(π,·)−λ(π 0,·)) ≤1 ⇒ λisη-mixable ∀π, π0Eπeη(λ(π,·)−λ(π 0,·)) ≤1 ⇒ λis proper λisη-mixable and ? ⇒ ∀π, π0Eπeη(λ(π,·)−λ(π 0,·)) ≤1The DFA and the AA
λis continuous and∀π, π0Eπeη(λ(π,·)−λ(π 0,·)) ≤1 ⇒ λisη-mixable ∀π, π0Eπeη(λ(π,·)−λ(π 0,·)) ≤1 ⇒ λis properλisη-mixable and proper ⇒ ∀π, π0Eπeη(λ(π,·)−λ(π
0,·))
Proper Loss Functions
λis proper if for anyπ, π0 ∈ P(Ω)
Eπλ(π,·)≤Eπλ(π0,·)
Ifω∼πthenEπλ(π0, ω)is the expected loss for predictionπ0.
The expected loss is minimal for the true distribution
⇒the forecaster is encouraged to give the true probabilities
Example: Hellinger Loss
λHellinger(γ, ω) = 1 2 r X j=1 p γ(j)−qI{ω=j}2The Hellinger loss is√2-mixable
Proper Mixable Loss Functions
Each mixable loss functionλ(γ, ω)has a proper analogueλproper(π, ω)
such that
1 ∀π∃γ∀ω λproper(π, ω) =λ(γ, ω) 2 ∀π∀γ Eπλproper(π,·)≤Eπλ(γ,·)
For the Hellinger loss, the proper analogue is the spherical loss λspherical(π, ω) =1−q π(ω)
Pr
j=1(π(j))2 λspherical(π, ω) =λHellinger(γ, ω)forγ(ω) = P(rπ(ω))2
Proper Mixable Loss Functions
Each mixable loss functionλ(γ, ω)has a proper analogueλproper(π, ω)
such that
1 ∀π∃γ∀ω λproper(π, ω) =λ(γ, ω) 2 ∀π∀γ Eπλproper(π,·)≤Eπλ(γ,·)
For the Hellinger loss, the proper analogue is the spherical loss λspherical(π, ω) =1−q π(ω)
Pr
j=1(π(j))2 λspherical(π, ω) =λHellinger(γ, ω)forγ(ω) = P(rπ(ω))2
Example: Mixable and Non-Mixable Losses
Experts 1, . . . ,K predictπ(k)∈ P({0,1}). Experts 1, . . . ,Npredictγ(n)∈ {0,1}.
Learner predicts(π,π˜)∈ P({0,1})× P({0,1})such that ifπ(0)>1/2 thenπ˜(0) =1 and ifπ(1)>1/2 then˜π(1) =1.
There exists a strategy for Learner that guarantees for anyk
T X t=1 λsquare(πt, ωt)≤ T X t=1 λsquare(π(tk), ωt) +ln(K +N)
and for anym
T X t=1 λabs(˜πt, ωt)≤ T X t=1 λsimple(γt(n), ωt) +O( p Tln(K +N) +Tln lnT)
Tennis Predictions, Square and Absolute Losses
Graphs of the negative regretLoss(Em)
k (T)−Loss (m)(T) 0 50 100 150 200 250 300 350 400 −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 theoretical bounds DF experts 0 50 100 150 200 250 300 350 400 −2 −1.5 −1 −0.5 0 0.5 DF experts
square loss absolute loss
Learner optimizes for both loss functions, using the DF algorithm with “mixability” and Hoeffding supermartingales.
The Mixed Supermartingale
1 K +N K X k=1 e2PTt=−11((pt−ωt)2−(p (k) t −ωt)2)×e2((p−ω)2−(p (k) T −ω) 2) + 1 K +N N X n=1 Z 1/e 0 dη η ln1η2 eηPTt=−11(|p˜t−ωt|−[γ(tn)6=ωt])−η2/2 ×eη(|˜p−ω|−[γT(n)6=ω])−η 2/2 wherept =πt(1),p(tk)=πt(k)(1),˜pt = ˜πt(1). [x 6=y] =1 ifx 6=y and[x 6=y] =0 ifx =y.References
V. Vovk, F. Zhdanov. Predictions with Expert Advice for Brier Game.
ICML 2008. http://arxiv.org/abs/0710.0485
A. Chernov, Y. Kalnishkan, F. Zhdanov, V. Vovk. Supermartingales in Prediction with Expert Advice. ALT 2008 and TCS.
http://arxiv.org/abs/1003.2218
A. Chernov, V. Vovk. Prediction with expert evaluators’ advice.
ALT 2009. http://arxiv.org/abs/0902.4127
A. Chernov, V. Vovk. Prediction with Advice of Unknown Number of
Experts. UAI 2010. http://arxiv.org/abs/1006.0475