Kullback Leibler simplex

(1)

Munich Personal RePEc Archive

Kullback-Leibler simplex

Kangpenkae, Popon

Thammasat University

November 2011

Online at

https://mpra.ub.uni-muenchen.de/34921/

(2)

KULLBACK-LEIBLER SIMPLEX

Abstract. This technical reference presents the functional structure and the algorithmic implemen-tation of KL (Kullback-Leibler) simplex. It details the simplex approximation and fusion. The KL simplex is fundamental, robust, adaptive an informatics agent for computational research in econom-ics, finance, game and mechanism. From this perspective the study provides comprehensive results to facilitate future work in such areas.

God does not care about our mathematical difficulties. There is nothing free, except the grace of God. He integrates empirically. Albert Einstein True Grit (2010)

1. Introduction

This paper presents an alternative for sequential optimizing agent which is crucial for the reliability of computational economics research. In particular it is a version of online classifier, a machine learning which processes classification with data stream. The sequential implementation makes it efficient, fast and practical data flow processing. Among this type of classifier, informatics divergence approach stands out with solid foundation in mathematical statistics and informatics theory. It is instructive to see the difference of the two approaches. Standard approach targets the performance in objective function, while the informatics works with statistical measures, e.g. Kullback-Leibler and Renyi divergence [CoDrRo07]. Positively the informatics agent can be effective alternative to standard sequential optimizers.

Furthermore informatics approach delivers powerful concepts, e.g. (i) the advance will leverage the notion and insight from dynamic programming [Sn10]; when a control is simplex and transition matrix, it has a strong foundation in probability and Markov chain [Be00]. (ii) model-free or agnostic data makes it capable of deriving superior second-order perceptron working the real-world data [CeCoGe05]. This approach consequently can improve machine learning that is robust and applicable for computational research in economics, finance, game and mechanism.

The next section lists useful formula and identity. Section 3 presents the structure of online ma-chine learning [CrDrFe08, LiHoZhGo11] and key results; section 4 discusses the implementation. The instructive remarks are in section 5 and the proof is in Appendix.

2. The matrix

Simplex[ChYe11].

h1iµ∈←△ ⇔→ µ·1 = 1andh2iµ∈ △ ⇔µ∈←△→, withmin (µ)≥0.

Taylor expansion. ln (µ·xi)≈ln (µ_i·xi) +(µ−µi)·xi

µi·xi .

Date: November 9, 2011.

Key words and phrases. KL divergence, second-order perceptron, informatics agent, simplex projection and fusion.

Acknowledgments.Economics department at Thammasat and Queen’s university greatly supports the study. The author warmly thanks Frank Flatters, Frank Milne, Ted Neave and Pranee Tinakorn for the encouragement, without which this paper would not had been written. He certainly appreciates comments and discussions with the Sukniyom, Ko-Kao, Chai-Shane, Lek-Air, Ron-Fran, Pui, NaPoj and of course, Kay and Nongyao.

Contact.facebook.com/popon.kangpenkae.

(3)

Symmetric squared decomposition (SSD). Σ[i] = Υ2

[i],Υ[i] = Q[i]

q

diag λ[i,]1, . . . , λ[i,]d

Q⊤_[i] : Q[i] is

orthogonal and the eigenvector ofΣ[i]; λ[i,]1, . . . , λ[i,]d

is the eigenvalue of Σ[i]. Of course Υ[i],Σ[i] is symmetric PSD.

Inversion[PePe08][146].

(A+BCD)−1=A−1−A−1B C−1+DA−1B−1

DA−1

for our application,

Σ−1= Σ−_i 1+xix ⊤ i

c ⇒Σ = Σi−

Σixix⊤_i Σi

c+x⊤_i Σixi

Differentiation[PePe08] [78,49,102,83].

∂

∂µ(µi−µ) Υ− 2

i (µi−µ) = 2Υ− 2

i (µ−µi)

∂

∂Υln det Υ 2

= 2Υ−1

∂

∂ΥTr Υ− 2 i Υ2

= Υ−_i 2Υ + ΥΥ−_i2

∂

∂Υx⊤i Υ2xi=xix⊤_i Υ + Υxix⊤_i = _∂Υ∂ kΥxik2

∂

∂ΥkΥxik= x

ix⊤iΥ+Υxix⊤i

2kΥxik =

xix⊤iΥ+Υxix⊤i

2√x⊤iΥ2xi

KL divergence.

DKL N µ,Υ2||N µ_i,Υ2_i=1

2

ln

_{det Υ}2 i det Υ2

+Tr Υ−_i2Υ2

+ (µ_i−µ) Υ−_i 2(µ_i−µ)

3. Approximation

3.1. This section refers to [CeZe97, CrDrFe08, LiHoZhGo11] for the model concept and definition. As KL simplex solution in △does not have a closed form, the approximation will start with←△→,

µ_i+1,Σi+1=argminDKL(N(µ,Σ)||N(µ_i,Σi))

subject to~₍_y_i_f₍_µ_·_x_i)₋_ǫ₎_≥_φp_x⊤

i Σxi, yi∈ {−1,1},andµ∈←△→.

Applying the main result in [LiChLiMaVi04][V I.2],an invariance theorem is straightforward,

Theorem. The optimal pair µi+1,Σi+1

is invariant to similarity-metric divergences.

We consider

normal, hinge, hinge2

constraint (see section Section 5), with two flavors: {linear,logarithm}={[ln],[ln]} ∋f(.).LetΣ[i] = Υ2

[i] whereΥ[i] has SSD, the~−Lagrangian is

L=1 2

ln

det Υ2 i det Υ2

+Tr Υ−_i2Υ2

+ (µi−µ) Υ−_i 2(µi−µ)

+α(φkΥxik −~) +ρ(µ·1−1)

(4)

3.2. [normal],~_∅.

3.2.1. Linear:~_∅_[ln]_.

Lemma 1. Σ−i+11 ◮~∅[ln]

Σ−i+11 = Σ−i 1+αφ

xix⊤_i

p

x⊤_i Σi+1xi

Lemma 2. Σi+1◮~_∅_[ln]

Σi+1= Σi−βΣixix⊤_i Σi

whereβ= √ αφ

ui+αφυi,(ui, υi)≡ x

⊤

i Σi+1xi,x⊤_i Σixi

.

Lemma 3. √ui◮~∅[ln]

√_u

i= −

αφυi+

p

α2_φ2_υ2 i + 4υi 2

Lemma 4. µ_i+1◮~_∅[ln]

µ_i+1=µ_i+αyiΣi(xi−xi)

where x=x1≡ 1

⊤_Σ_i_x_i

1⊤_Σi11

Lemma 5. α◮~_∅_[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

(a, b, c) =λ′λ′+υiφ2

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

λ, λ′= yi(µ_i·xi)−ǫ,x⊤_i Σi(xi−xi)

3.2.2. Logarithm:~_∅_[ln]_.

Lemma 6. Σ−i+11 ◮~∅[ln] ≡Lemma 1.

Lemma 7. Σi+1◮~

∅[ln] ≡Lemma 2.

Lemma 8. √ui◮~∅[ln] ≡Lemma 3.

Lemma 9. µi+1◮~∅[ln]

µi+1≈µi+

αyi

µ_i·xi

Σi(xi−xi),

where x=x1≡1

⊤_Σ_i_x_i

1⊤_Σi11.

Lemma 10. α◮~_∅[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

λ, λ′≈yiln (µ_i·xi)−ǫ,x

⊤

iΣi(xi−xi)

(µi·xi)2

(5)

3.3. [hinge],~₁ _and

hinge2

,~₂.

3.3.1. Linear:~_1[ln]_,~_2[ln]_.

Σ−_i+11 ◮~_[1,2][ln]_,_{Lemma 11} _≡_{Lemma 21} _≡_{Lemma 1,}_Σ−1

i+1◮~∅[ln] Σi+1◮~_[1,2][ln]_,_{Lemma 12} _≡_{Lemma 22} _≡_{Lemma 2,}_Σi+1◮~_∅_[ln]

√_u

i ◮~[1,2][ln], Lemma 13 ≡Lemma 23 ≡Lemma 3,√ui ◮~∅[ln]

Lemma 14. µ_i+1◮~1[ln]

µi+1=µ_i+hyi(µ_i·xi)−ǫiαyiΣi(xi−xi) where xi=xi1≡1

⊤_Σ_i_x_i

1⊤Σi11

Lemma 15. α◮~_1[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

,

λ, λ′= yi(µ_i·xi)−ǫ,x⊤_i Σi(xi−xi)

Lemma 24. µi+1◮~2[ln]

µ_i+1=µ_i+

yi(µ_i·xi)−ǫ

0.5α−1₋_x⊤

i Σi(xi−xi)

yiΣi(xi−xi)

where xi=xi1≡ 1

⊤_Σ_i_x_i

1⊤Σi11

Lemma 25. α◮~_2[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

λ, λ′≈(yi(µ_i·xi)−ǫ)2,4λx⊤_i Σi(xi−xi)

3.3.2. Logarithm:~_1[ln]_,~_2[ln]_.

Σ−_i+11 ◮~_[1,2][ln]_,_{Lemma 16}_≡_{Lemma 26}_≡_{Lemma 6,}_Σ−1

i+1◮~∅[ln] Σi+1◮~_[1,2][ln]_,_{Lemma 17}_≡_{Lemma 27}_≡_{Lemma 7,}_Σi+1◮~_∅_[ln]

√_u

i ◮~[1,2][ln],Lemma 18 ≡Lemma 28≡Lemma 8,√ui ◮~∅[ln]

µi+1=µ_i+hyiln (µ_i·xi)−ǫi αyi µ_i·xi

Σi(xi−xi)

where xi=xi1≡1

⊤_Σ

ixi

(6)

Lemma 20. α◮~_1[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

λ, λ′≈yiln (µ_i·xi)−ǫ,x

⊤

iΣi(xi−xi)

(µi·xi)2

Lemma 29. µi+1◮~2[ln]

µ_i+1≈µ_i+

   

yiln (µ_i·xi)−ǫ

0.5α−1₋xi⊤Σi(xi−xi)

(µi·xi)2

   

yi

µ_i·xi

Σi(xi−xi)

where xi=xi1≡ 1

⊤_Σ

ixi

1⊤_Σi11

Lemma 30. α◮~_2[ln]_{, α}₌j−b±√b2−4ac

2a

k

such that

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

λ, λ′≈(yiln (µ_i·xi)−ǫ)2,4λx

⊤

iΣi(xi−xi)

(µi·xi)2

4. Implementation

4.1. Results in section 3. is valid for the←△→simplex. A more common constraint is△simplex; however the close-form solution is not possible with this simplex. Projecting simplex ←△→ on △ is a practical approximation; the effectiveness of this method is reported in [LiHoZhGo11]. The projection neces-sarily requires a certain transformation ofΣ-covariance matrix. Further information on implementing projection algorithm and covariance transformation is in [ChYe11] and [LiHoZhGo11], respectively.

Conjecture. Correlation transform is an nSD-effective covariance transformer.

4.2. Section 3 presents various choices of simplex, from which one can limit the set of simplex using statistical dominance concept, e.g. nSD-effective. Then projecting the simplex and integrating orfusing

them which is, in practice, an empirical issue. We define a new simplex fusing method FED (fusing extensive dimension) as follows. Let △i∈{1...m} be a set ofnSD-effective simplex, each △i ∈ [0,1]N.

Connectm subsimplex into a vector in [0,1]m·N; apply simplex projection to the vector. The result is simplex△ ∈[0,1]m·N; overlay simplex△, i.e. slot△ into mvectors in[0,1]N and sum the vectors with the proper array. The overlay will compose aFED simplex ∈[0,1]N.

Conjecture. FED simplex is an nSD-effective fuse of its nSD-effective subsimplex.

}_{nSD-effective} _{is empirical non-dominated, wrt. to the}_n_{-order stochastic dominance definition [Da06].}

5. Remark

5.1. The logic of confidence constraint. Suppose F(w)−µF(w)

σF(w) =ZΦ−cdf; consider a generic confidence constraint Pr(F(w)≥0)≥η≡Φ (φ).

Pr

_F₍

w)−µF(w)

σF(w) ≥

−µF(w)

σF(w)

≥η⇒Φ

₋_µ

F(w)

σF(w)

(7)

−µF(w)

σF(w) ≤

Φ−1(1−η) =−Φ−1(η)⇒µF(w)≥Φ−1(η)σF(w)=φσF(w)

, i.e the conditionsign(φF(σw)−F(µw))⇐⇒ sign φσF(w)−µF(w)

determines the[exactness]

property of confidence constraint.

5.2. The[exactness]property of

normal,hinge,hinge2

confidence. Define

normal,hinge,hinge2

func-tion as follows,

normal: ~_∅_[f]_∈~_∅_[ln]_,~_∅_[ln] _{≡ {}_y_i₍µ·xi)−ǫ, yiln (µ·xi)−ǫ}

hinge: ~_1[f]_∈_~

1[ln],~1[ln] ≡ {⌊yi(µ·xi)−ǫ⌋,⌊yiln (µ·xi)−ǫ⌋}

hinge2: ~_2[f] _∈_~

2[ln],~2[ln] ≡

n

⌊yi(µ·xi)−ǫ⌋2,⌊yiln (µ·xi)−ǫ⌋2

o

, as a result of assumptionw∼N µ,Σ = Υ2;

normal: ~_∅_[ln] _{is exact;}~_∅_[ln] _{is approximate}

F(w) =yi(w·xi)−ǫ⇒

µF(w), σF(w)2

=yi(µ·xi)−ǫ,x⊤_i Σxi =kΥxik2

F(w) =yiln (w·xi)−ǫ⇒

µF(w), σF(w)2

≈ yiln (µ·xi)−ǫ,x⊤_i Σxi

hinge: ~_1[ln][ln] _{is approximate}

F(w) =⌊yi(w·xi)−ǫ⌋ ⇒

µF(w), σ_F2(w)

≈ ⌊yi(µ·xi)−ǫ⌋,x⊤_i Σxi

F(w) =⌊yiln (w·xi)−ǫ⌋ ⇒

µF(w), σF(w)2

≈ ⌊yiln (µ·xi)−ǫ⌋,x⊤_i Σxi

hinge2: ~_2[ln][ln] _{is approximate}

F(w) =⌊yi(w·xi)−ǫ⌋2⇒

µF(w), σF(w)2

≈⌊yi(µ·xi)−ǫ⌋2,x⊤_i Σxi

F(w) =⌊yiln (w·xi)−ǫ⌋2⇒

µF(w), σF(w)2

≈⌊yiln (µ·xi)−ǫ⌋2,x⊤_i Σxi

Appendix

Lemma 1. Σ−_i+11 ◮~_∅_[ln]

∂

∂ΥL= 0 =−Υ −1₊1

2Υ −2 i Υ +

1 2ΥΥ

−2 i +αφ

xix⊤_i Υ

2px⊤_i Υ2xi

+αφ Υxix

⊤ i 2px⊤_i Υ2xi Υ−1 update condition is,

Υ−1=1 2Υ

−2 i Υ +

1 2ΥΥ

−2 i +αφ

xix⊤_i Υ

2px⊤_i Υ2xi

+αφ Υxix⊤i 2px⊤_i Υ2xi

Υ−1

Start with the solution,Υ−2 _{implicit update,}

Υ−2≡Υ−_i+12 = Υ_i−2+αφ xix⊤i

p

x⊤_i Υ2xi

(8)

which yields

Υ−1 2 =

Υ−_i2Υ

2 +

αφ

2 ·

xix⊤_i Υ

p

x⊤_i Υ2xi

[×Υ]

Υ−1 2 =

ΥΥ−2_i

2 +

αφ

2 ·

Υxix⊤_i

p

x⊤_i Υ2xi

[Υ×]

Υ−2

⇒ [×Υ] + [Υ×] ⇒

Υ−1

, i.e. Υ−2₋_{implicit update satisfying} _Υ−1₋_{update. The result is}

direct from the replacement Υ2 i,Υ2

= (Σi,Σi+1) :

Σ−1i+1= Σ−1i +αφ

xix⊤_i

p

x⊤_i Σi+1xi

Apply matrix inversion toΣ−_i+11 = Σ_i−1+αφ√ xix⊤i

x⊤iΣi+1xi

,

Σi+1= Σi− _√ Σixix⊤i Σi x⊤iΣi+1xi

αφ +x⊤i Σixi

= Σi− αφΣixix⊤i Σi

p

x⊤_i Σi+1xi+αφx⊤_i Σixi

Σi+1= Σi−αφΣixix⊤i Σi

√_u

i+αφυi

= Σi−βΣixix⊤_i Σi

Σi+1= Σi−αφΣixix

⊤ i Σi

√_u

i+αφυi ⇒

x⊤_i Σi+1xi=x⊤_i Σixi−

αφ x⊤_i Σixi

x⊤_i Σixi

√_u

i+αφυi

ui=υi−

αφυ2 i

√_u

i+αφυi ⇒

√

ui= −

αφυi+

p

α2_φ2_υ2 i + 4υi 2

Lemma 4. µi+1◮~∅[ln]

∂

∂µL= 0 = Υ

−2

i (µ−µi)−α~

′

∅f

′

yixi+ρ1;

∂

∂ρL= 0 =µ·1−1

Υ−_i 2(µ−µ_i)−α~

′

∅f

′

yixi+ρ1 =0⇒µ=µ_i+ Υ2_i

α~′

∅f

′

yixi−ρ1

1⊤µ=1⊤µ_i+α~

′

∅f

′

yi1⊤Υ2ixi−ρ1⊤Υ2i1

ρ1=α~′

∅f

′

yi

1⊤Υ2ixi

1⊤Υ2 i1

1=α~′

∅f

′

yixi ⇒µ=µ_i+α~

′

∅f

′

yiΥ2i(xi−xi)

use~′

∅(.) = 1, f

′

(.) = 1andΥ2

(9)

Lemma 5. α◮~_∅_[ln]

From Lemma 3 √ui = −αφυ

i+

√

α2_φ2_υ2

i+4υi

2 , which can be simplified with λ+λ

′

α = φ√ui . Its

quadratic isaα2₊_bα₊_c_{= 0}_{, such that}₍_{a, b, c}_{) =}_λ′

λ′+υiφ2

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

. The

solution toλ+λ′α=φ√ui is −b± √

b2₋_4ac

2a . We chooseα=

j

−b±√b2₋_4ac

2a

k

to ensure validα≥0.

To find λ, λ′, use binding constraint φkΥxik = ~∅[ln] ⇒ φkΥxik = yi(µ·xi)−ǫ. Apply the

updateµ=µ_i+αyiΣi(xi−xi)and√ui≡ kΥxik,

φ√ui=yiµ_i·xi−ǫ+αx⊤_i Σi(xi−xi)

i.e. λ, λ′= yiµ_i·xi−ǫ,x⊤_i Σi(xi−xi).

Lemma 6. Σ−_i+11 ◮~_∅_[ln]

≡ Lemma 1.

≡Lemma 2.

≡Lemma 3.

Lemma 9. µ_i+1◮~_∅[ln]

Similar to Lemma 4, µ = µ_i +α~′_∅f′yiΥ2_i(xi−xi) ; use ~′_∅(.) = 1; ln (µ·xi) ≈ ln (µ_i·xi) +

(µ−µi)·xi

µi·xi ⇒f

′

(.) =_µ1

i·xi andΥ

2

i = Σi,which gives µi+1=µ≈µi+ αy

i

µi·xiΣi(xi−xi)

Lemma 10. α◮~_∅_[ln]

Similar to Lemma 5,α=j−b±√b2_−4ac

2a

k

where(a, b, c) =λ′λ′+υiφ2

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

.

To find λ, λ′, set the constraint bindingφkΥxik =~∅[ln] ⇒φkΥxik =yiln (µ·xi)−ǫ. Apply

the update µ= µ_i+_µαyi

i·xiΣi(xi−xi) and

√_u

i ≡ kΥxik and the approximation yiln (µ·xi)−ǫ ≈

yi

ln (µ_i·xi) +(µ−µ_µ i)·xi

i·xi

−ǫ

φ√ui≈yiln (µ_i·xi)−ǫ+αx

⊤

i Σi(xi−xi)

(µ_i·xi)2 ·

, i.e. λ, λ′≈yiln (µ_i·xi)−ǫ,x

⊤

iΣi(xi−xi)

(µi·xi)2

.

Lemma 11. Σ−1i+1◮~1[ln]

≡Lemma 1.

Lemma 12. Σi+1◮~_1[ln]

(10)

Lemma 13. √ui◮~1[ln]

≡Lemma 3.

Similar to Lemma Lemma 4, µ = µ_i +α~′₁f′yiΥ_i2(xi−xi). There are two cases, yi(µ·xi)−

ǫ[>] [≤] 0.

Case [>]: ~′

1(.) = 1, f

′

(.) = 1and Υ2

i = Σi,⇒µi+1=µ=µi+αyiΣi(xi−xi)

Case [≤]: ~′₁₍_._{) = 0}⇒µ_i+1=µ=µ_i

With some manipulation we find aµ−update

µi+1=µ_i+hyi(µ_i·xi)−ǫiαyiΣi(xi−xi)

Lemma 15. α◮~_1[ln]

2a

k

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

.

To findλ, λ′, use binding constraintφkΥxik=~1[ln]⇒φkΥxik=⌊yi(µ·xi)−ǫ⌋.We only need

the update-caseyi(µ·xi)−ǫ >0. Apply the updateµ=µi+αyiΣi(xi−xi)and√ui≡ kΥxik,

φ√ui =⌊yi(µ·xi)−ǫ⌋=yi(µ·xi)−ǫ=yiµ_i·xi−ǫ+αx⊤_i Σi(xi−xi)

i.e. λ, λ′= yiµ_i·xi−ǫ,x⊤_i Σi(xi−xi).

Lemma 16. Σ−_i+11 ◮~_1[ln]

≡ Lemma 6.

≡Lemma 7.

≡Lemma 8.

Similar to Lemma 14, µ=µ_i+α~₁′f′yiΥ2_i(xi−xi)with two cases,yiln (µ·xi)−ǫ[>] [≤] 0.

Case [>]: ~′

1(.) = 1,ln (µ·xi)≈ln (µi·xi) +(µ−µ

i)·xi

µi·xi ⇒f

′

(.) =_µ1

i·xi andΥ

2 i = Σi

µi+1=µ=µ_i+ αyi µ_i·xi

Σi(xi−xi)

Case [≤]: ~′

1(.) = 0⇒µi+1=µ=µ_i

With some manipulation we find aµ−update

µ_i+1=µ_i+hyiln (µ_i·xi)−ǫi αyi µi·xi

Σi(xi−xi)

(11)

2a

k

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

.

To find λ, λ′, set the constraint binding φkΥxik =~1[ln] ⇒ φkΥxik = ⌊yiln (µ·xi)−ǫ⌋. We

only need the update-case yiln (µ·xi)−ǫ > 0. Apply the update µ= µi+ αy

i

µi·xiΣi(xi−xi) and

√_u

i≡ kΥxik and the approximationyiln (µ·xi)−ǫ≈yi

ln (µ_i·xi) +(µ−µ_µ i)·xi

i·xi

−ǫ,

φ√ui=⌊yiln (µ·xi)−ǫ⌋=yiln (µ·xi)−ǫ≈yiln (µ_i·xi)−ǫ+αx

⊤

i Σi(xi−xi)

(µ_i·xi)2

, i.e. λ, λ′≈yiln (µ_i·xi)−ǫ,x

⊤

iΣi(xi−xi)

(µi·xi)2

.

Lemma 21. Σ−i+11 ◮~2[ln]

≡Lemma 1.

≡Lemma 2.

≡Lemma 3.

Similar to Lemma Lemma 4, µ = µ_i +α~′₂f′yiΥ_i2(xi−xi). There are two cases, yi(µ·xi)−

ǫ[>] [≤] 0.

Case [>]: ~′

2(.) = 2 (yi(µ·xi)−ǫ); usef′(.) = 1andΥ2_i = Σi, µ=µ_i+ 2α(yi(µ·xi)−ǫ)yiΣi(xi−xi)

yi(µ·xi)−ǫ=yi(µ_i·xi)−ǫ+ 2α(yi(µ·xi)−ǫ)x⊤_i Σi(xi−xi)

WriteX =yi(µ·xi)−ǫ, C =yi(µ_i·xi)−ǫ, S= 2αx_i⊤Σi(xi−xi),

(µ, X) =

µ_i+ 2αXyiΣi(xi−xi), C+SX= C

1−S

Case [≤]: ~′

2(.) = 0⇒(µ, X) = (µi+ 2αXyiΣi(xi−xi),0).

We can conclude the update µi+1=µ=µ_i+ 2α⌊X⌋yiΣi(xi−xi)

µi+1=µ_i+

_y

i(µ_i·xi)−ǫ

0.5α−1₋

x⊤_i Σi(xi−xi)

yiΣi(xi−xi)

(12)

2a

k

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

.

To findλ, λ′, use binding constraint0≤φkΥxik=~2[ln] ⇒φkΥxik=⌊yi(µ·xi)−ǫ⌋2.We only

need the update-caseyi(µ·xi)−ǫ >0. Apply the updateµ=µ_i+_0.5α−y1i_−x(µi⊤·xi)−ǫ

iΣi(xi−xi)·yiΣi(xi−xi)

and√ui≡ kΥxik.

φ√ui=⌊yi(µ·xi)−ǫ⌋2= (yi(µ·xi)−ǫ)2

φ√ui=

yiµ_i·xi−ǫ+

yi(µ_i·xi)−ǫ

0.5α−1₋_x⊤

i Σi(xi−xi)·

x⊤_i Σi(xi−xi)

2

Suppose g(α) = A+ AC 0.5α−1_−C

2

, with (A, C, α0) = yi(µ_i·xi)−ǫ,x⊤_i Σi(xi−xi),0 and use

Taylor expansiong(α)≈g(α0) +g′(α0) (α−α0). It follows that g(0), g′(0)= A2_,₄_A2_C

, thus

φ√ui =g(α)≈A2+ 4A2Cα⇒

λ, λ′≈(yi(µ_i·xi)−ǫ)2,4λx⊤_i Σi(xi−xi)

.

Lemma 26. Σ−i+11 ◮~2[ln]

≡ Lemma 6.

≡Lemma 7.

≡Lemma 8.

Similar to Lemma 24, µ=µ_i+α~₂′f′yiΥ2_i(xi−xi)with two cases,yiln (µ·xi)−ǫ[>] [≤] 0.

Case [>]: ~′₂₍_._{) = 2 (}_y_i_{ln (}_µ_·_x_i)₋_ǫ₎_{; use}_{ln (}_µ_·_x_i)_≈_{ln (}_µ

i·xi) +

(µ−µi)·xi

µi·xi ⇒f

′

(.) = 1 µi·xi and

Υ2 i = Σi,

µ≈µ_i+ 2α

yi

ln (µ_i·xi) +(µ−µi)·xi µ_i·xi

−ǫ

_y

i

µ_i·xi

Σi(xi−xi)

(µ−µ_i)yixi

µi·xi ≈

2αx

⊤

i Σi(xi−xi)

(µ_i·xi)2

yiln (µ_i·xi)−ǫ+(µ−µi)yixi µi·xi

WriteX =(µ−µi)yixi

µi·xi , C =yiln (µi·xi)−ǫ, S= 2α

x⊤iΣi(xi−xi)

(µi·xi)2 ,

hence X =S(C+X) = SC 1−S and

(µ, C+X)≈

µ_i+ 2α(C+X)· yi µ_i·xi

Σi(xi−xi), C

1−S

Case [≤]: ~′₂₍_._{) = 0}_⇒₍µ, C+X) =

µ_i+ 2α(C+X)· yi

µi·xiΣi(xi−xi),0

(13)

We can conclude with the update µi+1=µ≈µ_i+ 2α⌊C+X⌋yiΣi(xi−xi)

µi+1≈µ_i+

   

yiln (µ_i·xi)−ǫ

0.5α−1₋xi⊤Σi(xi−xi)

(µi·xi)2

   

yi

µ_i·xi

Σi(xi−xi)

Similar to Lemma 25,α=j−b±√b2₋_4ac

2a

k

,2λλ′+υiφ2

2

, λ2₋_υ iφ2

.

To find λ, λ′, use binding constraint0 ≤φkΥxik=~2[ln] ⇒φkΥxik=⌊yiln (µ·xi)−ǫ⌋2. We

only need the update-caseyiln (µ·xi)−ǫ >0. Apply the update

µ=µ_i+ yiln (µi·xi)−ǫ

0.5α−1₋x⊤iΣi(xi−x)

(µi·xi)2

· yi

µ_i·xi

Σi(xi−xi)

and√ui≡ kΥxik to haveφ√ui =⌊yiln (µ·xi)−ǫ⌋2= (yiln (µ·xi)−ǫ)2.

Use the approximation yiln (µ·xi)−ǫ≈yi

ln (µ_i·xi) +(µ−_µµi)·xi

i·xi

−ǫ,

φ√ui ≈

yi

ln (µ_i·xi) +(µ−_µµi)·xi

i·xi

−ǫ2

≈



yiln (µ_i·xi)−ǫ+ yiln(µi·xi)−ǫ

0.5α−1₋x⊤iΣi(xi−xi)

(µi·xi)2

· x⊤iΣi(xi−xi)

(µi·xi)2

 

2

Similar to Lemma 25, with (A, C, α0) =yiln (µ_i·xi)−ǫ,x

⊤

iΣi(xi−xi)

(µi·xi)2 ,0

; one can show φ√ui ≈

A2_{+ 4}_A2_Cα_⇒_{λ, λ}′

≈(yiln (µ_i·xi)−ǫ)2,4λx

⊤

iΣi(xi−xi)

(µi·xi)2

.

References

[Be00] E. Behrends.Introduction to Markov Chains with Special Emphasis on Rapid Mixing.Advanced Lecture Notes in Mathematics, Vieweg Verlag, 2000.

[CeCoGe05] N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm.SIAM Journal of Com-mutation, 34(3): 640–668, 2005.

[CeZe97] Y. Censor and S.A. Zenios.Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1997.

[ChYe11] Y. Chen and X. Ye. Projection onto a simplex. Department of Mathematics, University of Florida, 2011. [CoDrRo07] J.F. Coeurjolly, R. Drouilhet, and J.F. Robineau. Normalized information-based divergences.Problems of

Information Transmission, 43(3): 167–189, 2007.

[CrDrFe08] K. Crammer, M. Dredze, and F. Pereira. Exact convex confidence-weighted learning.Neural Information Processing Systems (NIPS), 2008.

[Da06] R. Davidson. Stochastic dominance. Department of Economics, McGill University, 2006.

[LiChLiMaVi04] M. Li, X. Chen, X. Li, B. Ma and P.M.B. Vitányi. The similarity metric.IEEE Trans. Inform. Theory, 50(12): 3250–3264, 2004.

[LiHoZhGo11] B. Li, S.C.H. Hoi, P. Zhao, and V. Gopalkrishnan. Confidence weighted mean reversion strategy for on-line portfolio selection. School of Computer Engineering, Nanyang Technological University, 2011.

[PePe08] K.B. Petersen and M.S. Pedersen.The Matrix Cookbook, 2008.