A Generalization of f-Divergence Measure to Convex Functions Defined on Linear Spaces

(1)

CONVEX FUNCTIONS DEFINED ON LINEAR SPACES

S.S. DRAGOMIR

Abstract. In this paper we generalise the concept off-divergence to a convex function de…ned on a convex cone in a linear space. Some fundamental results are established.

1. Introduction

Given a convex functionf : [0;₁)_!R, thef divergence functional

(1.1) If(p;q) =

n X

i=1 qif

pi qi

;

was introduced by Csiszár [3]-[4] as a generalized measure of information, a “dis-tance function”on the set of probability distribution Pn: The restriction here to discrete distributions is only for convenience, similar results hold for general distri-butions. As in Csiszár [3]-[4] , we interpret unde…ned expressions by

f(0) = lim

t!0+f(t); 0f 0 0 = 0; 0f a₀ = lim

"!0+"f a

" =a_tlim_!1 f(t)

t ; a >0:

The following results were essentially given by Csiszár and Körner [5].

Proposition 1. (Joint Convexity) If f : [0;₁)_!R is convex, then If(p;q)is

jointly convex inpandq.

Proposition 2. (Jensen’s inequality) Letf : [0;₁)_!Rbe convex. Then for any p;q₂[0;₁)n _with_P

n:= n P i=1

pi>0,Qn:= n P i=1

qi>0, we have the inequality

(1.2) If(p;q) Qnf Pn

Qn :

If f is strictly convex, equality holds in (1.2) i¤

p1 q1 =

p2

q2 =:::= pn qn:

It is natural to consider the following corollary.

Corollary 1. (Nonnegativity) Letf : [0;₁)_!Rbe convex and normalised, i.e.,

(1.3) f(1) = 0:

Date: January 07, 2009.

1991Mathematics Subject Classi…cation. 26D15; 94.

(2)

Then for anyp;q2[0;₁)n _with _P

n=Qn, we have the inequality

(1.4) If(p;q) 0:

If f is strictly convex, equality holds in (1.4) i¤

pi=qi for all i2 f1; :::; ng.

In particular, if p;q are probability vectors, then Corollary 1 shows that, for strictly convex and normalizedf : [0;₁)_!Rthat

(1.5) If(p;q) 0 andIf(p;q) = 0 i¤p=q.

We now give some examples of divergence measures in Information Theory which are particular cases off divergences.

Kullback-Leibler distance ([14]). The Kullback-Leibler distance D(; ) is de…ned by

D(p;q) := n X

i=1

pilog pi qi :

If we choosef(t) =tlnt,t >0, then obviously

If(p;q) =D(p;q):

Variational distance(l1 distance). Thevariational distance V(; )is de…ned

by

V(p;q) := n X

i=1

jpi qi_j:

If we choosef(t) =_jt 1_j,t₂[0;₁), then we have

If(p;q) =V(p;q):

Hellinger discrimination ([1]). The Hellinger discrimination is de…ned by

p

2h2₍_; ₎_{, where}_h2₍_; ₎_{is given by}

h2(p;q) :=1 2

n X

i=1

(ppi pqi)2:

It is obvious that iff(t) =1₂ pt 1 2, then

If(p;q) =h2(p;q):

Triangular discrimination ([17]). We de…ne triangular discrimination be-tweenpandqby

(p;q) = n X

i=1

jpi qi_j2 pi+qi

:

It is obvious that iff(t) =(t_t+11)2,t₂(0;₁), then

If(p;q) = (p;q):

Note thatp (p;q)is known in the literature as the Le Cam distance.

2 _distance. _{We de…ne the} 2 _distance _{(chi-square distance) by} D 2(p;q) :=

n X

i=1

(pi qi)2 qi

(3)

It is clear that iff(t) = (t 1)2,t₂[0;₁), then

If(p;q) =D 2(p;q):

Rényi’s divergences ([16]). For ₂Rn f0;1_g;consider

(p;q) := n X

i=1

piqi1 :

It is obvious that iff(t) =t (t₂(0;₁));then

If(p;q) = (p;q):

Rényi’s divergencesR (p;q) := ₍1 ₁₎ln [ (p;q)] have been introduced for all real orders ₆= 0; ₆= 1(and continuously extended for = 0and = 1) in [15], where the reader may …nd many inequalities valid for these divergences, without, as well as with, some restrictions forpandq:

For other examples of divergence measures, see the paper [12] and the books [15] and [18], where further references are given.

In this paper we generalize the concept off-divergence to a convex function de-…ned on a convex cone in a linear space. Some fundamental results are established.

2. The f-Divergence of an n-tuple of Vectors

Firstly, we recall that the subsetKin a linear spaceX is acone if the following two conditions are satis…ed:

(i) for anyx; y₂K we havex+y₂K;

(ii) for anyx₂Kand any 0 we have x₂K.

For a givenn-tuple of vectorsz= (z1; :::; zn)2Knand a probability distribution

q= (q1; :::; qn)2Pn with all values nonzero, we can de…ne, for the convex function f :K_!R, the followingf-divergence of zwith the distribution q (see [8]):

(2.1) If(z;q) :=

n X

i=1 qif

zi qi :

It is obvious that ifX =R, K= [0;₁)andx=p₂Pn then we obtain the usual concept of thef-divergence associated with a function f : [0;₁)_!R.

The following result concerning the mutual convexity of thef-divergence holds.

Theorem 1. Letf :K_!Rbe a convex function on the coneK:Then the function

If(; )is convex on the convex setKn Pn.

Proof. Letz= (z1; :::; zn);v= (v1; :::; vn)₂Kn_,_p_{= (p1; :::; pn)}_;_q_{= (q1; :::; qn)}₂ Pn_{two probability distributions with all values nonzero and} _; ₀_with ₊ _{= 1:}

Then we have

(2.2) If[ (v;p) + (z;q)] =If( v+ z; p+ q)

= n X

i=1

( pi+ qi)f

vi+ zi pi+ qi

= n X

i=1

( pi+ qi)f

pi pi+ qi

vi pi

+ qi

pi+ qi zi qi

(4)

Due to the convexity off;we have

(2.3) f pi

pi+ qi vi pi +

qi pi+ qi

zi qi pi pi+ qi f

vi pi +

qi pi+ qi f

zi qi

for eachi_{2 f}1; :::; n_g:

Now, on multiplying (2.3) with pi+ qi>0;summing over ifrom 1to nand utilising (2.2) we get that

If[ (v;p) + (z;q)] If(v;p) + If(z;q)

proving the desired result.

Now, for a givenn-tuple of vectors x= (x1; :::; xn)2Kn, a probability

distrib-ution q₂Pn _{with all values nonzero and for any nonempty subset} _J _of _f_{1; :::; n}_g

we have

qJ:= QJ; QJ 2P2

whereQJ :=P_j₂_Jqj; QJ := 1 QJ and

xJ := XJ; XJ 2K2

where, as above,

XJ :=X i2J

xi; and XJ :=XJ:

It is obvious that

If(xJ;qJ) =QJf XJ

QJ +QJf XJ QJ

:

The following inequality for thef-divergence of ann-tuple of vectors in a linear space holds [8]:

Theorem 2. Let f : K _! R be a convex function on the cone K: Then for any

n-tuple of vectors x= (x1; :::; xn)2Kn, a probability distributionq2Pn with all

values nonzero and for any nonempty subsetJ of _f1; :::; n_g we have (2.4) If(x;q) max

;6=J f1;:::;ngIf(xJ;qJ) If(xJ;qJ)

min

;6=J f1;:::;ngIf(xJ;qJ) f(Xn) whereXn:=Pni=1xi:

We observe that, for a givenn-tuple of vectorsx= (x1; :::; xn)2Kn;a su¢ cient

condition for the positivity ofIf(x;q)for any probability distributionq2Pn with

all values nonzero is that f(Xn) 0: In the scalar case and ifx=p2Pn; then a

su¢ cient condition for the positivity of thef-divergenceIf(p;q)is thatf(1) 0:

The case of functions of a real variable that is of interest for applications is incorporated in [8]:

Corollary 2. Let f : [0;₁)_!R be a normalized convex function. Then for any p;q₂Pn _{we have}

(2.5) If(p;q) max

;6=J f1;:::;ng QJf

PJ

QJ + (1 QJ)f

1 PJ

(5)

Remark 1. For various applications of the inequality (2.5) to particular divergence measures of interest in applications, see[8]. In order to give an example, we point out the following result

(2.6) J(p; q) ln max

;6=J f1;:::;ng

(

(1 PJ)QJ (1 QJ)PJ

(QJ PJ))!

max

;6=J f1;:::;ng

"

(QJ PJ)2 PJ+QJ 2PJQJ

# 0;

where the Je¤ reys divergence is de…ned as

(2.7) J(p; q) := n X

j=1 qj

pj qj

1 ln pj qj

= n X

j=1

(pj qj) ln pj qj

;

which is anf-divergence for f(t) = (t 1) lnt; t >0:

3. Some Upper and Lower Bounds

Let K be a convex subset of the real linear space X and let f : K _! R be a convex mapping. Here we consider the following well-known form of Jensen’s discrete inequality:

f 1

PI X

i2I pixi

! 1 PI

X

i2I

pif(xi);

whereI denotes a …nite subset of the setNof natural numbers,xi₂K; pi 0for

i₂I andPI :=P_i₂_Ipi>0:

Let us …x I _{2 P}f(N)(the class of …nite parts of N) and xi 2K (i2I): Now

consider the functionalJ :S+(I)_!Rgiven by

JI(p) :=X i2I

pif(xi) PIf 1 PI

X

i2I pixi

! 0

whereS+(I) := p= (pi)i2I pi 0; i2I and PI >0 andf is convex onK:

We observe thatS+(I)is a cone and the functionalJI is nonnegative, superad-ditive [10] and positive homogeneous onS+(I):

We have the following inequalities that are of interest in their turn as well (see [9]):

Lemma 1. If p;q ₂ S+(I) and M m 0 such that Mp q mp; i.e.,

M pi qi mpi for each i₂I;then:

(3.1) M

" X

i2I

pif(xi) PIf 1 PI

X

i2I pixi

!#

X

i2I

qif(xi) QIf 1 QI

X

i2I qixi

!

m "

X

i2I

pif(xi) PIf 1 PI

X

i2I pixi

!#

(6)

and

(3.2)

" 1 PI

X

i2I

pif(xi) f 1 PI

X

i2I pixi

!#M PI

" 1 QI

X

i2I

qif(xi) f 1 QI

X

i2I qixi

!#QI

" 1 PI

X

i2I

pif(xi) f 1 PI

X

i2I pixi

!#mPI

:

respectively.

We may state the following result:

Theorem 3. Let f : K _! _R be a convex function on the cone K: Consider an

n-tuple of vectorsz= (z1; :::; zn)2Kn and two probability distribution p;q 2Pn

with all values nonzero and satisfying the condition

(3.3) Rpi qi rpi for eachi_{2 f}1; :::; n_g;

whereR 1 r >0:

If we de…ne the vector

y= p1 q1

z1; :::; pn qn

zn 2Kn;

then we have the inequalities

(3.4) R[If(y;p) f(Yn)] If(z;q) f(Zn) r[If(y;p) f(Yn)] ( 0)

and the inequalities

(3.5) [If(y;p) f(Yn)]R If(z;q) f(Zn) [If(y;p) f(Yn)]r( 0)

respectively, whereZn:=Pni=1zi andYn:=Pni=1yi=Pni=1 pi

qi zi2K: The proof follows from Lemma 1 applied forM =R; m=r andxi= zi

qi where

i_{2 f}1; :::; n_g:

Corollary 3. Let f : [0;₁)_!Rbe a normalized convex function. For two prob-ability distributions p;q₂Pn _{with all values nonzero assume that there exists the}

constants R 1 r >0satisfying the condition (3.3). If s= (s1; :::; sn)₂Pn _{is such that the vector}

(3.6) y= p1

q1s1; :::; pn

qnsn 2R n +

is a probability distribution, then we have the inequalities

(3.7) RIf(y;p) If(s;q) rIf(y;p)

and the inequalities

(7)

Remark 2. It is natural to ask if we can …nd probability distributionsp;q;s2Pn

such that yde…ned by (3.6) is a probability distribution as well.

Let consider the simplest example, namely for n = 2: In this case for, say p= (0:1;0:9), q= (0:2;0:8) and s= (s1; s2) 2 P2 we have y = 1₂s1;9₈s2 which

should satisfy the condition that 1₂s1 + 9₈s2 = 1 for some s1; s2 ₂ [0;1] with

s1 +s2 = 1: We observe that this system of equations has the unique solution

s1=1₅ ands2= 4₅;showing that(s1; s2)2P2:

4. Other Bounds in Terms of Gâteau Derivatives

Assume thatf :X _!Ris aconvex function on the real linear space X. Since for any vectors x; y₂X the functiongx;y :R!R; gx;y(t) :=f(x+ty)is convex

it follows that the following limits exist

(4.1) _r+( )f(x) (y) := lim

t!0+( )

f(x+ty) f(x) t

and they are called theright(left) Gâteaux derivatives of the functionf in the point

xover the directiony:

It is obvious that for anyt >0> swe have

(4.2) f(x+ty) f(x)

t r+f(x) (y) = inft>0

f(x+ty) f(x) t

sup s<0

f(x+sy) f(x)

s =r f(x) (y)

f(x+sy) f(x) s

for anyx; y₂X and, in particular,

(4.3) _r f(u) (u v) f(u) f(v) _r+f(v) (u v)

for any u; v₂X:We call this the gradient inequality for the convex functionf: It will be used frequently in the sequel in order to obtain various results related to Jensen’s inequality.

The following properties are also of importance:

(4.4) _r+f(x) ( y) = r f(x) (y);

and

(4.5) _r+( )f(x) ( y) = r+( )f(x) (y)

for anyx; y₂X and 0:

The right Gâteaux derivative issubadditive while the left one is superadditive, i.e.,

(4.6) _r+f(x) (y+z) r+f(x) (y) +r+f(x) (z)

and

(4.7) _r f(x) (y+z) _r f(x) (y) +_r f(x) (z)

for anyx; y; z₂X .

Some natural examples can be provided by the use of normed spaces.

Assume that(X;_{k k})is a real normed linear space. The function f :X _!R,

f(x) := 1 2kxk

2

(8)

semi-inner products

hy; x_i_s(i):= lim t!0+( )

kx+ty_k2 _kx_k2

t :

For a comprehensive study of the properties of these mappings in the Geometry of Banach Spaces see the monograph [7].

For the convex functionfp:X_!R,fp(x) :=_kx_kp withp >1;we have

r+( )fp(x) (y) = 8 < :

p_kx_kp 2_hy; x_i_s(i) ifx₆= 0

0 ifx= 0

for anyy₂X:

Ifp= 1;then we have

r+( )f1(x) (y) = 8 < :

kx_k 1_hy; x_i_s(i) ifx₆= 0

+ ( )_ky_k ifx= 0

for anyy₂X:

This class of functions will be used to illustrate the inequalities obtained in the general case of convex functions de…ned on an entire linear space.

The following result holds:

Lemma 2. Let f : X _! R be a convex function. Then for any x; y ₂ X and

t₂[0;1]we have

(4.8) t(1 t) [_r f(y) (y x) _r+f(x) (y x)]

tf(x) + (1 t)f(y) f(tx+ (1 t)y)

t(1 t) [_r+f(tx+ (1 t)y) (y x) r f(tx+ (1 t)y) (y x)] 0:

Proof. Utilising the gradient inequality (4.3) we have

(4.9) f(tx+ (1 t)y) f(x) (1 t)_r+f(x) (y x)

and

(4.10) f(tx+ (1 t)y) f(y) t_r f(y) (y x):

If we multiply (4.9) withtand (4.10) with1 tand add the resultant inequalities we obtain

f(tx+ (1 t)y) tf(x) (1 t)f(y)

(1 t)t_r+f(x) (y x) t(1 t)r f(y) (y x)

which is clearly equivalent with the …rst part of (4.8). By the gradient inequality we also have

(1 t)_r f(tx+ (1 t)y) (y x) f(tx+ (1 t)y) f(x)

and

t_r+f(tx+ (1 t)y) (y x) f(tx+ (1 t)y) f(y)

(9)

Theorem 4. Letf :K_!Rbe a convex function on the coneK:Ifz= (z1; :::; zn);

v = (v1; :::; vn) 2 Kn, p= (p1; :::; pn); q= (q1; :::; qn) 2 Pn are two probability

distributions with all values nonzero and ; 0 with + = 1;then we have

n X

i=1 piqi

pi+ qi r f zi qi

zi qi

vi

pi r+f vi pi zi qi vi pi (4.11)

If(v;p) + If(z;q) If[ (v;p) + (z;q)] n

X

i=1 piqi pi+ qi

r+f

vi+ zi pi+ qi

zi qi

vi

pi r

f vi+ zi pi+ qi

zi qi

vi pi 0

Proof. If we write the inequality (4.8) for

x= vi pi; y=

zi

qi andt= pi pi+ qi

then we get

piqi

( pi+ qi)2 r f zi qi

zi qi

vi

pi r+ f vi

pi zi qi vi pi (4.12) pi pi+ qi

f vi pi

+ qi pi+ qi

f zi qi

f vi+ zi pi+ qi piqi

( pi+ qi)2

r+f vi+ zi

pi+ qi zi qi

vi

pi r

f vi+ zi pi+ qi

zi qi

vi pi 0;

for eachi_{2 f}1; :::; n_g:

Now, if we multiply (4.12) by pi+ qi>0and sum overifrom1tonwe derive the desired result (4.11).

It is natural now to consider the corresponding result for convex functions of a real variable.

Corollary 4. Let f : [0;₁) _! R be a normalized convex function. If z = (z1; :::; zn); v = (v1; :::; vn), p= (p1; :::; pn); q= (q1; :::; qn) ₂ Pn are probability distributions with all values nonzero and ; 0 with + = 1;then we have

n X

i=1

det zi vi qi pi pi+ qi f

0 zi

qi f 0 + vi pi (4.13)

If(v;p) + If(z;q) If[ (v;p) + (z;q)]

n X

i=1

0

+

vi+ zi pi+ qi f

0 vi+ zi

(10)

Remark 3. It is obvious that for di¤ erentiable convex functions on(0;₁)the lower bound vanishes and the inequality (4.13) becomes:

(4.14) 0 If(v;p) + If(z;q) If[ (v;p) + (z;q)]

n X

i=1

0 zi

qi f

0 vi pi

that can be used for particular divergence measures.

Indeed, if we consider the normalised convex functionf(t) = (t 1)2,t₂[0;₁), then

If(p;q) =D 2(p;q)

where, as in the introduction, the 2 _{distance (chi-square distance) is de…ned by}

D 2(p;q) := n X

i=1

(pi qi)2 qi :

It is clear that the inequality (4.14) becomes then

(4.15) 0 D 2(v;p) + D 2(z;q) D 2[ (v;p) + (z;q)]

2 n X

i=1

det2 zi vi qi pi piqi( pi+ qi)

:

The Kullback-Leibler distanceD(; )is de…ned by

D(p;q) := n X

i=1 pilog

pi qi :

If we choosef(t) =tlnt,t >0, then obviously

If(p;q) =D(p;q)

and the inequality (4.14) becomes then

(4.16) 0 D(v;p) + D(z;q) D[ (v;p) + (z;q)]

ln 8 < :

n Y

i=1 2 4 zipi

qivi

pizi qivi pi+ qi

3 5

9 = ;:

Similar results could be obtained for other particular instances of divergence mea-sures, however the details are omitted.

In what follows we provide some lower and upper bounds for the nonnegative di¤erenceIf(x;q) If(xJ;qJ)whereJ is a nonempty subset of_f1; :::; n_gand

If(xJ;qJ) =QJf XJ QJ

+QJf XJ QJ :

For a nonempty subsetK of_f1; :::; n_g we also use the notation

If;K(x;q) := X

i2K qif

xi qi

(11)

Theorem 5. Let f : K _! R be a convex function on the cone K: Then for any

n-tuple of vectors x= (x1; :::; xn)2Kn, a probability distributionq2Pn with all

values nonzero and for any nonempty subsetJ of _f1; :::; n_g we have

(4.17) I

r f( ) XJ_QJ ;J(x;q) +Ir f( ) XJ_QJ ;J(x;q) If(x;q) If(xJ;qJ) I

r+f XJ_QJ XJ_QJ ;J(x;q) +Ir+f XJ_QJ XJ_QJ ;J(x;q) 0

Proof. Utilising the gradient inequality we have, for a given nonempty set J of f1; :::; n_gwithJ ₆=_f1; :::; n_g; that

(4.18) _r f xi

qi xi qi

XJ QJ

f xi qi

f XJ QJ

r+f XJ

QJ xi qi

XJ QJ

for anyi₂J:If we multiply (4.18) withqi 0 and sum overi₂J;we get

(4.19) I

r f( ) XJ_QJ ;J(x;q) If;J(x;q) QJf XJ QJ I

r+f XJ_QJ XJ_QJ ;J(x;q) 0:

From the gradient inequality we also have

(4.20) _r f xj

qj xj qj

XJ

QJ f

xj qj

f XJ QJ

r+f XJ

QJ xj qj

XJ QJ

for anyj₂J :If we multiply (4.18) withqj 0and sum overj₂J ;we get

(4.21) I

r f( ) XJ_QJ ;J(x;q) If;J(x;q) QJf XJ QJ I

r+f XJ

QJ XJ QJ ;J

(x;q) 0:

Now, if we sum the inequalities (4.19) with (4.21) and take into account that

If;J(x;q) +If;J(x;q) =If(x;q)

and

QJf XJ

QJ +QJf XJ QJ

=If(xJ;qJ)

then we get the desired result (4.17).

(12)

Corollary 5. Let f : [0;₁)_!R be a normalized convex function. Then for any p;q₂Pn _and_{; 6}₌_J _f_{1; :::; n}_g _{we have}

I

f0_{( )} PJ QJ ;J

(p;q) +I

f0_{( )} 1 PJ

1 QJ ;J

(p;q)

(4.22)

If(p;q) QJf PJ QJ

(1 QJ)f 1 PJ 1 QJ I

f0

+ XJQJ QJPJ ;J

(p;q) +I f0

+ 1 PJ

1 QJ

1 PJ

1 QJ ;J

(p;q) 0:

Remark 4. If one chooses di¤ erent convex functions generating particular diver-gence measures such as the Kullback-Leibler, Je¤ reys or Hellinger diverdiver-gences, that one can obtain some particular results of interest. However the details are not presented here.

References

[1] R. BERAN, Minimum Hellinger distance estimates for parametric models,Ann. Statist., 5

(1977), 445-463.

[2] A. BHATTACHARYYA, On a measure of divergence between two statistical populations de…ned by their probability distributions,Bull. Calcutta Math. Soc.,35(1943), 99-109. [3] I. CSISZÁR, Information measures: A critical survey,Trans. 7th Prague Conf. on Info. Th.,

Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist.,Volume B, Academia Prague, 1978, 73-86.

[4] I. CSISZÁR, Information-type measures of di¤erence of probability functions and indirect observations,Studia Sci. Math. Hungar,2(1967), 299-318.

[5] I. CSISZÁR and J. KÖRNER,Information Theory: Coding Theorem for Discrete Memory-less Systems,Academic Press, New York, 1981.

[6] D. DACUNHA-CASTELLE, Ecole d’été de Probabilités de Saint-Fleour, III-1997, Berlin, Heidelberg: Springer 1978.

[7] S.S. DRAGOMIR,Semi-inner Products and Applications, Nova Science Publishers Inc., NY, 2004.

[8] S.S. DRAGOMIR, A new re…nement of Jensen’s inequality in linear spaces with ap-plications, Preprint RGMIA Res. Rep. Coll. 12(2009), Supplement, Article 6. [Online http://www.staff.vu.edu.au/RGMIA/v12(E).asp].

[9] S.S. DRAGOMIR, Inequalities for superadditive functionals with applications,Bull. Austral. Math Soc.77(2008), 401-411.

[10] S.S. DRAGOMIR, J. PE µCARI´C and L.E. PERSSON, Properties of some functionals related to Jensen’s inequality,Acta Math. Hungarica,70(1996), 129-143.

[11] H. JEFFREYS, An invariant form for the prior probability in estimation problems, Proc. Roy. Soc. London,Ser. A,186(1946), 453-461.

[12] J.N. KAPUR, A comparative assessment of various measures of directed divergence,Advances in Management Studies,3 (1984), No. 1, 1-16.

[13] S. KULLBACK,Information Theory and Statistics,J. Wiley, New York, 1959.

[14] S. KULLBACK and R.A. LEIBLER, On information and su¢ ciency,Annals Math. Statist.,

22(1951), 79-86.

[15] F. LIESE and I. VAJDA,Convex Statistical Distances,Teubner Verlag, Leipzig, 198 [16] A. RENYI, On measures of entropy and information, Proc. Fourth Berkeley Symp. Math.

Statist. Prob., Vol. 1, University of California Press, Berkeley, 1961.

[17] F. TOPSOE, Some inequalities for information divergence and related measures of discrimi-nation,Res. Rep. Coll. RGMIA,2(1) (1999), 85-98.7.

[18] I. VAJDA,Theory of Statistical Inference and Information,Kluwer, Boston, 1989.

Research Group in Mathematical Inequalities & Applications, School of Engineering & Science, Victoria University, PO Box 14428, Melbourne City, MC 8001, Australia.

E-mail address: [email protected]