Divergence measures for generalized order statistics



Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University for the award of the academic degree of Doctor of Natural Sciences

submitted by

Diplom-Mathematiker

Quan Nhon Vuong

from Aachen

Referees: Professor Dr. Udo Kamps

Professor Dr. Erhard Cramer

Date of the oral examination: 12 July 2012


I really appreciate the opportunity to thank persons whose support in recent years was important to me.

First of all, I would like to thank my supervisor Professor Udo Kamps for giving me the opportunity to work in the interesting field of models of ordered random variables. Professor Kamps has a way of always finding the right and cheerful words, free of any doubt, to enhance my endurance and motivation. I would not have been able to complete this doctoral thesis without his unwavering support.

Furthermore, I would like to thank Professor Erhard Cramer, my co-referee. The very positive experiences I had while working on my diploma thesis, which was supervised by him, laid the foundation for my decision to return to RWTH Aachen University after graduating.

Sincere thanks are given to all of my colleagues, who made working at the Institute of Statistics an enjoyable time. In particular, I thank Dr. Stefan Bedbur for his support; the interesting discussions with him were very helpful during the last months. Among my friends, who usually have nothing to do with my work, I have to thank Minh Tam Le and Huy Truong, who gave me worthwhile advice on linguistic issues concerning this thesis. Their helpfulness is really outstanding and very much appreciated.

Moreover, I would like to give warm thanks to my family. My parents, Thi My Dung Nguyen and Co Nguyen Vuong, have always provided a comfortable background for my personal development.

Finally, I am deeply grateful to Miriam Tamm, who is not only the most important person in my life, but also my greatest support and enrichment.


1. Introduction

1.1. Models of Ordered Random Variables

Models of ordered random variables occur in a wide range of statistical issues dealing with ordered data sets. Different models provide a broad variety of interpretations. Kamps (1995a,b) developed a unified approach to many models of ordered random variables. He introduced uniform generalized order statistics and used quantile transformations to define generalized order statistics (GOSs). Generalized order statistics based on an underlying absolutely continuous distribution function can be defined by their joint densities.

Definition 1.1.1 (Generalized order statistics (GOSs))

Let $F$ be an absolutely continuous distribution function with corresponding density function $f$. For $n \in \mathbb{N}$ and a vector of positive model parameters $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_n)' \in \mathbb{R}_+^n$, the ordered random variables $X_1^*, \dots, X_n^*$ are called generalized order statistics if they possess a joint density of the form

$$f_{\boldsymbol{\gamma}}^{X_1^*, \dots, X_n^*}(x_1, \dots, x_n) = \Big(\prod_{j=1}^{n} \gamma_j\Big) \Big(\prod_{j=1}^{n-1} (1 - F(x_j))^{\gamma_j - \gamma_{j+1} - 1} f(x_j)\Big) (1 - F(x_n))^{\gamma_n - 1} f(x_n) \tag{1.1.1}$$

on the cone $F^{-1}(0+) < x_1 \le \dots \le x_n < F^{-1}(1)$.
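As a numerical companion to (1.1.1), the following Python sketch evaluates the joint GOS density for user-supplied $F$ and $f$. The name `gos_joint_density` is hypothetical. The sketch assumes that all arguments lie inside the support of $F$ and is meant as an illustration, not as part of the original work.

```python
import math
from typing import Callable, Sequence


def gos_joint_density(x: Sequence[float], gamma: Sequence[float],
                      F: Callable[[float], float],
                      f: Callable[[float], float]) -> float:
    """Joint density (1.1.1) of GOSs with parameters gamma at x = (x_1, ..., x_n).

    Passing only the first r parameters gives the marginal density (1.1.2).
    Assumes every x_j lies in the support of F, i.e. 0 < F(x_j) < 1.
    """
    n = len(gamma)
    if n == 0 or len(x) != n or any(g <= 0 for g in gamma):
        raise ValueError("need len(x) == len(gamma) >= 1 and positive gamma")
    # Outside the cone x_1 <= ... <= x_n the density vanishes.
    if any(x[i] > x[i + 1] for i in range(n - 1)):
        return 0.0
    dens = math.prod(gamma)
    for j in range(n - 1):
        dens *= (1.0 - F(x[j])) ** (gamma[j] - gamma[j + 1] - 1.0) * f(x[j])
    return dens * (1.0 - F(x[-1])) ** (gamma[-1] - 1.0) * f(x[-1])
```

Take a standard exponential $F$ with $\gamma_j = n - j + 1$, which corresponds to ordinary order statistics as discussed below. In that case every exponent vanishes and the value reduces to $n!\prod_j f(x_j)$.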

In this work, only such generalized order statistics are discussed. The convenient form of their joint densities allows for simple computation of several divergence measures. It is useful to note the following.

Remark 1.1.2. The joint density of the first $r \in \{1, \dots, n\}$ GOSs according to Definition 1.1.1 is given by (cf. Kamps, 1995b, p. 62)

$$f_{\boldsymbol{\gamma}}^{X_1^*, \dots, X_r^*}(x_1, \dots, x_r) = \Big(\prod_{j=1}^{r} \gamma_j\Big) \Big(\prod_{j=1}^{r-1} (1 - F(x_j))^{\gamma_j - \gamma_{j+1} - 1} f(x_j)\Big) (1 - F(x_r))^{\gamma_r - 1} f(x_r) \tag{1.1.2}$$

on the cone $F^{-1}(0+) < x_1 \le \dots \le x_r < F^{-1}(1)$.

The structure of equation (1.1.2) is the same as that of equation (1.1.1). This observation allows for two points of view on the occurring marginal densities. On the one hand, the marginal densities of the first $r$ of $n$ ($r < n$) ordered quantities in an included model coincide with the corresponding marginal densities of GOSs with appropriate model parameters. On the other hand, the first $r$ random variables can be modeled as GOSs themselves with another parametrization (e.g., setting $\tilde n = r$ in place of $n$ in (1.1.1)).

A variety of models of ordered random variables can be described by the representation (1.1.1) using this concept of GOSs. The form of the densities will be used to determine exact expressions of the divergence measures considered in this work. For a more detailed discussion of the models contained in GOSs we refer to Kamps (1995b) and the references therein. Here, only some important examples of models covered by GOSs are discussed briefly (cf. Kamps, 2006).

In statistical modeling, (ordinary) order statistics (OSs) play a prominent role. For given random variables $X_1, \dots, X_n$, the corresponding quantities arranged in ascending order, $X_{1:n} \le \dots \le X_{n:n}$, are called order statistics. Throughout this work, the original random variables on which OSs are based are assumed to be independent and identically distributed (iid). Based on iid random variables $X_1, \dots, X_n$, where $X_1$ is distributed according to an absolutely continuous distribution function $F$, the joint density of the first $r$ OSs is obtained by setting $\gamma_j = n - j + 1$, $j = 1, \dots, r$, in equation (1.1.2). For an introduction to the topic of OSs we refer to David and Nagaraja (2003). An interesting application of OSs arises when modeling $(n - r + 1)$-out-of-$n$ systems. Such a system consists of $n$ identical components, which start working at the same time. The system keeps working as long as at least $n - r + 1$ components are running. Thus, if random variables $X_1, \dots, X_n$ model the components' failure times, then the $r$th OS $X_{r:n}$ represents the lifetime of the system.

A more realistic model in many practical situations is obtained if the failure of each component may influence the lifetimes of the remaining components at work. This more flexible modeling can take into account whether the failure of a component causes damage to the remaining ones, or whether after a failure the remaining components have to bear an increased workload. Starting with iid random variables $X_1^{(1)}, \dots, X_n^{(1)}$ distributed according to a distribution function $F_1$, each modeling the lifetime of one of the $n$ components of the system, the first (ordinary) OS $X_{1:n}^{(1)}$ describes the first failure time. Given a corresponding realization $x_{1:n}^{(1)}$, the next failure time is modeled as the minimum of iid random variables $X_1^{(2)}, \dots, X_{n-1}^{(2)}$ distributed according to a possibly different distribution function $F_2$ truncated on the left at $x_{1:n}^{(1)}$, that is,

$$X_1^{(2)} \sim \frac{F_2 - F_2(x_{1:n}^{(1)})}{1 - F_2(x_{1:n}^{(1)})}.$$

Proceeding in this way leads to the structure of sequential order statistics (SOSs), which allows for the more flexible modeling mentioned above. The model of SOSs can be viewed as an extension of the model of OSs. In this work, we restrict ourselves to the particular choice of the distribution functions

$$F_j = 1 - (1 - F)^{\alpha_j}, \quad j = 1, \dots, n, \tag{1.1.3}$$

with a distribution function $F$ and positive model parameters $\alpha_1, \dots, \alpha_n$. In the following, such SOSs are called sequential order statistics based on $F$ and $\alpha_1, \dots, \alpha_n$. If $F$ is an absolutely continuous distribution function, the joint density of the first $r \in \{1, \dots, n\}$ SOSs is obtained by setting $\gamma_j = \alpha_j (n - j + 1)$, $j = 1, \dots, r$, in equation (1.1.2). The particular choice (1.1.3) of the baseline distributions $F_1, \dots, F_n$ leads to the hazard function $\alpha_{j+1} f / (1 - F)$ of each component at work after the $j$th failure. This provides a simple interpretation of the parameters, since they act as factors for the respective failure rates. In most practical applications, especially when dealing with technical systems, it is plausible to have non-decreasing failure rates from step to step, i.e., $\alpha_1 \le \dots \le \alpha_n$.
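The sequential construction above can be mimicked by simulation. The Python sketch below uses hypothetical function names. At step $j$ it draws $n - j + 1$ lifetimes from $F_j = 1 - (1 - F)^{\alpha_j}$, truncated on the left at the previous failure time $t$, by inverse transform sampling. This relies on the fact that the survival function of the truncated distribution is $(1 - F(x))^{\alpha_j} / (1 - F(t))^{\alpha_j}$ for $x \ge t$. The sketch assumes that $F$ is invertible and that its quantile function is supplied.

```python
import math
import random
from typing import Callable, List, Sequence


def simulate_sos(n: int, alphas: Sequence[float],
                 F: Callable[[float], float], F_inv: Callable[[float], float],
                 rng: random.Random) -> List[float]:
    """One realization of the first len(alphas) SOSs based on F and alphas."""
    values: List[float] = []
    surv = 1.0  # 1 - F(previous failure time); equals 1 before the first failure
    for j, alpha in enumerate(alphas, start=1):
        draws = []
        for _ in range(n - j + 1):  # components still at work
            u = rng.random()
            # Solve 1 - F_j(x) = (1 - u) * (1 - F_j(t)) for x, t = previous failure:
            # (1 - F(x))^alpha = (1 - u) * surv^alpha.
            draws.append(F_inv(1.0 - (1.0 - u) ** (1.0 / alpha) * surv))
        x = min(draws)  # next failure time = minimum of the remaining lifetimes
        values.append(x)
        surv = 1.0 - F(x)
    return values
```

For a standard exponential $F$, each spacing $X_j^* - X_{j-1}^*$ is exponential with rate $\gamma_j = \alpha_j (n - j + 1)$. For example, with $n = 3$ and $\boldsymbol\alpha = (1, 2, 3)$, the sample mean of $X_3^*$ should be close to $1/3 + 1/4 + 1/3$.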

Another well-known and widely used model of ordered quantities besides order statistics is the model of record values (RVs) (cf., e.g., Arnold et al., 1998). Record values describe the successive largest values in a sequence of iid random variables $(X_j)_{j \in \mathbb{N}}$, where each random variable is distributed according to a continuous distribution function $F$. With record times $L(1) = 1$ and $L(i+1) = \min\{j > L(i) : X_j > X_{L(i)}\}$, $i \in \mathbb{N}$, the RVs are given by $X_{L(i)}$, $i \in \mathbb{N}$. The joint density of the first $r$ record values is obtained by setting $\gamma_j = 1$, $j = 1, \dots, r$, in equation (1.1.2).
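The record-time recursion translates directly into code. Below is a minimal Python sketch with a hypothetical helper name. It is applied to a finite, already observed sequence, so it returns only the records that occur within that sequence.

```python
from typing import List, Sequence, Tuple


def record_values(seq: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Record times L(1) = 1, L(i+1) = min{j > L(i) : X_j > X_{L(i)}} (1-based)
    and record values X_{L(i)} of a finite observed sequence."""
    times: List[int] = []
    values: List[float] = []
    for j, x in enumerate(seq, start=1):
        if not values or x > values[-1]:  # strict exceedance of the current record
            times.append(j)
            values.append(x)
    return times, values
```

For example, `record_values([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0])` returns `([1, 3, 5, 6], [3.0, 4.0, 5.0, 9.0])`.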

Just as SOSs extend (ordinary) OSs, Pfeifer record values (PRVs) extend the model of RVs in a similar way. In the model of PRVs, the underlying distribution is allowed to change after each observed record. More precisely, PRVs are based on a double sequence of independent random variables $(X_j^{(i)})_{i, j \in \mathbb{N}}$ with $X_j^{(i)} \sim F_i$ for $i, j \in \mathbb{N}$. With inter-record times $\Delta_1 = 1$ and $\Delta_{i+1} = \min\{j \in \mathbb{N} : X_j^{(i+1)} > X_{\Delta_i}^{(i)}\}$, $i \in \mathbb{N}$, Pfeifer record values are given by $X_{\Delta_i}^{(i)}$, $i \in \mathbb{N}$. In this work, we restrict ourselves to the particular choice of the distribution functions (cf. (1.1.3))

$$F_j = 1 - (1 - F)^{\beta_j}, \quad j = 1, \dots, r,$$

with an absolutely continuous distribution function $F$ and positive parameters $\beta_j$, $j = 1, \dots, r$. The joint density of the first $r$ PRVs is obtained by setting $\gamma_j = \beta_j$, $j = 1, \dots, r$, in equation (1.1.2).

Progressively Type-II censored order statistics (PC OSs) are another kind of ordered quantities. They originate from a quite different motivation and are also included in GOSs. Starting a life test with $N$ units, only $n < N$ failure times are designated for observation. Within ordinary OSs, such a scenario is achieved if the experiment is stopped after the $n$th failure out of $N$ possible ones. In this case, $X_{1:N}, \dots, X_{n:N}$ are the ordered quantities of interest, whereas the lifetimes of the censored items are only known to be larger than $X_{n:N}$ or its realization. This procedure is called Type-II censoring on the right: the number of failures is prefixed and the duration of the experiment is random (in contrast to Type-I censoring). Progressive Type-II censoring extends this kind of censoring in the sense that not all censored units are necessarily removed after the $n$th failure. Instead, a prefixed number $0 \le R_j \le N - n$ of items still at work can be removed at random after the $j$th failure for each $j = 1, \dots, n$. The tuple

$$\boldsymbol{R} = (R_1, \dots, R_n)$$

is called the censoring scheme. Obviously, $n$ is the number of observed failure times and $\sum_{i=1}^n R_i$ is the total number of removed objects. This yields $N = n + \sum_{i=1}^n R_i$. For insights into the topic of progressive censoring we refer to the overview article Balakrishnan (2007) and the monograph Balakrishnan and Aggarwala (2000). The joint density of the first $r$ progressively Type-II censored OSs is obtained by setting $\gamma_j = n - j + 1 + \sum_{i=j}^n R_i$, $j = 1, \dots, r$, in equation (1.1.2). Note that the first $r$ PC OSs based on a censoring scheme $(R_1, \dots, R_n)$ form a progressively Type-II censored sample of size $r$ from $N$ units with censoring scheme $(R_1, \dots, R_{r-1}, N - r - \sum_{j=1}^{r-1} R_j)$ (see, e.g., Balakrishnan and Aggarwala, 2000, Thm. 2.3, p. 12).
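The last statement can be checked on the level of the parameters $\gamma_j$. The Python sketch below uses hypothetical helper names. It computes $\gamma_j = n - j + 1 + \sum_{i=j}^n R_i$ and the reduced scheme for the first $r$ PC OSs, and it confirms that both lead to the same first $r$ parameters.

```python
from typing import List, Sequence


def pc_gammas(R: Sequence[int]) -> List[int]:
    """gamma_j = n - j + 1 + sum_{i=j}^n R_i for a censoring scheme R = (R_1, ..., R_n)."""
    n = len(R)
    return [n - j + 1 + sum(R[j - 1:]) for j in range(1, n + 1)]


def first_r_scheme(R: Sequence[int], r: int) -> List[int]:
    """Scheme (R_1, ..., R_{r-1}, N - r - sum_{j<r} R_j) of the first r PC OSs,
    viewed as a progressively Type-II censored sample of size r from N units."""
    if not 1 <= r <= len(R):
        raise ValueError("need 1 <= r <= n")
    N = len(R) + sum(R)  # N = n + total number of removals
    head = list(R[:r - 1])
    return head + [N - r - sum(head)]
```

Take $\boldsymbol R = (1, 0, 2)$, so that $N = 6$. Then $\boldsymbol\gamma = (6, 4, 3)$. The reduced scheme for $r = 2$ is $(1, 3)$, and its parameters are again $(6, 4)$.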

Remark 1.1.3. It can be seen from the joint density (1.1.1) that GOSs based on a distribution function $F$ and a parameter vector $c \cdot \boldsymbol{\gamma} \in \mathbb{R}_+^n$, $c > 0$, can also be interpreted as GOSs based on $1 - (1 - F)^c$ and $\boldsymbol{\gamma} \in \mathbb{R}_+^n$. Analogously, SOSs based on constant parameters $\alpha_1 = \dots = \alpha_n$ may be viewed as OSs based on a different distribution function, namely $1 - (1 - F)^{\alpha_1}$. Similarly, PRVs with constant parameters $\beta_1 = \dots = \beta_n$ can be interpreted as RVs based on $1 - (1 - F)^{\beta_1}$. Clearly, $\alpha_1 = 1$ and $\beta_1 = 1$ yield ordinary OSs and RVs based on $F$, respectively.
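Remark 1.1.3 is easy to confirm numerically. The Python sketch below uses hypothetical helper names and an arbitrarily chosen Weibull-type $F(t) = 1 - e^{-t^2}$, $t > 0$. It evaluates (1.1.1) once for $(F, c\boldsymbol\gamma)$ and once for $(1 - (1 - F)^c, \boldsymbol\gamma)$, and the two values should agree up to rounding.

```python
import math


def gos_density(x, gamma, F, f):
    """Joint density (1.1.1); x is assumed ordered and inside the support of F."""
    d = math.prod(gamma)
    for j in range(len(gamma) - 1):
        d *= (1.0 - F(x[j])) ** (gamma[j] - gamma[j + 1] - 1.0) * f(x[j])
    return d * (1.0 - F(x[-1])) ** (gamma[-1] - 1.0) * f(x[-1])


def power_transform(F, f, c):
    """Distribution function 1 - (1 - F)^c and its density c (1 - F)^(c - 1) f."""
    G = lambda t: 1.0 - (1.0 - F(t)) ** c
    g = lambda t: c * (1.0 - F(t)) ** (c - 1.0) * f(t)
    return G, g
```

The agreement follows because $(1 - G)^{\gamma_j - \gamma_{j+1} - 1} g = c\,(1 - F)^{c\gamma_j - c\gamma_{j+1} - 1} f$ when $G = 1 - (1 - F)^c$.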

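Table 1.1 provides an overview of different choices of $\boldsymbol{\gamma} \in \mathbb{R}_+^n$ and the corresponding models.

| Model | $\gamma_j > 0$ ($1 \le j \le n$) | Abbreviation |
|---|---|---|
| order statistics based on $\tilde F = 1 - (1 - F)^{\alpha_0}$ | $(n - j + 1)\alpha_0$ | $OS_{\alpha_0}$ ($OS = OS_1$) |
| sequential order statistics with $F_j = 1 - (1 - F)^{\alpha_j}$, $1 \le j \le n$ | $(n - j + 1)\alpha_j$ | $SOS_{\boldsymbol\alpha}$ |
| record values based on $\tilde F = 1 - (1 - F)^{\beta_0}$ | $\beta_0$ | $RV_{\beta_0}$ ($RV = RV_1$) |
| Pfeifer record values with $F_j = 1 - (1 - F)^{\beta_j}$, $1 \le j \le n$ | $\beta_j$ | $PRV_{\boldsymbol\beta}$ |
| progressively Type-II censored order statistics | $n - j + 1 + \sum_{i=j}^n R_i$ | $PC_{\boldsymbol R}$ |

Table 1.1.: Models of ordered random variables included in the model of generalized order statistics by appropriate choice of $\gamma_1, \dots, \gamma_n$ ($\alpha_0, \beta_0 \in \mathbb{R}_+$, $\boldsymbol\alpha = (\alpha_1, \dots, \alpha_n)' \in \mathbb{R}_+^n$, $\boldsymbol\beta = (\beta_1, \dots, \beta_n)' \in \mathbb{R}_+^n$, $\boldsymbol R = (R_1, \dots, R_n)' \in \mathbb{N}_0^n$)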
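Table 1.1 can be encoded as a small lookup. The Python sketch below is illustrative. The function name `model_gammas` and its keyword arguments are hypothetical, and the model strings mirror the table's abbreviations.

```python
from typing import List, Optional, Sequence


def model_gammas(model: str, n: int, *, alpha0: float = 1.0, beta0: float = 1.0,
                 alpha: Optional[Sequence[float]] = None,
                 beta: Optional[Sequence[float]] = None,
                 R: Optional[Sequence[int]] = None) -> List[float]:
    """Parameters gamma_1, ..., gamma_n according to Table 1.1."""
    js = range(1, n + 1)
    if model == "OS":   # OSs based on 1 - (1 - F)^alpha0 (alpha0 = 1: ordinary OSs)
        return [(n - j + 1) * alpha0 for j in js]
    if model == "RV":   # RVs based on 1 - (1 - F)^beta0 (beta0 = 1: ordinary RVs)
        return [beta0 for _ in js]
    seq = {"SOS": alpha, "PRV": beta, "PC": R}.get(model)
    if seq is None or len(seq) != n:
        raise ValueError("unknown model or parameter vector of wrong length")
    if model == "SOS":
        return [(n - j + 1) * a for j, a in zip(js, seq)]
    if model == "PRV":
        return [float(b) for b in seq]
    # PC: progressively Type-II censored order statistics
    return [float(n - j + 1 + sum(seq[j - 1:])) for j in js]
```

Consistent with Remark 1.1.3, `model_gammas("SOS", 4, alpha=[2.0] * 4)` coincides with `model_gammas("OS", 4, alpha0=2.0)`.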

1.2. Several Divergence Measures

Divergence measures are coefficients that quantify the dissimilarity of two probability distributions. A divergence measure equals 0 if and only if the two distributions coincide. The higher the coefficient, the "further away from each other" the probability distributions are. The smaller the coefficient, the "closer to each other" they are. Although the measures do not necessarily satisfy all properties of a metric, this description conveys the idea that divergence measures are interpreted as distances between probability distributions. In fact, most of the divergence measures considered in this work fail to satisfy the triangle inequality. Some measures are not even symmetric, but they all satisfy some of the requirements in the following definition.

Definition 1.2.1

Let $\mathfrak{P}$ be a set. A function $D : \mathfrak{P} \times \mathfrak{P} \to \mathbb{R}$ is called a distance (or distance measure) on $\mathfrak{P}$ if for all $P, Q \in \mathfrak{P}$ there holds:

(i) $D(P, Q) \ge 0$ (non-negativity).

(ii) $D(P, Q) = D(Q, P)$ (symmetry).

(iii) $D(P, P) = 0$ (reflexivity).

If $D$ has the first and the third property but not the second, it is called a quasi-distance (cf., e.g., Deza and Deza, 2009, pp. 3/4) or divergence instead.

The measures considered in this work additionally fulfill $D(P, Q) = 0 \Rightarrow P = Q$. Hence, the main difference between a metric and the considered distance (i.e., symmetric divergence) measures is the validity of the triangle inequality

$$D(P_1, P_2) \le D(P_1, Q) + D(Q, P_2), \quad P_1, P_2, Q \in \mathfrak{P}.$$

Although the considered measures have different origins, they are all called divergence measures in this work. The unified term "divergence measure" is adopted from Pardo (2006), who gives a systematic overview of divergence measures and their use in statistical inference. A wide class of divergence measures is given by the concept of $\Phi$-divergences introduced by Csiszár (1963) and Ali and Silvey (1966). For our purposes, it is not useful to consider such a wide class, since the flexibility of the function $\Phi$ prevents closed-form expressions of the coefficients for GOSs. The same applies to the even more general class of $(h, \Phi)$-divergences (Menéndez et al., 1995). In the following, let $(\mathfrak{X}, \mathfrak{B})$ be a measurable space and $\mathfrak{P} = \{P_\vartheta \mid \vartheta \in \Theta\}$ a family of equivalent probability measures on $(\mathfrak{X}, \mathfrak{B})$ with $\Theta \neq \emptyset$. Further, let $P_\vartheta$, $\vartheta \in \Theta$, be absolutely continuous with respect to a $\sigma$-finite measure $\mu$ on $(\mathfrak{X}, \mathfrak{B})$ with $\mu$-densities

$$f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x), \quad x \in \mathfrak{X}.$$

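An important and well-known divergence measure, introduced by Kullback and Leibler (1951) as the mean information for discrimination between two distributions (or hypotheses), is the (directed) Kullback-Leibler divergence

$$D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) = \int_{\mathfrak{X}} f_{\vartheta_1}(x) \ln\frac{f_{\vartheta_1}(x)}{f_{\vartheta_2}(x)}\, d\mu(x) = E_{\vartheta_1}\Big[\ln\frac{f_{\vartheta_1}(X)}{f_{\vartheta_2}(X)}\Big]. \tag{1.2.1}$$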
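To make (1.2.1) concrete, the Python sketch below approximates the integral with a midpoint rule for two exponential densities $\lambda e^{-\lambda x}$. This example is chosen here and is not taken from the text. The result is compared with the well-known closed form $\ln(\lambda_1/\lambda_2) + \lambda_2/\lambda_1 - 1$. Truncating $(0, \infty)$ to a finite interval is an assumption that is harmless for these light-tailed densities.

```python
import math


def midpoint(fun, a, b, m=200_000):
    """Composite midpoint rule for int_a^b fun(x) dx."""
    h = (b - a) / m
    return h * sum(fun(a + (i + 0.5) * h) for i in range(m))


def kl_numeric(f1, f2, a, b):
    """D_KL(f1, f2) from (1.2.1), with the domain truncated to [a, b]."""
    return midpoint(lambda x: f1(x) * math.log(f1(x) / f2(x)), a, b)


def kl_exponential(l1, l2):
    """Closed form of D_KL for the exponential densities l * exp(-l x), x > 0."""
    return math.log(l1 / l2) + l2 / l1 - 1.0
```

Swapping the arguments changes the value, for example about $0.636$ versus about $1.614$ for $\lambda_1 = 2$, $\lambda_2 = 0.5$. This illustrates that $D_{KL}$ is not symmetric.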

$D_{KL}$ is a function with domain $\{f_\vartheta \mid \vartheta \in \Theta\} \times \{f_\vartheta \mid \vartheta \in \Theta\}$, and so are the other divergences in this section. We assume that the $\mu$-densities $f_\vartheta$ may be identified with their corresponding parameters $\vartheta$ and vice versa. Therefore, throughout this work we write $D_{KL}(\vartheta_1, \vartheta_2)$ instead of $D_{KL}(f_{\vartheta_1}, f_{\vartheta_2})$ whenever convenient. Strictly speaking, considering the divergence measures as functions with domain $\{P_\vartheta \mid \vartheta \in \Theta\} \times \{P_\vartheta \mid \vartheta \in \Theta\}$ would be the most accurate point of view.

Kullback and Leibler (1951) and Kullback (1959) introduced and studied $D_{KL}$ as a measure of information for general probability measures. Some other measures also originate from information theory. However, as mentioned earlier in this section, all coefficients are simply regarded as divergences, where larger values indicate larger discrepancies between probability distributions.

A closely related divergence measure was introduced by Jeffreys (1946, 1948). It can be seen as a symmetric version of the Kullback-Leibler divergence, namely

$$D_J(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) + D_{KL}(f_{\vartheta_2}, f_{\vartheta_1}). \tag{1.2.2}$$

To emphasize its symmetry with respect to its arguments, we refer to Jeffreys' J-divergence as the J-distance in the following. Note that $D_{KL}$ is not symmetric with respect to its arguments.

Rényi (1961) introduced generalized probability distributions and a system of postulates for an information measure. In the discrete case, he obtained Rényi's information of order $\alpha$ for $\alpha \in \mathbb{R}_+ \setminus \{1\}$. Liese and Vajda (1987) extended a general analogue of it by a factor $1/\alpha$ to

$$D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = \frac{1}{\alpha(\alpha - 1)} \ln \int_{\mathfrak{X}} f_{\vartheta_1}(x)^\alpha f_{\vartheta_2}(x)^{1-\alpha}\, d\mu(x). \tag{1.2.3}$$

In this work, $D_{R,\alpha}$ from equation (1.2.3) is referred to as the Rényi divergence (of order $\alpha$). The additional factor $1/\alpha$ yields the symmetry

$$D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{R,1-\alpha}(f_{\vartheta_2}, f_{\vartheta_1}).$$
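For exponential densities $\lambda e^{-\lambda x}$, chosen here as a hypothetical example, the integral in (1.2.3) has the elementary form $\lambda_1^\alpha \lambda_2^{1-\alpha} / (\alpha\lambda_1 + (1 - \alpha)\lambda_2)$ for $0 < \alpha < 1$. The Python sketch below compares this closed form with a midpoint-rule evaluation of (1.2.3) on a truncated domain and checks the stated symmetry.

```python
import math


def midpoint(fun, a, b, m=200_000):
    """Composite midpoint rule for int_a^b fun(x) dx."""
    h = (b - a) / m
    return h * sum(fun(a + (i + 0.5) * h) for i in range(m))


def renyi_numeric(alpha, f1, f2, a, b):
    """D_{R,alpha}(f1, f2) from (1.2.3), domain truncated to [a, b]."""
    integral = midpoint(lambda x: f1(x) ** alpha * f2(x) ** (1.0 - alpha), a, b)
    return math.log(integral) / (alpha * (alpha - 1.0))


def renyi_exponential(alpha, l1, l2):
    """Closed form for exponential densities l * exp(-l x), valid for 0 < alpha < 1."""
    integral = l1 ** alpha * l2 ** (1.0 - alpha) / (alpha * l1 + (1.0 - alpha) * l2)
    return math.log(integral) / (alpha * (alpha - 1.0))
```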

Furthermore, the Rényi divergence is related to the Kullback-Leibler divergence by the equations (see Liese and Vajda, 1987, pp. 35 ff.)

$$\lim_{\alpha \to 1} D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) \quad\text{and}\quad \lim_{\alpha \to 0} D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_2}, f_{\vartheta_1}), \tag{1.2.4}$$

The first equation is also valid for the original Rényi divergence. The second is not valid if the factor $1/\alpha$ is omitted. Another related measure traces back to Bhattacharyya (1943), who extended his divergence between two multinomial populations to continuous distributions as

$$D_B(f_{\vartheta_1}, f_{\vartheta_2}) = -\ln \int_{\mathfrak{X}} \sqrt{f_{\vartheta_1}(x) f_{\vartheta_2}(x)}\, d\mu(x) = \frac{1}{4} D_{R,\frac12}(f_{\vartheta_1}, f_{\vartheta_2}). \tag{1.2.5}$$

The Bhattacharyya distance $D_B$ is symmetric with respect to the two distributions concerned, and $\alpha = 1/2$ is the only case in which $D_{R,\alpha}$ is symmetric. The next measure is also related to the Bhattacharyya distance. Moreover, it is the only measure considered in this work that is a metric. To emphasize this special position among the divergence measures, we call it the Hellinger distance or Hellinger metric. For $m = 2$ it is given by

$$D_{H,2}(f_{\vartheta_1}, f_{\vartheta_2}) = \Big(\int_{\mathfrak{X}} \big|\sqrt{f_{\vartheta_1}(x)} - \sqrt{f_{\vartheta_2}(x)}\big|^2\, d\mu(x)\Big)^{\frac12} = \big(2 - 2\exp(-D_B(f_{\vartheta_1}, f_{\vartheta_2}))\big)^{\frac12}. \tag{1.2.6}$$

The Hellinger distance is named after Hellinger (1909), since it can be defined in terms of the Hellinger integral. Matusita studied properties of the Hellinger distance in several works, with a main focus on statistical decisions; see, for example, Matusita (1964) and the references given there. As a generalization, for $m \ge 1$ we also call

$$D_{H,m}(f_{\vartheta_1}, f_{\vartheta_2}) = \Big(\int_{\mathfrak{X}} \big|f_{\vartheta_1}(x)^{\frac1m} - f_{\vartheta_2}(x)^{\frac1m}\big|^m\, d\mu(x)\Big)^{\frac1m} \tag{1.2.7}$$

the Hellinger distance. Nevertheless, unless explicitly stated otherwise, the term Hellinger distance refers to $D_{H,2}$ in this work. The expression $(D_{H,m}(f_{\vartheta_1}, f_{\vartheta_2}))^m$ coincides with a second invariant, which Jeffreys (1946, 1948) introduced together with the J-distance given in (1.2.2). It is worth mentioning that, in contrast to the other divergences, the Hellinger distance is bounded (by $\sqrt[m]{2}$).
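The relations (1.2.4) to (1.2.7) can be checked numerically. The Python sketch below uses two exponential densities as a hypothetical example, for which the integrals in (1.2.3) and (1.2.5) have elementary closed forms. It evaluates (1.2.7) with a midpoint rule on a truncated domain.

```python
import math


def midpoint(fun, a, b, m=100_000):
    """Composite midpoint rule for int_a^b fun(x) dx."""
    h = (b - a) / m
    return h * sum(fun(a + (i + 0.5) * h) for i in range(m))


# Closed forms for exponential densities l * exp(-l x), x > 0 (illustrative example).
def kl_exp(l1, l2):
    return math.log(l1 / l2) + l2 / l1 - 1.0


def renyi_exp(alpha, l1, l2):
    """D_{R,alpha} from (1.2.3) for 0 < alpha < 1."""
    integral = l1 ** alpha * l2 ** (1.0 - alpha) / (alpha * l1 + (1.0 - alpha) * l2)
    return math.log(integral) / (alpha * (alpha - 1.0))


def bhattacharyya_exp(l1, l2):
    """D_B from (1.2.5): -ln int sqrt(f1 f2) = -ln(sqrt(l1 l2) / ((l1 + l2) / 2))."""
    return -math.log(math.sqrt(l1 * l2) / (0.5 * (l1 + l2)))


def hellinger_numeric(m, f1, f2, a, b):
    """D_{H,m} from (1.2.7) by the midpoint rule on the truncated domain [a, b]."""
    p = 1.0 / m
    return midpoint(lambda x: abs(f1(x) ** p - f2(x) ** p) ** m, a, b) ** p
```

For instance, with $\lambda_1 = 2$ and $\lambda_2 = 0.5$ one gets $D_B = -\ln 0.8$ and $D_{H,2} = \sqrt{0.4}$. Every $D_{H,m}$ stays below $\sqrt[m]{2}$.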

𝐷𝐾𝐿 and 𝐷𝐽 are included in the class of Φ-divergences; 𝐷𝐵 and 𝐷𝑅,𝛼 are not, but they belong to the class of (ℎ, Φ)-divergences (see, e.g., Pardo, 2006, pp. 6/8).

All measures considered in this work are divergences. The measures of Bhattacharyya, Jeffreys, and Rényi of order 1/2 are distances, and the Hellinger distance is even a metric.

1.3. Outline

In Chapter 2, the divergences from Section 1.2 are determined for the models of GOSs from Section 1.1. A natural way to learn more about the structure of the model of GOSs is to compare the different included models, using these divergences as coefficients of discrepancy.

In order to obtain explicit forms of the different divergence measures between GOSs based on the same underlying distribution function $F$, exponential families are defined in Section 2.1, and explicit forms of the divergence measures within exponential families are derived. The explicit forms of the divergences for GOSs can then be determined by exploiting the exponential family structure of GOSs.

In Section 2.2, the first results concerning the formulas of the divergences are stated. The different models, identified with their model parameters, are considered and illustrated as points in the Euclidean space $\mathbb{R}^r$. In this setting, closest models and spheres with respect to different divergences are discussed.

Chapter 3 is concerned with applications (to SOSs) based on the considered divergence measures.

In Section 3.1, some general properties of maximum likelihood estimators are noted, and Section 3.2 directly takes up the results of Chapter 2. Multivariate confidence regions for the model parameters are considered. They are given implicitly by inequalities concerning divergence measures. These confidence sets are compared with rectangular confidence regions for simulated data sets.

Section 3.3 deals with the results of Menéndez et al. (1997). They considered a divergence measure for $t > 2$ populations belonging to an exponential family. They derived its asymptotic distribution in order to construct statistical tests for homogeneity of the $t$ populations. In presenting their results, a few mistakes are corrected, and a shorter notation for some equations is shown. Moreover, the results are applied to sequential $(n - r + 1)$-out-of-$n$ systems. For the considered example, the derived asymptotic test leads to type I error rates that distinctly exceed the nominal significance levels for reasonable sample sizes. That is, the asymptotic test is not applicable in many practical situations, since a small type I error rate is not assured.

In Section 3.4, a new general estimation approach using pre-information about the magnitude of the parameters to be estimated is first introduced in the framework of exponential families. This somewhat heuristic approach using divergence measures is designed for small sample sizes, for which the maximum likelihood estimator is known to perform poorly. Further, sequential $(n - r + 1)$-out-of-$n$ systems are considered as an example. The results of the simulation study indicate high potential, especially for multiparameter cases. Finally, in Chapter 4, the contents of the work are discussed in conclusion.


2. Framework

In the first section of this chapter, exponential families are introduced along with useful properties. For such families, exact forms of some divergence measures can be found in the literature. We refer to these results in order to obtain explicit expressions of the divergences for generalized order statistics. This is possible, since generalized order statistics form an exponential family in the model parameters (see Bedbur et al., 2012). In Section 2.2, the explicit formulas are studied.

2.1. Exponential Families

Exponential families are parametric families of distributions characterized by a specific form of the corresponding density functions. The distributions of many important classes form exponential families, for example, normal distributions, gamma distributions, and beta distributions. Exponential families have convenient properties, so it is often useful to view questions involving particular classes of distributions from the perspective of exponential families (cf., e.g., Brown, 1986). This is the case for deriving explicit forms of the divergences considered in this work. Therefore, exponential families are introduced in the following. Regarding the notation for exponential families, we closely follow the dissertation of Bedbur (2011).

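Definition 2.1.1 (Exponential Family)

Let $\Theta \neq \emptyset$ be a set of parameters and $\mathfrak{P} = \{P_\vartheta \mid \vartheta \in \Theta\}$ be a family of distributions on a measurable space $(\mathfrak{X}, \mathfrak{B})$. Suppose there exist an integer $k \in \mathbb{N}$, a $\sigma$-finite measure $\mu$ on $(\mathfrak{X}, \mathfrak{B})$ dominating $\mathfrak{P}$, and functions

$$C : \Theta \to \mathbb{R}, \qquad \boldsymbol{\zeta} = (\zeta_1, \dots, \zeta_k)' : \Theta \to \mathbb{R}^k,$$
$$\boldsymbol{T} = (T_1, \dots, T_k)' : (\mathfrak{X}, \mathfrak{B}) \to (\mathbb{R}^k, \mathbb{B}^k), \qquad h : (\mathfrak{X}, \mathfrak{B}) \to (\mathbb{R}_{\ge 0}, \mathbb{B} \cap \mathbb{R}_{\ge 0})$$

such that the $\mu$-densities of $P_\vartheta$, $\vartheta \in \Theta$, are given by

$$f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x) = C(\vartheta) \exp\Big(\sum_{j=1}^k \zeta_j(\vartheta) T_j(x)\Big) h(x), \quad x \in \mathfrak{X}. \tag{2.1.1}$$

Then $\mathfrak{P}$ is called an exponential family (in $\boldsymbol\zeta$ and $\boldsymbol T$).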

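$C(\vartheta)$ is a normalizing constant, and it is sometimes convenient to write it into the argument of the exponential function. Therefore, we additionally define the mapping

$$\kappa : \Theta \to \mathbb{R}, \quad \vartheta \mapsto -\ln(C(\vartheta)). \tag{2.1.2}$$

This yields another representation of equation (2.1.1),

$$f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x) = \exp\Big(\sum_{j=1}^k \zeta_j(\vartheta) T_j(x) - \kappa(\vartheta)\Big) h(x), \quad x \in \mathfrak{X}.$$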
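Two textbook examples may help to see Definition 2.1.1 and the mapping $\kappa$ at work. They are not taken from the source. The first is the exponential distributions ($\mu$ the Lebesgue measure, $\zeta(\lambda) = -\lambda$, $T(x) = x$, $\kappa(\lambda) = -\ln\lambda$, $h = \mathbb{1}_{(0,\infty)}$). The second is the Poisson distributions ($\mu$ the counting measure on $\mathbb{N}_0$, $\zeta(\theta) = \ln\theta$, $T(x) = x$, $\kappa(\theta) = \theta$, $h(x) = 1/x!$). The Python sketch checks that $\exp(\zeta(\vartheta) T(x) - \kappa(\vartheta)) h(x)$ reproduces the usual densities.

```python
import math


def expo_density(lam, x):
    """Exponential density w.r.t. Lebesgue measure."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


def expo_as_expfam(lam, x):
    """Same density in the form exp(zeta T - kappa) h with k = 1."""
    zeta, T, kappa = -lam, x, -math.log(lam)  # C(lam) = exp(-kappa) = lam
    h = 1.0 if x > 0 else 0.0
    return math.exp(zeta * T - kappa) * h


def poisson_pmf(theta, x):
    """Poisson probabilities, i.e. densities w.r.t. counting measure on N_0."""
    return math.exp(-theta) * theta ** x / math.factorial(x)


def poisson_as_expfam(theta, x):
    """zeta = ln(theta), T(x) = x, kappa = theta, h(x) = 1 / x!."""
    zeta, T, kappa = math.log(theta), x, theta
    return math.exp(zeta * T - kappa) / math.factorial(x)
```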

A more convenient form, which is of main interest here, is obtained by parametrizing in terms of natural parameters.

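Definition 2.1.2 (Natural parameter space; natural extension)

Let $\mathfrak{P}$ be an exponential family according to Def. 2.1.1. Then

$$\Theta^* := \Big\{\boldsymbol\zeta = (\zeta_1, \dots, \zeta_k)' \in \mathbb{R}^k \;\Big|\; 0 < \int_{\mathfrak{X}} \exp\Big\{\sum_{j=1}^k \zeta_j T_j(x)\Big\} h(x)\, d\mu(x) < \infty\Big\}$$

is called the natural parameter space of $\mathfrak{P}$. Furthermore, $\mathfrak{P}^* = \{P^*_{\boldsymbol\zeta} \mid \boldsymbol\zeta \in \Theta^*\}$ is called the natural extension of $\mathfrak{P}$, where

$$C^*(\boldsymbol\zeta) = \Big(\int_{\mathfrak{X}} \exp\Big\{\sum_{j=1}^k \zeta_j T_j(x)\Big\} h(x)\, d\mu(x)\Big)^{-1}, \qquad f^*_{\boldsymbol\zeta}(x) = C^*(\boldsymbol\zeta) \exp\Big\{\sum_{j=1}^k \zeta_j T_j(x)\Big\} h(x), \quad x \in \mathfrak{X},$$

and $P^*_{\boldsymbol\zeta} = f^*_{\boldsymbol\zeta}\,\mu$.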

Throughout this work, we assume that the considered exponential families have open natural parameter spaces. The value $C^*(\boldsymbol\zeta)$ is a normalizing constant. Analogously to (2.1.2), we additionally define the transformed mapping

$$\kappa^* : \Theta^* \to \mathbb{R}, \quad \boldsymbol\zeta \mapsto -\ln(C^*(\boldsymbol\zeta)). \tag{2.1.3}$$

The mappings $\kappa$ and $\kappa^*$ are closely related: $\kappa^*(\boldsymbol\zeta(\vartheta)) = \kappa(\vartheta)$, $\vartheta \in \Theta$. In the following subsection, both notations are strictly distinguished in order to clarify the difference between both cases for the computation of the divergences. Afterwards, for simplicity, only one notation (without "$*$") is used if the meaning is clear from the context (natural parameter space or not). In the case of GOSs, the natural parameter representation arises directly.

An exponential family is called strictly $k$-parametrical if the integer $k$ is minimal for such a representation of the densities, in the sense that there is no such representation with a smaller number of statistics. It is useful to have a characterization of this property. The following theorem (see, e.g., Witting, 1985, Thm. 1.153, p. 145) contains one based on affine independence.


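Definition 2.1.3 (Affine independence)

Let $k \in \mathbb{N}$.

(i) Let $\zeta_1, \dots, \zeta_k$ be real-valued functions with domain $\Theta \neq \emptyset$. $\zeta_1, \dots, \zeta_k$ are called affinely independent if for $a_1, \dots, a_k, b \in \mathbb{R}$ it holds:

$$\sum_{j=1}^k a_j \zeta_j(\vartheta) = b \quad \forall\, \vartheta \in \Theta \;\Rightarrow\; 0 = a_1 = \dots = a_k = b.$$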

(ii) Let $(\mathfrak{X}, \mathfrak{B})$ be a measurable space and $T_1, \dots, T_k$ be real-valued $\mathfrak{B}$-$\mathbb{B}$-measurable functions on $(\mathfrak{X}, \mathfrak{B})$. $T_1, \dots, T_k$ are called $P$-affinely independent for a probability measure $P$ on $(\mathfrak{X}, \mathfrak{B})$ if for $a_1, \dots, a_k, b \in \mathbb{R}$ it holds:

$$\sum_{j=1}^k a_j T_j = b \quad P\text{-a.s.} \;\Rightarrow\; 0 = a_1 = \dots = a_k = b.$$

For a set $\mathfrak{P}$ of probability measures, $T_1, \dots, T_k$ are called $\mathfrak{P}$-affinely independent if $T_1, \dots, T_k$ are $P$-affinely independent for every $P \in \mathfrak{P}$.

Theorem 2.1.4

Let $\mathfrak{P}$ be an exponential family in $\zeta_1, \dots, \zeta_k$ and $T_1, \dots, T_k$ according to Def. 2.1.1. Then the following statements hold.

(i) $\mathfrak{P}$ is strictly $k$-parametrical if and only if the $\mu$-densities have a representation according to (2.1.1) with affinely independent functions $\zeta_1, \dots, \zeta_k$ and $\mathfrak{P}$-affinely independent statistics $T_1, \dots, T_k$.

(ii) $T_1, \dots, T_k$ are $\mathfrak{P}$-affinely independent statistics if and only if there exists $\vartheta \in \Theta$ such that $Cov_\vartheta(\boldsymbol T) > 0$ (i.e., $Cov_\vartheta(\boldsymbol T)$ is positive definite).

The following theorem states some useful properties of exponential families (see, e.g., Witting, 1985, Thm. 1.164, pp. 152/153 and Thm. 1.170, p. 157).

Theorem 2.1.5

Let $\mathfrak{P}$ be a $k$-parametrical exponential family in $\boldsymbol\zeta$ and $\boldsymbol T$ with natural parameter space $\Theta^*$ and $\mu$-densities of the form

$$f^*_{\boldsymbol\zeta}(x) = \frac{dP^*_{\boldsymbol\zeta}}{d\mu}(x) = C^*(\boldsymbol\zeta) \exp\Big(\sum_{j=1}^k \zeta_j T_j(x)\Big) h(x) = \exp\Big(\sum_{j=1}^k \zeta_j T_j(x) - \kappa^*(\boldsymbol\zeta)\Big) h(x), \quad x \in \mathfrak{X}. \tag{2.1.4}$$


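(i) The statistic $\boldsymbol T = (T_1, \dots, T_k)'$ has finite moments of any order. The functions $\boldsymbol\zeta \mapsto E_{\boldsymbol\zeta} T_1^{l_1} \cdots T_k^{l_k}$, $\kappa^*$ and $C^*$ are infinitely often differentiable in $\boldsymbol\zeta$, and

$$E_{\boldsymbol\zeta} \boldsymbol T = \nabla\kappa^*(\boldsymbol\zeta) = -\nabla \ln C^*(\boldsymbol\zeta), \tag{2.1.5}$$
$$Cov_{\boldsymbol\zeta} \boldsymbol T = H_{\kappa^*}(\boldsymbol\zeta) = -H_{\ln C^*}(\boldsymbol\zeta),$$
$$E_{\boldsymbol\zeta} T_1^{l_1} \cdots T_k^{l_k} = C^*(\boldsymbol\zeta)\, \frac{\partial^{l_1 + \dots + l_k}}{\partial\zeta_1^{l_1} \cdots \partial\zeta_k^{l_k}} \int_{\mathfrak{X}} \exp\Big(\sum_{j=1}^k \zeta_j T_j(x)\Big) h(x)\, d\mu(x),$$

where $\nabla\kappa^* = \big(\frac{\partial}{\partial\zeta_1}\kappa^*, \dots, \frac{\partial}{\partial\zeta_k}\kappa^*\big)'$ and $H_{\kappa^*} = \big(\frac{\partial^2}{\partial\zeta_i \partial\zeta_j}\kappa^*\big)_{1 \le i, j \le k}$ denote the gradient and the Hessian of $\kappa^*$, respectively.

(ii) If $\mathfrak{P}$ is strictly $k$-parametrical, $Cov_{\boldsymbol\zeta} \boldsymbol T$ is positive definite.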

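(iii) The logarithmic derivative of the likelihood and the Fisher information matrix are given by

$$\boldsymbol U_{\boldsymbol\zeta}(x) := \nabla_{\boldsymbol\zeta}(\ln f_{\boldsymbol\zeta})(x) = \boldsymbol T(x) - E_{\boldsymbol\zeta}\boldsymbol T, \qquad \mathrm{I}_f(\boldsymbol\zeta) := E_{\boldsymbol\zeta}\big(\boldsymbol U_{\boldsymbol\zeta}(X)\, \boldsymbol U_{\boldsymbol\zeta}(X)'\big) = Cov_{\boldsymbol\zeta}(\boldsymbol T).$$

In particular, the individual entries of the Fisher information matrix can be expressed in terms of partial derivatives of $\kappa^*$.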

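Corollary 2.1.6

Given the situation of Theorem 2.1.5, for the $(i, j)$-element of $\mathrm{I}_f(\boldsymbol\zeta)$, $1 \le i, j \le k$, we have

$$\mathrm{I}_f(\boldsymbol\zeta)_{ij} = \frac{\partial^2 \kappa^*(\boldsymbol\zeta)}{\partial\zeta_i\, \partial\zeta_j}. \tag{2.1.6}$$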

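In the case of a natural parameter space $\Theta^*$, we further define the mapping

$$\pi : \Theta^* \to \mathbb{R}^k, \quad \boldsymbol\zeta \mapsto E_{\boldsymbol\zeta}(\boldsymbol T) \overset{(2.1.5)}{=} \nabla\kappa^*(\boldsymbol\zeta) \tag{2.1.7}$$

for shorter notation in the following.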
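As an illustration of Theorem 2.1.5 and (2.1.7), consider the first $r$ GOSs. Rewriting (1.1.2) gives an exponential family representation; this is a short computation added here and consistent with Bedbur et al. (2012). The representation has natural parameters $\zeta_j = -\gamma_j$ and statistics $T_j(\boldsymbol x) = \ln(1 - F(x_{j-1})) - \ln(1 - F(x_j))$ with $F(x_0) := 0$. Further, $h(\boldsymbol x) = \prod_{j=1}^r f(x_j)/(1 - F(x_j))$ on the cone and $\kappa^*(\boldsymbol\zeta) = -\sum_{j=1}^r \ln(-\zeta_j)$ on $\Theta^* = (-\infty, 0)^r$. The Python sketch below, with hypothetical helper names, compares finite-difference derivatives of $\kappa^*$ with Monte Carlo moments of $\boldsymbol T$ for ordinary order statistics from a uniform $F$. By Corollary 2.1.6, the Hessian is also the Fisher information.

```python
import math
import random


def kappa_star(zeta):
    """kappa*(zeta) = -sum ln(-zeta_j) for the GOS family (representation derived above)."""
    return -sum(math.log(-z) for z in zeta)


def fd_gradient(fun, z, eps=1e-5):
    """Central finite differences for the gradient of fun at z."""
    grad = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        grad.append((fun(zp) - fun(zm)) / (2 * eps))
    return grad


def fd_hessian_diag(fun, z, eps=1e-4):
    """Diagonal of the Hessian by second-order central differences."""
    diag = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        diag.append((fun(zp) - 2 * fun(z) + fun(zm)) / eps ** 2)
    return diag


def sample_T_uniform_os(n, rng):
    """T_j = ln(1 - U_{j-1:n}) - ln(1 - U_{j:n}) for uniform order statistics."""
    u = sorted(rng.random() for _ in range(n))
    prev, T = 0.0, []
    for uj in u:
        T.append(math.log(1.0 - prev) - math.log(1.0 - uj))
        prev = uj
    return T
```

For OSs, $\gamma_j = n - j + 1$. The gradient should therefore be close to $(1/\gamma_j)_j$ and the diagonal of the Hessian close to $(1/\gamma_j^2)_j$, matching the sample means and variances of the simulated $T_j$.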

2.1.1. Explicit Forms of Divergences

Having introduced the basic notation for exponential families, we now derive explicit forms of the divergences.

Jeffreys (1946, 1948) obtained exact forms of his two invariants for particular distributions. Huzurbazar (1955) found that, for distributions admitting sufficient statistics, the exact forms of $(D_{H,m})^m$ ($m$ even) and $D_J$ are explicit functions of the parameters of the distributions. Huzurbazar considered the most general form of distributions admitting sufficient statistics as given by Koopman (1936), one of the originators of the concept of exponential families. In fact, the probability densities examined by Huzurbazar are of the form (2.1.1) for one variate, but he also noted that for multivariate distributions this variate simply has to be replaced by a set of variates. In the following, we reproduce the calculations of Huzurbazar using the notation of this work and general $\mu$-densities. Therefore, let

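$$f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x) = \exp\Big(\sum_{j=1}^k \zeta_j(\vartheta) T_j(x) - \kappa(\vartheta)\Big) h(x), \quad x \in \mathfrak{X}, \quad\text{and}$$
$$f_{\tilde\vartheta}(x) = \frac{dP_{\tilde\vartheta}}{d\mu}(x) = \exp\Big(\sum_{j=1}^k \zeta_j(\tilde\vartheta) T_j(x) - \kappa(\tilde\vartheta)\Big) h(x), \quad x \in \mathfrak{X},$$

be density functions belonging to probability distributions $P_\vartheta, P_{\tilde\vartheta} \in \mathfrak{P}$, respectively, where $\mathfrak{P}$ is an exponential family according to Definition 2.1.1. We begin with the computation of $D_J$:

$$\begin{aligned}
D_J(\vartheta, \tilde\vartheta) &= D_{KL}(\vartheta, \tilde\vartheta) + D_{KL}(\tilde\vartheta, \vartheta) = \int_{\mathfrak{X}} (f_\vartheta(x) - f_{\tilde\vartheta}(x))\big(\ln f_\vartheta(x) - \ln f_{\tilde\vartheta}(x)\big)\, d\mu(x) \\
&= \int_{\mathfrak{X}} (f_\vartheta(x) - f_{\tilde\vartheta}(x)) \Big(\sum_{j=1}^k \big(\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)\big) T_j(x) - \kappa(\vartheta) + \kappa(\tilde\vartheta)\Big)\, d\mu(x) \\
&= \sum_{j=1}^k \big[\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)\big] \int_{\mathfrak{X}} (f_\vartheta(x) - f_{\tilde\vartheta}(x)) T_j(x)\, d\mu(x) \\
&= \sum_{j=1}^k \big[\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)\big] \big[E_\vartheta(T_j) - E_{\tilde\vartheta}(T_j)\big] = (\boldsymbol\zeta(\vartheta) - \boldsymbol\zeta(\tilde\vartheta))'(E_\vartheta(\boldsymbol T) - E_{\tilde\vartheta}(\boldsymbol T)),
\end{aligned}$$

In the third step, the constant $\kappa(\tilde\vartheta) - \kappa(\vartheta)$ drops out because $\int_{\mathfrak{X}} (f_\vartheta(x) - f_{\tilde\vartheta}(x))\, d\mu(x) = 1 - 1 = 0$.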

Huzurbazar (1955) explained how to express $E_\vartheta(T_j) - E_{\tilde\vartheta}(T_j)$ in terms of $\vartheta$ in general, but we are only interested in the case of a natural parameter space, where $E_\vartheta(\boldsymbol T)$ is obtainable from (2.1.5).
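The final identity only needs $\boldsymbol\zeta$ and the mean map $\vartheta \mapsto E_\vartheta(\boldsymbol T)$. The Python sketch below, with hypothetical function names, applies it to two cases. The first is the exponential distributions ($\zeta = -\lambda$, $E_\lambda T = 1/\lambda$). The second is GOSs under the representation $\zeta_j = -\gamma_j$, $E_{\boldsymbol\gamma}(T_j) = 1/\gamma_j$, which is stated here as an assumption for illustration (cf. Bedbur et al., 2012). Under that assumption, the formula reduces to $D_J(\boldsymbol\gamma, \tilde{\boldsymbol\gamma}) = \sum_j (\gamma_j - \tilde\gamma_j)^2 / (\gamma_j \tilde\gamma_j)$.

```python
def j_distance(zeta, mean_T, theta1, theta2):
    """Jeffreys J-distance (zeta(theta1) - zeta(theta2))' (E_theta1 T - E_theta2 T)."""
    z1, z2 = zeta(theta1), zeta(theta2)
    m1, m2 = mean_T(theta1), mean_T(theta2)
    return sum((a - b) * (c - d) for a, b, c, d in zip(z1, z2, m1, m2))


# Exponential distributions: k = 1, zeta(lam) = -lam, T(x) = x, E_lam T = 1 / lam.
def expo_zeta(lam):
    return [-lam]


def expo_mean(lam):
    return [1.0 / lam]


# First r GOSs (assumed representation): zeta_j = -gamma_j, E T_j = 1 / gamma_j.
def gos_zeta(gamma):
    return [-g for g in gamma]


def gos_mean(gamma):
    return [1.0 / g for g in gamma]
```

For the exponential example, `j_distance(expo_zeta, expo_mean, 2.0, 0.5)` gives $2.25$. This equals $D_{KL}(2, 0.5) + D_{KL}(0.5, 2) = (\ln 4 - 0.75) + (3 - \ln 4)$. Comparing OSs ($\boldsymbol\gamma = (3, 2, 1)$) with SOSs for $\boldsymbol\alpha = (1, 2, 3)$ ($\boldsymbol\gamma = (3, 4, 3)$) gives $11/6$.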

As an intermediate result, we compute an explicit expression for

$$\lambda_{p,q}(f_\vartheta, f_{\tilde\vartheta}) = \int_{\mathfrak{X}} f_\vartheta(x)^p f_{\tilde\vartheta}(x)^q\, d\mu(x).$$

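Huzurbazar calculated this coefficient for $p = \frac{m-j}{m}$ and $q = \frac{j}{m}$, $j = 1, \dots, m$, but here any $p, q \in [0, 1]$ with $p + q = 1$ are allowed. Thus, $\lambda_{p,q}$ can be used for the exact form of the Rényi divergence $D_{R,p}$ for any $p \in (0, 1)$. This does not affect the steps of the calculation. Using $h(x)^p h(x)^q = h(x)$ since $p + q = 1$, we find

$$\begin{aligned}
\lambda_{p,q}(f_\vartheta, f_{\tilde\vartheta}) &= \int_{\mathfrak{X}} f_\vartheta(x)^p f_{\tilde\vartheta}(x)^q\, d\mu(x) = \exp(-p\kappa(\vartheta) - q\kappa(\tilde\vartheta)) \int_{\mathfrak{X}} \exp\Big(\sum_{j=1}^k (p\zeta_j(\vartheta) + q\zeta_j(\tilde\vartheta)) T_j(x)\Big) h(x)\, d\mu(x) \\
&= \int_{\mathfrak{X}} \exp\Big(\sum_{j=1}^k (p\zeta_j(\vartheta) + q\zeta_j(\tilde\vartheta)) T_j(x) - \kappa^*(p\boldsymbol\zeta(\vartheta) + q\boldsymbol\zeta(\tilde\vartheta))\Big) h(x)\, d\mu(x) \\
&\qquad \times \exp\big(\kappa^*(p\boldsymbol\zeta(\vartheta) + q\boldsymbol\zeta(\tilde\vartheta))\big) \exp(-p\kappa(\vartheta) - q\kappa(\tilde\vartheta)) \\
&\overset{(*)}{=} \exp\big[\kappa^*(p\boldsymbol\zeta(\vartheta) + q\boldsymbol\zeta(\tilde\vartheta)) - (p\kappa(\vartheta) + q\kappa(\tilde\vartheta))\big],
\end{aligned}$$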

where in (∗) it used that 𝑝𝜻(𝜗) + 𝑞𝜻(˜𝜗) ∈ Θ∗_{, if 𝜻(𝜗), 𝜻(˜}_𝜗) _{∈ Θ}∗_{, because the natural} parameter space Θ∗ is convex (see, e.g., Lehmann and Romano, 2005, La. 2.7.1, p. 48). Now, for 𝑚∈ 2ℕ, we can state

( 𝐷𝐻,𝑚(𝜗, ˜𝜗))𝑚 = ∫ 𝔛 ∣𝑓𝜗(𝑥)𝑚1 − 𝑓˜ 𝜗(𝑥) 1 𝑚∣𝑚𝑑𝜇(𝑥) = ∫ 𝔛 (𝑓𝜗(𝑥) 1 𝑚 − 𝑓˜ 𝜗(𝑥) 1 𝑚)𝑚𝑑𝜇(𝑥) (since 𝑚 is even) = ∫ 𝔛 𝑚 ∑ 𝑗=0 (−1)𝑗 ( 𝑚 𝑗 ) 𝑓𝜗(𝑥)𝑚𝑗𝑓 ˜ 𝜗(𝑥) 𝑚−𝑗 𝑚 𝑑𝜇(𝑥) = 𝑚 ∑ 𝑗=0 (₋₁₎𝑗 ( 𝑚 𝑗 ) ∫ 𝔛 𝑓𝜗(𝑥)𝑚𝑗𝑓 ˜ 𝜗(𝑥) 𝑚−𝑗 𝑚 𝑑𝜇(𝑥) = 𝑚 ∑ 𝑗=0 (₋₁₎𝑗 ( 𝑚 𝑗 ) 𝜆𝑗 𝑚, 𝑚−𝑗 𝑚 (𝑓𝜗, 𝑓𝜗˜),

and this yields a formula for the Hellinger distance, $m\in2\mathbb N$, when the two compared distributions belong to the same exponential family:

$$D_{H,m}(\vartheta,\tilde\vartheta) = \Big(\sum_{j=0}^m(-1)^j\binom mj\lambda_{\frac jm,\frac{m-j}m}(f_\vartheta,f_{\tilde\vartheta})\Big)^{1/m},$$
and, as an important special case, we find

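$$D_{H,2}(\vartheta,\tilde\vartheta) = \Big(2-2\lambda_{\frac12,\frac12}(f_\vartheta,f_{\tilde\vartheta})\Big)^{1/2} = \Big(2-2\exp\Big[\kappa^*\big(\tfrac12(\boldsymbol\zeta(\vartheta)+\boldsymbol\zeta(\tilde\vartheta))\big)-\tfrac12\big(\kappa(\vartheta)+\kappa(\tilde\vartheta)\big)\Big]\Big)^{1/2}.\tag{2.1.8}$$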
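The closed form of $\lambda_{p,q}$ and the binomial representation of $D_{H,m}$ can be checked against the defining series for a discrete family. The sketch below again assumes the Poisson family as an illustrative example, where $\kappa^*(\zeta)=e^{\zeta}$ with $\zeta=\ln\lambda$, so that $\kappa^*(p\zeta+q\tilde\zeta)=\lambda^p\tilde\lambda^q$. Function names and the truncation point `n_max` are hypothetical choices.

```python
import math


def poisson_log_pmf(x: int, lam: float) -> float:
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def lambda_pq_expfam(p: float, lam: float, lam_t: float) -> float:
    """exp[kappa*(p zeta + q zeta~) - (p kappa + q kappa~)] with q = 1 - p (Poisson case)."""
    q = 1.0 - p
    return math.exp(lam**p * lam_t**q - p * lam - q * lam_t)


def lambda_pq_series(p: float, lam: float, lam_t: float, n_max: int = 200) -> float:
    """sum_x f(x)**p * g(x)**q, truncated at n_max."""
    q = 1.0 - p
    return sum(math.exp(p * poisson_log_pmf(x, lam) + q * poisson_log_pmf(x, lam_t))
               for x in range(n_max + 1))


def hellinger_binomial(m: int, lam: float, lam_t: float) -> float:
    """(sum_j (-1)^j C(m, j) lambda_{j/m,(m-j)/m})^(1/m) for even m."""
    if m <= 0 or m % 2:
        raise ValueError("m must be a positive even integer")
    s = sum((-1) ** j * math.comb(m, j) * lambda_pq_expfam(j / m, lam, lam_t)
            for j in range(m + 1))
    return max(s, 0.0) ** (1.0 / m)  # clip tiny negative rounding errors


def hellinger_series(m: int, lam: float, lam_t: float, n_max: int = 200) -> float:
    """(sum_x |f(x)^(1/m) - g(x)^(1/m)|^m)^(1/m), truncated at n_max."""
    s = sum(abs(math.exp(poisson_log_pmf(x, lam) / m)
                - math.exp(poisson_log_pmf(x, lam_t) / m)) ** m
            for x in range(n_max + 1))
    return s ** (1.0 / m)


for m in (2, 4, 6):
    print(m, hellinger_binomial(m, 3.0, 5.0), hellinger_series(m, 3.0, 5.0))
```

The binomial route needs only $\kappa^*$, whereas the series route needs the densities themselves; this is the practical advantage of the exponential-family form.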

Noticing the equality
$$D_{R,\alpha}(\vartheta,\tilde\vartheta) = \frac{1}{\alpha(\alpha-1)}\ln\big(\lambda_{\alpha,1-\alpha}(f_\vartheta,f_{\tilde\vartheta})\big),$$


the Rényi divergence for an exponential family is also obtainable directly from the results of Huzurbazar (1955). Applying equation (1.2.4) delivers the remaining Kullback-Leibler divergence. It can also be calculated directly from the exponential family structure (see Kullback, 1959, Corollary 3.2, p. 45); the steps of the calculation are:

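$$\begin{aligned}D_{KL}(\vartheta,\tilde\vartheta) &= \int_{\mathfrak X}f_\vartheta(x)\ln\frac{f_\vartheta(x)}{f_{\tilde\vartheta}(x)}\,d\mu(x) = \int_{\mathfrak X}f_\vartheta(x)\Big(\sum_{j=1}^k\big(\zeta_j(\vartheta)-\zeta_j(\tilde\vartheta)\big)T_j(x)-\big(\kappa(\vartheta)-\kappa(\tilde\vartheta)\big)\Big)\,d\mu(x)\\ &= -\big(\kappa(\vartheta)-\kappa(\tilde\vartheta)\big)+\sum_{j=1}^k\big(\zeta_j(\vartheta)-\zeta_j(\tilde\vartheta)\big)\int_{\mathfrak X}f_\vartheta(x)T_j(x)\,d\mu(x)\\ &= \kappa(\tilde\vartheta)-\kappa(\vartheta)+\sum_{j=1}^k\big(\zeta_j(\vartheta)-\zeta_j(\tilde\vartheta)\big)E_\vartheta(T_j) = \kappa(\tilde\vartheta)-\kappa(\vartheta)+\big(\boldsymbol\zeta(\vartheta)-\boldsymbol\zeta(\tilde\vartheta)\big)'E_\vartheta(\boldsymbol T).\end{aligned}$$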
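The same kind of check works for the Kullback-Leibler formula. The following sketch, again assuming the Poisson family only as an example ($\kappa(\lambda)=\lambda$, $\zeta(\lambda)=\ln\lambda$, $E_\lambda(T)=\lambda$), compares the truncated defining series with $\kappa(\tilde\vartheta)-\kappa(\vartheta)+(\zeta(\vartheta)-\zeta(\tilde\vartheta))E_\vartheta(T)$ and confirms that the two directed divergences add up to the J-distance.

```python
import math


def poisson_log_pmf(x: int, lam: float) -> float:
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def kl_series(lam: float, lam_t: float, n_max: int = 200) -> float:
    """sum_x f(x) ln(f(x)/g(x)), truncated at n_max."""
    total = 0.0
    for x in range(n_max + 1):
        lf, lg = poisson_log_pmf(x, lam), poisson_log_pmf(x, lam_t)
        total += math.exp(lf) * (lf - lg)
    return total


def kl_expfam(lam: float, lam_t: float) -> float:
    """kappa(lam_t) - kappa(lam) + (zeta(lam) - zeta(lam_t)) * E_lam(T), Poisson case."""
    return lam_t - lam + (math.log(lam) - math.log(lam_t)) * lam


print(kl_series(3.0, 5.0), kl_expfam(3.0, 5.0))
print(kl_expfam(3.0, 5.0) + kl_expfam(5.0, 3.0))  # equals D_J(3, 5)
```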

Clearly, 𝐷𝐾𝐿(𝜗, ˜𝜗)+𝐷𝐾𝐿(˜𝜗, 𝜗) = 𝐷𝐽(𝜗, ˜𝜗) provides a further way to obtain the J-distance. Summarizing the previous results, we obtain the following:

Lemma 2.1.7 (Divergences in an exponential family)

Let $f_\vartheta, f_{\tilde\vartheta}$ be $\mu$-densities from an exponential family according to Def. 2.1.1. Then the following formulas hold.

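Kullback-Leibler divergence
$$D_{KL}(\vartheta,\tilde\vartheta) = \kappa(\tilde\vartheta)-\kappa(\vartheta)+\big(\boldsymbol\zeta(\vartheta)-\boldsymbol\zeta(\tilde\vartheta)\big)'E_\vartheta(\boldsymbol T)$$
Jeffreys J-distance
$$D_J(\vartheta,\tilde\vartheta) = \big(\boldsymbol\zeta(\vartheta)-\boldsymbol\zeta(\tilde\vartheta)\big)'\big(E_\vartheta(\boldsymbol T)-E_{\tilde\vartheta}(\boldsymbol T)\big)$$
Rényi divergence for $\alpha\in(0,1)$
$$D_{R,\alpha}(\vartheta,\tilde\vartheta) = \frac{1}{\alpha(\alpha-1)}\Big[\kappa^*\big(\alpha\boldsymbol\zeta(\vartheta)+(1-\alpha)\boldsymbol\zeta(\tilde\vartheta)\big)-\big(\alpha\kappa(\vartheta)+(1-\alpha)\kappa(\tilde\vartheta)\big)\Big]$$
Bhattacharyya distance
$$D_B(\vartheta,\tilde\vartheta) = -\Big[\kappa^*\big(\tfrac12(\boldsymbol\zeta(\vartheta)+\boldsymbol\zeta(\tilde\vartheta))\big)-\tfrac12\big(\kappa(\vartheta)+\kappa(\tilde\vartheta)\big)\Big]$$
Hellinger distance for $m\in2\mathbb N$
$$D_{H,m}(\vartheta,\tilde\vartheta) = \Big(\sum_{j=0}^m(-1)^j\binom mj\lambda_{\frac jm,\frac{m-j}m}(f_\vartheta,f_{\tilde\vartheta})\Big)^{1/m}$$
In particular, Hellinger distance for $m=2$
$$D_{H,2}(\vartheta,\tilde\vartheta) = \Big(2-2\exp\Big[\kappa^*\big(\tfrac12(\boldsymbol\zeta(\vartheta)+\boldsymbol\zeta(\tilde\vartheta))\big)-\tfrac12\big(\kappa(\vartheta)+\kappa(\tilde\vartheta)\big)\Big]\Big)^{1/2}$$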

For natural parameters, the expressions simplify.

Corollary 2.1.8 (Divergences in an exponential family with natural parameter space)

Let $f_{\boldsymbol\zeta}, f_{\tilde{\boldsymbol\zeta}}$ be $\mu$-densities of the form (2.1.4) from an exponential family with natural parameter space according to Def. 2.1.1 and Def. 2.1.2. Then the following formulas hold (for $\pi$, see (2.1.7)).

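Kullback-Leibler divergence
$$D_{KL}(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = \kappa^*(\tilde{\boldsymbol\zeta})-\kappa^*(\boldsymbol\zeta)+(\boldsymbol\zeta-\tilde{\boldsymbol\zeta})'\pi(\boldsymbol\zeta)$$
Jeffreys J-distance
$$D_J(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = (\boldsymbol\zeta-\tilde{\boldsymbol\zeta})'\big(\pi(\boldsymbol\zeta)-\pi(\tilde{\boldsymbol\zeta})\big)$$
Rényi divergence for $\alpha\in(0,1)$
$$D_{R,\alpha}(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = \frac{1}{\alpha(\alpha-1)}\Big[\kappa^*\big(\alpha\boldsymbol\zeta+(1-\alpha)\tilde{\boldsymbol\zeta}\big)-\big(\alpha\kappa^*(\boldsymbol\zeta)+(1-\alpha)\kappa^*(\tilde{\boldsymbol\zeta})\big)\Big]$$
Bhattacharyya distance
$$D_B(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = -\Big[\kappa^*\big(\tfrac12(\boldsymbol\zeta+\tilde{\boldsymbol\zeta})\big)-\tfrac12\big(\kappa^*(\boldsymbol\zeta)+\kappa^*(\tilde{\boldsymbol\zeta})\big)\Big]$$
Hellinger distance for $m\in2\mathbb N$
$$D_{H,m}(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = \Big(\sum_{j=0}^m(-1)^j\binom mj\lambda_{\frac jm,\frac{m-j}m}(f_{\boldsymbol\zeta},f_{\tilde{\boldsymbol\zeta}})\Big)^{1/m}$$
In particular, Hellinger distance for $m=2$
$$D_{H,2}(\boldsymbol\zeta,\tilde{\boldsymbol\zeta}) = \Big(2-2\exp\Big[\kappa^*\big(\tfrac12(\boldsymbol\zeta+\tilde{\boldsymbol\zeta})\big)-\tfrac12\big(\kappa^*(\boldsymbol\zeta)+\kappa^*(\tilde{\boldsymbol\zeta})\big)\Big]\Big)^{1/2}$$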

Proof. The results are obtained immediately by substituting the respective parameters in Lemma 2.1.7: $\boldsymbol\zeta(\vartheta)=\boldsymbol\zeta$, $\boldsymbol\zeta(\tilde\vartheta)=\tilde{\boldsymbol\zeta}$, and $\kappa^*$ in place of $\kappa$; for the Kullback-Leibler divergence and the J-distance, (2.1.7) is used in addition.


Liese and Vajda (1987) stated these results for the Rényi divergence and the Kullback-Leibler divergence. By Corollary 2.1.8, for natural parameters, all divergences considered in this work can be obtained directly by plugging in the mappings $\kappa^*$ and $\pi$, where $\pi$ is obtained immediately from the partial derivatives of $\kappa^*$ (see (2.1.7)). Hence, $\kappa^*$ is the function of interest.
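Corollary 2.1.8 suggests a generic routine: given $\kappa^*$ and $\pi=\nabla\kappa^*$, every listed divergence follows without any integration. The Python sketch below is one possible way to organize this; the function names and the dictionary output are hypothetical choices. It is exercised on the Poisson family, assumed here purely as an example, with $\zeta=\ln\lambda$ and $\kappa^*(\zeta)=\pi(\zeta)=e^{\zeta}$.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _diff(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def divergences_natural(kappa_star: Callable[[Vector], float],
                        pi: Callable[[Vector], Vector],
                        zeta: Vector, zeta_t: Vector,
                        alpha: float = 0.5, m: int = 2) -> Dict[str, float]:
    """Divergences of Corollary 2.1.8, computed from kappa* and pi = grad kappa* only."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if m <= 0 or m % 2:
        raise ValueError("m must be a positive even integer")

    def log_lambda(p: float) -> float:
        # ln lambda_{p,1-p} = kappa*(p zeta + (1-p) zeta~) - (p kappa*(zeta) + (1-p) kappa*(zeta~))
        mix = [p * a + (1.0 - p) * b for a, b in zip(zeta, zeta_t)]
        return kappa_star(mix) - (p * kappa_star(zeta) + (1.0 - p) * kappa_star(zeta_t))

    d = _diff(zeta, zeta_t)
    hell_m = sum((-1) ** j * math.comb(m, j) * math.exp(log_lambda(j / m))
                 for j in range(m + 1))
    return {
        "KL": kappa_star(zeta_t) - kappa_star(zeta) + _dot(d, pi(zeta)),
        "J": _dot(d, _diff(pi(zeta), pi(zeta_t))),
        "Renyi": log_lambda(alpha) / (alpha * (alpha - 1.0)),
        "Bhattacharyya": -log_lambda(0.5),
        "Hellinger": max(hell_m, 0.0) ** (1.0 / m),
    }


def poisson_kappa(z: Vector) -> float:
    return math.exp(z[0])


def poisson_pi(z: Vector) -> List[float]:
    return [math.exp(z[0])]


res = divergences_natural(poisson_kappa, poisson_pi, [math.log(3.0)], [math.log(5.0)])
print(res)
```

Swapping in another family only requires new `kappa_star` and `pi` callables, which mirrors the remark that $\kappa^*$ is the function of interest.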

2.1.2. GOSs and Exponential Families in Model Parameters

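In this subsection, the mappings $\kappa$ and $\pi$ are derived for densities of generalized order statistics. Beginning with densities of the form given in (1.1.1) and setting $F(x_0):=0$ in equation $(*)$, we conclude
$$\begin{aligned}f^{X_1^*,\dots,X_n^*}_{\boldsymbol\gamma}(x_1,\dots,x_n) &= \Big(\prod_{j=1}^n\gamma_j\Big)\Big(\prod_{j=1}^{n-1}\big(1-F(x_j)\big)^{\gamma_j-\gamma_{j+1}-1}f(x_j)\Big)\big(1-F(x_n)\big)^{\gamma_n-1}f(x_n)\\ &= \Big(\prod_{j=1}^n\gamma_j\Big)\Big(\prod_{j=1}^n\big(1-F(x_j)\big)^{\gamma_j}\Big)\Big(\prod_{j=2}^n\big(1-F(x_{j-1})\big)^{-\gamma_j}\Big)\Big(\prod_{j=1}^n\big(1-F(x_j)\big)^{-1}f(x_j)\Big)\\ &= \Big(\prod_{j=1}^n\gamma_j\Big)\exp\Big(\sum_{j=1}^n\gamma_j\ln\big(1-F(x_j)\big)-\sum_{j=2}^n\gamma_j\ln\big(1-F(x_{j-1})\big)\Big)\Big(\prod_{j=1}^n\big(1-F(x_j)\big)^{-1}f(x_j)\Big)\\ &\overset{(*)}{=}\underbrace{\Big(\prod_{j=1}^n\gamma_j\Big)}_{C(\boldsymbol\gamma)}\exp\Big(\sum_{j=1}^n\gamma_j\underbrace{\ln\frac{1-F(x_j)}{1-F(x_{j-1})}}_{T_j(\boldsymbol x)}\Big)\underbrace{\Big(\prod_{j=1}^n\frac{f(x_j)}{1-F(x_j)}\Big)}_{h(\boldsymbol x)}\\ &= \exp\Big(\sum_{j=1}^n\gamma_j\ln\frac{1-F(x_j)}{1-F(x_{j-1})}-\underbrace{\sum_{j=1}^n(-\ln\gamma_j)}_{\kappa(\boldsymbol\gamma)}\Big)\Big(\prod_{j=1}^n\frac{f(x_j)}{1-F(x_j)}\Big)\end{aligned}$$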

on the cone $F^{-1}(0+)<x_1\le\dots\le x_n<F^{-1}(1)$. The exponential family structure with natural parameter $\boldsymbol\gamma=(\gamma_1,\dots,\gamma_n)'$ is evident. As previously mentioned, the asterisk notation of the natural parametrization (e.g. $\kappa^*$) will not be used in the following.
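To make the exponential-family form tangible, the following Python sketch (an illustration, not part of the original derivation) evaluates the log-density of the first $r$ GOSs in two ways: via $\sum_j\gamma_jT_j(\boldsymbol x)-\kappa(\boldsymbol\gamma)+\ln h(\boldsymbol x)$, and via the product form above with $n$ replaced by $r$, which is assumed here to be the joint density of the first $r$ GOSs. The baseline is passed as a survival function `surv` ($=1-F$) and a density `dens`; these names and the two test baselines are hypothetical choices.

```python
import math
from typing import Callable, Sequence


def gos_log_density_expfam(x: Sequence[float], gamma: Sequence[float],
                           surv: Callable[[float], float],
                           dens: Callable[[float], float]) -> float:
    """ln f_gamma(x) = sum_j gamma_j T_j(x) - kappa(gamma) + ln h(x), with F(x_0) := 0."""
    if len(x) != len(gamma):
        raise ValueError("x and gamma must have the same length")
    kappa = -sum(math.log(g) for g in gamma)        # kappa(gamma) = -sum ln gamma_j
    s_prev, linear, log_h = 1.0, 0.0, 0.0
    for xj, gj in zip(x, gamma):
        s = surv(xj)
        linear += gj * math.log(s / s_prev)         # gamma_j * T_j(x)
        log_h += math.log(dens(xj)) - math.log(s)   # ln[f(x_j) / (1 - F(x_j))]
        s_prev = s
    return linear - kappa + log_h


def gos_log_density_product(x: Sequence[float], gamma: Sequence[float],
                            surv: Callable[[float], float],
                            dens: Callable[[float], float]) -> float:
    """Product form: prod gamma_j * prod (1-F)^(gamma_j - gamma_{j+1} - 1) f * (1-F)^(gamma_r - 1) f."""
    r = len(gamma)
    out = sum(math.log(g) for g in gamma)
    for j in range(r):
        expo = gamma[j] - gamma[j + 1] - 1.0 if j < r - 1 else gamma[j] - 1.0
        out += expo * math.log(surv(x[j])) + math.log(dens(x[j]))
    return out


def exp_surv(t: float) -> float:
    return math.exp(-t)


def exp_dens(t: float) -> float:
    return math.exp(-t)


def unif_surv(t: float) -> float:
    return 1.0 - t


def unif_dens(t: float) -> float:
    return 1.0


x, g = [0.2, 0.5, 1.3], [5.0, 3.5, 2.0]
print(gos_log_density_expfam(x, g, exp_surv, exp_dens),
      gos_log_density_product(x, g, exp_surv, exp_dens))
```

For a uniform baseline and $\boldsymbol\gamma=(n,\dots,1)'$ with $r=n$, both forms reduce to the constant density $n!$ of uniform order statistics.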

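The first $r\in\{1,\dots,n\}$ GOSs (for fixed $r$) are considered as a model for $r$ ordered random quantities. In this way we obtain a natural method for quantifying distances between different models that are included in GOSs, due to the two points of view described in Remark 1.1.2, although, strictly speaking, marginal distributions are compared. For example, if different distributions of random variables describing the failure times of a 4-out-of-5 system are considered, it is reasonable to consider only the first two SOSs. Accordingly, the divergences between two marginal densities are interpreted as divergences between the respective models. Let $\boldsymbol X=(X_1^*,\dots,X_r^*)'$ denote the vector of the first $r$ GOSs and
$$\mathfrak P^{\boldsymbol X}=\Big\{P^{\boldsymbol X}_{\boldsymbol\gamma}=f^{\boldsymbol X}_{\boldsymbol\gamma}\,\lambda^r_{\mathbb R^r_<}:\boldsymbol\gamma\in\mathbb R^r_+\Big\}$$
be the family of associated distributions, where $\mathbb R^r_<=\{\boldsymbol x\in\mathbb R^r:F^{-1}(0)<x_1<\dots<x_r<F^{-1}(1)\}$ denotes the cone of increasing real numbers and $\lambda^r_{\mathbb R^r_<}$ the $r$-dimensional Lebesgue measure on $(\mathbb R^r,\mathbb B^r)$ restricted to that cone. The $\lambda^r_{\mathbb R^r_<}$-densities $f^{\boldsymbol X}_{\boldsymbol\gamma}$ of the measures $P^{\boldsymbol X}_{\boldsymbol\gamma}$, $\boldsymbol\gamma\in\mathbb R^r_+$, are given by
$$f^{\boldsymbol X}_{\boldsymbol\gamma}(\boldsymbol x)=\exp\Big(\sum_{j=1}^r\gamma_jT_j(\boldsymbol x)-\kappa(\boldsymbol\gamma)\Big)h(\boldsymbol x),\quad\boldsymbol x\in\mathbb R^r_<,\quad\lambda^r_{\mathbb R^r_<}\text{-a.e.},\tag{2.1.9}$$
with $h(\boldsymbol x)=\prod_{j=1}^r\frac{f(x_j)}{1-F(x_j)}$, $\boldsymbol x=(x_1,\dots,x_r)'\in\mathbb R^r_<$,
$$T_1(\boldsymbol x)=\ln\big(1-F(x_1)\big),\qquad T_j(\boldsymbol x)=\ln\frac{1-F(x_j)}{1-F(x_{j-1})},\quad j=2,\dots,r,\quad\boldsymbol x\in\mathbb R^r_<,\tag{2.1.10}$$
and
$$\kappa(\boldsymbol\gamma)=-\sum_{j=1}^r\ln\gamma_j,\quad\boldsymbol\gamma=(\gamma_1,\dots,\gamma_r)'\in\mathbb R^r_+.\tag{2.1.11}$$
Then $\mathfrak P^{\boldsymbol X}$ is an exponential family with natural parameter space $\mathbb R^r_+$. For $\boldsymbol\gamma=(\gamma_1,\dots,\gamma_r)'\in\mathbb R^r_+$, Theorem 2.1.5 yields
$$E_{\boldsymbol\gamma}\boldsymbol T=\pi(\boldsymbol\gamma)=\nabla\kappa(\boldsymbol\gamma)=\Big(-\frac1{\gamma_1},\dots,-\frac1{\gamma_r}\Big)',\tag{2.1.12}$$
$$\mathrm I_f(\boldsymbol\gamma)=\mathrm{Cov}_{\boldsymbol\gamma}(\boldsymbol T)=H_\kappa(\boldsymbol\gamma)=\mathrm{diag}\Big(\frac1{\gamma_1^2},\dots,\frac1{\gamma_r^2}\Big)>0,\tag{2.1.13}$$
where $\mathrm{diag}(a_1,\dots,a_r)$ denotes the diagonal matrix in $\mathbb R^{r\times r}$ with entries $a_1,\dots,a_r$ on the main diagonal (i.e. entry $(i,i)$ is $a_i$, $i=1,\dots,r$) and zeros elsewhere. In particular, equation (2.1.13) and Theorem 2.1.4 show that $\mathfrak P^{\boldsymbol X}$ is strictly $r$-parametrical.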
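The moment formulas (2.1.12) and (2.1.13) can be checked numerically in two ways: by finite differences of $\kappa$ (Theorem 2.1.5 identifies $\nabla\kappa$ and the Hessian of $\kappa$ with the mean and covariance of $\boldsymbol T$), and by simulation. The sketch below is illustrative only. The simulation relies on the standard product representation of uniform GOSs, $1-F(X_j^*)=\prod_{i\le j}V_i^{1/\gamma_i}$ with iid $V_i\sim U(0,1]$, which is assumed here and not derived in this section; the step size `h`, the seed, and all function names are hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def kappa(gamma: Sequence[float]) -> float:
    """kappa(gamma) = -sum_j ln(gamma_j), cf. (2.1.11)."""
    return -sum(math.log(g) for g in gamma)


def grad_and_hessian_diag(gamma: Sequence[float], h: float = 1e-4) -> Tuple[List[float], List[float]]:
    """Central finite differences for grad kappa and the diagonal of the Hessian of kappa."""
    grad, hess = [], []
    k0 = kappa(gamma)
    for j in range(len(gamma)):
        up, dn = list(gamma), list(gamma)
        up[j] += h
        dn[j] -= h
        k_up, k_dn = kappa(up), kappa(dn)
        grad.append((k_up - k_dn) / (2.0 * h))
        hess.append((k_up - 2.0 * k0 + k_dn) / h**2)
    return grad, hess


def simulate_T(gamma: Sequence[float], n_samples: int, seed: int = 1) -> List[List[float]]:
    """Draw T(X) under P_gamma via 1 - F(X_j) = prod_{i<=j} V_i**(1/gamma_i) (assumed representation)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        s_prev, t = 1.0, []
        for g in gamma:
            s = s_prev * (1.0 - rng.random()) ** (1.0 / g)  # value of 1 - F(x_j), in (0, 1]
            t.append(math.log(s / s_prev))                 # T_j = ln[(1 - F(x_j)) / (1 - F(x_{j-1}))]
            s_prev = s
        out.append(t)
    return out


def mean_and_cov(samples: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    n, r = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(r)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(r)] for i in range(r)]
    return mean, cov


gamma = [4.0, 2.5, 1.0]
grad, hess = grad_and_hessian_diag(gamma)
mean, cov = mean_and_cov(simulate_T(gamma, 100_000))
print(grad, hess)   # compare with -1/gamma_j and 1/gamma_j**2
print(mean, cov)    # Monte Carlo estimates of E T and Cov T
```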

Moreover, we note the exponential family structure for the special case of SOSs (i.e. $\gamma_j=\alpha_j(n-j+1)$, $j=1,\dots,r$) with natural parameter $\boldsymbol\alpha=(\alpha_1,\dots,\alpha_r)'$, since models of SOSs play an important role in Chapter 3. The representation follows immediately from that of GOSs:

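$$\begin{aligned}f^{X_1^*,\dots,X_r^*}_{\boldsymbol\alpha}(x_1,\dots,x_r) &= \Big(\prod_{j=1}^r\alpha_j(n-j+1)\Big)\Big(\prod_{j=1}^{r-1}\big(1-F(x_j)\big)^{\alpha_j(n-j+1)-\alpha_{j+1}(n-j)-1}f(x_j)\Big)\big(1-F(x_r)\big)^{\alpha_r(n-r+1)-1}f(x_r)\\ &= \Big(\prod_{j=1}^r\alpha_j\Big)\exp\Big(\sum_{j=1}^r\alpha_j(n-j+1)\ln\frac{1-F(x_j)}{1-F(x_{j-1})}\Big)\Big(\frac{n!}{(n-r)!}\prod_{j=1}^r\frac{f(x_j)}{1-F(x_j)}\Big).\end{aligned}$$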


2.1.3. Explicit Forms of Divergences for GOSs

In the previous two subsections, the fundamentals were established to obtain all considered divergences for distributions of GOSs based on the same absolutely continuous distribution function $F$ (with corresponding density $f$) and on possibly different parameters $\boldsymbol\gamma,\boldsymbol\tau\in\mathbb R^r_+$. With the corresponding densities $f^{\boldsymbol X}_{\boldsymbol\gamma}$ and $f^{\boldsymbol X}_{\boldsymbol\tau}$ according to (2.1.9), we find the following result.

Proposition 2.1.9 (Divergences for GOSs)

Given the unique parameters $\boldsymbol\gamma$ and $\boldsymbol\tau$ of two joint (marginal) distributions of the first $r\in\{1,\dots,n\}$ GOSs, the divergences between them can be computed as follows.

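Kullback-Leibler divergence
$$D_{KL}(\boldsymbol\gamma,\boldsymbol\tau)=\sum_{j=1}^r\Big(\frac{\tau_j}{\gamma_j}+\ln\frac{\gamma_j}{\tau_j}-1\Big)$$
Jeffreys J-distance
$$D_J(\boldsymbol\gamma,\boldsymbol\tau)=\sum_{j=1}^r\Big(\frac{\tau_j}{\gamma_j}+\frac{\gamma_j}{\tau_j}-2\Big)=\sum_{j=1}^r\frac{(\tau_j-\gamma_j)^2}{\tau_j\gamma_j}$$
Rényi divergence for $\alpha\in(0,1)$
$$D_{R,\alpha}(\boldsymbol\gamma,\boldsymbol\tau)=\frac{1}{\alpha(\alpha-1)}\ln\Big(\prod_{j=1}^r\frac{\gamma_j^{\alpha}\tau_j^{1-\alpha}}{\alpha\gamma_j+(1-\alpha)\tau_j}\Big)$$
Bhattacharyya distance
$$D_B(\boldsymbol\gamma,\boldsymbol\tau)=-\ln\Big(\prod_{j=1}^r\frac{2\sqrt{\gamma_j\tau_j}}{\gamma_j+\tau_j}\Big)$$
Hellinger distance for $m\in2\mathbb N$
$$D_{H,m}(\boldsymbol\gamma,\boldsymbol\tau)=\Bigg(\sum_{k=0}^m(-1)^k\binom mk\prod_{j=1}^r\frac{\gamma_j^{k/m}\tau_j^{(m-k)/m}}{\frac km\gamma_j+\frac{m-k}m\tau_j}\Bigg)^{1/m}$$
In particular, Hellinger distance for $m=2$
$$D_{H,2}(\boldsymbol\gamma,\boldsymbol\tau)=\Big(2-2\prod_{j=1}^r\frac{2\sqrt{\gamma_j\tau_j}}{\gamma_j+\tau_j}\Big)^{1/2}$$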

The expressions in this proposition can be viewed as the respective divergences and distances for uniform generalized order statistics, since the baseline distribution is not involved in the explicit forms. We will return to this property in Section 2.2.
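The closed forms of Proposition 2.1.9 translate directly into code. The following minimal Python sketch (function names are hypothetical, not from the text) implements them for parameter vectors $\boldsymbol\gamma,\boldsymbol\tau\in\mathbb R^r_+$ given as sequences; since the baseline $F$ does not enter, no distribution has to be specified. The Hellinger distance for even $m$ evaluates the product appearing in the proposition through its logarithm.

```python
import math
from typing import Sequence


def _validate(gamma: Sequence[float], tau: Sequence[float]) -> None:
    if len(gamma) != len(tau) or len(gamma) == 0:
        raise ValueError("gamma and tau must be non-empty and of equal length r")
    if min(gamma) <= 0 or min(tau) <= 0:
        raise ValueError("all model parameters must be positive")


def kl_gos(gamma, tau) -> float:
    """D_KL(gamma, tau) = sum_j (tau_j/gamma_j + ln(gamma_j/tau_j) - 1)."""
    _validate(gamma, tau)
    return sum(t / g + math.log(g / t) - 1.0 for g, t in zip(gamma, tau))


def j_gos(gamma, tau) -> float:
    """D_J(gamma, tau) = sum_j (tau_j - gamma_j)^2 / (tau_j gamma_j)."""
    _validate(gamma, tau)
    return sum((t - g) ** 2 / (t * g) for g, t in zip(gamma, tau))


def _log_lambda(p: float, gamma, tau) -> float:
    """ln prod_j gamma_j^p tau_j^(1-p) / (p gamma_j + (1-p) tau_j)."""
    q = 1.0 - p
    return sum(p * math.log(g) + q * math.log(t) - math.log(p * g + q * t)
               for g, t in zip(gamma, tau))


def renyi_gos(gamma, tau, alpha: float) -> float:
    _validate(gamma, tau)
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return _log_lambda(alpha, gamma, tau) / (alpha * (alpha - 1.0))


def bhattacharyya_gos(gamma, tau) -> float:
    _validate(gamma, tau)
    return -_log_lambda(0.5, gamma, tau)


def hellinger_gos(gamma, tau, m: int = 2) -> float:
    _validate(gamma, tau)
    if m <= 0 or m % 2:
        raise ValueError("m must be a positive even integer")
    s = sum((-1) ** k * math.comb(m, k) * math.exp(_log_lambda(k / m, gamma, tau))
            for k in range(m + 1))
    return max(s, 0.0) ** (1.0 / m)  # clip tiny negative rounding errors


gamma, tau = [5.0, 4.0, 3.0], [2.0, 4.0, 7.0]
print(kl_gos(gamma, tau), j_gos(gamma, tau), renyi_gos(gamma, tau, 0.3),
      bhattacharyya_gos(gamma, tau), hellinger_gos(gamma, tau), hellinger_gos(gamma, tau, 4))
```

Computing `kl_gos(g, t) + kl_gos(t, g)` reproduces `j_gos(g, t)` up to rounding, in line with the proof below.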


Proof. The equations follow from Corollary 2.1.8 and the representations of $\kappa$ and $\pi$ for GOSs in (2.1.11) and (2.1.12), respectively.

For the Kullback-Leibler divergence we have

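$$\begin{aligned}D_{KL}(\boldsymbol\gamma,\boldsymbol\tau) &= \kappa(\boldsymbol\tau)-\kappa(\boldsymbol\gamma)+(\boldsymbol\gamma-\boldsymbol\tau)'\pi(\boldsymbol\gamma) = -\sum_{j=1}^r\ln\tau_j+\sum_{j=1}^r\ln\gamma_j+(\boldsymbol\gamma-\boldsymbol\tau)'\Big(-\frac1{\gamma_1},\dots,-\frac1{\gamma_r}\Big)'\\ &= \sum_{j=1}^r\ln\frac{\gamma_j}{\tau_j}-(\boldsymbol\gamma-\boldsymbol\tau)'\Big(\frac1{\gamma_1},\dots,\frac1{\gamma_r}\Big)' = \sum_{j=1}^r\Big(\ln\frac{\gamma_j}{\tau_j}-1+\frac{\tau_j}{\gamma_j}\Big) = \sum_{j=1}^r\Big(\frac{\tau_j}{\gamma_j}+\ln\frac{\gamma_j}{\tau_j}-1\Big).\end{aligned}$$
Jeffreys distance follows immediately by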

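$$D_J(\boldsymbol\gamma,\boldsymbol\tau)=D_{KL}(\boldsymbol\gamma,\boldsymbol\tau)+D_{KL}(\boldsymbol\tau,\boldsymbol\gamma)=\sum_{j=1}^r\Big(\frac{\tau_j}{\gamma_j}+\frac{\gamma_j}{\tau_j}+\underbrace{\ln\frac{\tau_j}{\gamma_j}+\ln\frac{\gamma_j}{\tau_j}}_{=0}-2\Big).$$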

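The Rényi divergence of order $\alpha\in(0,1)$ in the case of GOSs is
$$\begin{aligned}D_{R,\alpha}(\boldsymbol\gamma,\boldsymbol\tau) &= \frac{1}{\alpha(\alpha-1)}\Big[\kappa\big(\alpha\boldsymbol\gamma+(1-\alpha)\boldsymbol\tau\big)-\big(\alpha\kappa(\boldsymbol\gamma)+(1-\alpha)\kappa(\boldsymbol\tau)\big)\Big]\\ &= \frac{1}{\alpha(\alpha-1)}\Big[-\sum_{j=1}^r\ln\big(\alpha\gamma_j+(1-\alpha)\tau_j\big)+\sum_{j=1}^r\big(\alpha\ln\gamma_j+(1-\alpha)\ln\tau_j\big)\Big] = \frac{1}{\alpha(\alpha-1)}\ln\Big(\prod_{j=1}^r\frac{\gamma_j^{\alpha}\tau_j^{1-\alpha}}{\alpha\gamma_j+(1-\alpha)\tau_j}\Big).\end{aligned}$$
Since this formula simplifies for $\alpha=\frac12=1-\alpha$, the Bhattacharyya distance is
$$D_B(\boldsymbol\gamma,\boldsymbol\tau)\overset{(1.2.5)}{=}\frac14D_{R,\frac12}(\boldsymbol\gamma,\boldsymbol\tau)=-\ln\Big(\prod_{j=1}^r\frac{2\sqrt{\gamma_j\tau_j}}{\gamma_j+\tau_j}\Big),$$
and, as a transformation of the latter, the Hellinger distance ($m=2$) is
$$D_{H,2}(\boldsymbol\gamma,\boldsymbol\tau)\overset{(1.2.6)}{=}\big(2-2\exp(-D_B(\boldsymbol\gamma,\boldsymbol\tau))\big)^{1/2}=\Big(2-2\prod_{j=1}^r\frac{2\sqrt{\gamma_j\tau_j}}{\gamma_j+\tau_j}\Big)^{1/2}.$$
The Hellinger distance for $m\in2\mathbb N$ is obtained from an explicit expression for $\lambda_{\frac km,\frac{m-k}m}(f_{\boldsymbol\gamma},f_{\boldsymbol\tau})$. It can be computed analogously to the product for the Rényi divergence, since $D_{R,\alpha}(\boldsymbol\gamma,\boldsymbol\tau)=\frac{1}{\alpha(\alpha-1)}\ln\big(\lambda_{\alpha,1-\alpha}(f_{\boldsymbol\gamma},f_{\boldsymbol\tau})\big)$; this gives
$$\lambda_{\frac km,\frac{m-k}m}(f_{\boldsymbol\gamma},f_{\boldsymbol\tau})=\prod_{j=1}^r\frac{\gamma_j^{k/m}\tau_j^{(m-k)/m}}{\frac km\gamma_j+\frac{m-k}m\tau_j},\quad k=0,\dots,m.$$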
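For $r=1$ and a standard exponential baseline, $X_1^*$ has density $\gamma e^{-\gamma x}$, $x>0$, so the Rényi formula can be checked against direct numerical integration of $\lambda_{\alpha,1-\alpha}$. The sketch below is an illustrative check under these assumptions; the composite Simpson rule, the truncation point `upper`, and the number of subintervals are arbitrary choices, and the function names are hypothetical.

```python
import math
from typing import Callable


def simpson(func: Callable[[float], float], a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def renyi_numeric_r1(gamma: float, tau: float, alpha: float, upper: float = 60.0) -> float:
    """D_R,alpha for r = 1 with standard exponential baseline, by numerical integration."""
    def integrand(x: float) -> float:
        f = gamma * math.exp(-gamma * x)   # density of X_1^* under gamma
        g = tau * math.exp(-tau * x)       # density of X_1^* under tau
        return f**alpha * g**(1.0 - alpha)
    lam = simpson(integrand, 0.0, upper)   # tail beyond `upper` is negligible here
    return math.log(lam) / (alpha * (alpha - 1.0))


def renyi_closed_r1(gamma: float, tau: float, alpha: float) -> float:
    """Proposition 2.1.9 with r = 1."""
    ratio = gamma**alpha * tau**(1.0 - alpha) / (alpha * gamma + (1.0 - alpha) * tau)
    return math.log(ratio) / (alpha * (alpha - 1.0))


for a in (0.3, 0.5, 0.8):
    print(a, renyi_numeric_r1(2.0, 0.5, a), renyi_closed_r1(2.0, 0.5, a))
```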

Note that the Bhattacharyya distance and the Hellinger metric are transformations of a similarity coefficient, which is given by the product of the componentwise ratios of the geometric mean and the arithmetic mean. These ratios are always larger than zero and not larger than 1 (inequality of arithmetic and geometric means), and so is their product.

We consider a first example for the J-distance.

Example 2.1.10. Let $r\in\mathbb N$, $\boldsymbol\gamma=(n,n-1,n-2,\dots,n-r+1)'$ (i.e. the model parameter for the first $r$ OSs based on a sample of size $n\in\mathbb N$), and $\boldsymbol\tau^{(k)}=(n+k,(n+k)-1,(n+k)-2,\dots,(n+k)-r+1)'$, $k\in\mathbb N$ (i.e. the model parameter for the first $r$ OSs based on a sample of size $n+k\in\mathbb N$). Then we find
$$\begin{aligned}D_J(\boldsymbol\gamma,\boldsymbol\tau^{(k)}) &= \sum_{j=1}^r\Big(\frac{n-j+1}{n-j+1+k}+\frac{n-j+1+k}{n-j+1}-2\Big) = \sum_{j=1}^r\Big(1-\frac{k}{n-j+1+k}+1+\frac{k}{n-j+1}-2\Big)\\ &= \sum_{j=1}^r\Big(-\frac{k}{n-j+1+k}+\frac{k}{n-j+1}\Big) = k\Big(-\sum_{j=n-r+1+k}^{n+k}\frac1j+\sum_{j=n-r+1}^{n}\frac1j\Big) = k\sum_{j=n-r+1}^{n}\Big(\frac1j-\frac1{j+k}\Big).\end{aligned}$$

This distance tends to zero as $n$ tends to infinity if $k$ and $r$ are fixed, and it tends to infinity as $k$ tends to infinity with $n$ and $r$ fixed. As a special case we note

$$D_J(\boldsymbol\gamma,\boldsymbol\tau^{(1)})=\frac1{n-r+1}-\frac1{n+1}=\frac{r}{(n-r+1)(n+1)}.$$
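Because all parameters in Example 2.1.10 are integers, the computation can be verified in exact rational arithmetic. The following Python sketch (helper names are hypothetical) compares the defining sum of $D_J$ with the final expression of the example and with the telescoping special case $k=1$.

```python
from fractions import Fraction
from typing import List, Sequence


def os_params(n: int, r: int) -> List[int]:
    """Model parameter (n, n-1, ..., n-r+1)' of the first r OSs from a sample of size n."""
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    return [n - j + 1 for j in range(1, r + 1)]


def j_gos_exact(gamma: Sequence[int], tau: Sequence[int]) -> Fraction:
    """D_J = sum_j (tau_j/gamma_j + gamma_j/tau_j - 2) in exact rational arithmetic."""
    return sum((Fraction(t, g) + Fraction(g, t) - 2 for g, t in zip(gamma, tau)), Fraction(0))


def j_os_closed_form(n: int, r: int, k: int) -> Fraction:
    """k * sum_{j=n-r+1}^{n} (1/j - 1/(j+k)), the last expression of Example 2.1.10."""
    return k * sum((Fraction(1, j) - Fraction(1, j + k) for j in range(n - r + 1, n + 1)),
                   Fraction(0))


print(j_gos_exact(os_params(10, 3), os_params(11, 3)))  # 3/88 = r / ((n-r+1)(n+1))
```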

As mentioned before, the J-distance fails to be a metric in general because it does not satisfy the triangle inequality. Even for the class of densities given by the concept of GOSs, the J-distance does not have this metric property.

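Remark 2.1.11. Let $\boldsymbol\gamma,\boldsymbol\tau\in\mathbb R^r_+$. It can be shown that for every $\boldsymbol\delta\in\mathbb R^r_+$ with $\delta_j\in(\min\{\gamma_j,\tau_j\},\max\{\gamma_j,\tau_j\})$, $j=1,\dots,r$, the inequality
$$D_J(\boldsymbol\gamma,\boldsymbol\tau)>D_J(\boldsymbol\gamma,\boldsymbol\delta)+D_J(\boldsymbol\delta,\boldsymbol\tau)$$
holds (see Section A.1 in the appendix). That is, the triangle inequality is not satisfied. Note that the condition given here is only a sufficient one; there are further examples for which the triangle inequality fails.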
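Remark 2.1.11 can be illustrated numerically. The minimal sketch below (names hypothetical) computes the gap $D_J(\boldsymbol\gamma,\boldsymbol\tau)-D_J(\boldsymbol\gamma,\boldsymbol\delta)-D_J(\boldsymbol\delta,\boldsymbol\tau)$ for a fixed example and for randomly drawn parameters with each $\delta_j$ strictly between $\gamma_j$ and $\tau_j$; a positive gap means that the triangle inequality fails. This is a spot check under arbitrary choices of ranges and seed, not a proof.

```python
import random
from typing import Sequence


def j_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    """D_J(gamma, tau) = sum_j (tau_j - gamma_j)^2 / (tau_j gamma_j)."""
    return sum((t - g) ** 2 / (t * g) for g, t in zip(gamma, tau))


def triangle_gap(gamma, tau, delta) -> float:
    """D_J(gamma, tau) - [D_J(gamma, delta) + D_J(delta, tau)]."""
    return j_gos(gamma, tau) - (j_gos(gamma, delta) + j_gos(delta, tau))


def random_check(r: int = 3, trials: int = 1000, seed: int = 7) -> bool:
    """True if the gap is positive in every random trial with delta_j strictly inside."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(0.1, 10.0) for _ in range(r)]
        t = [rng.uniform(0.1, 10.0) for _ in range(r)]
        d = [min(a, b) + rng.uniform(0.01, 0.99) * abs(b - a) for a, b in zip(g, t)]
        if triangle_gap(g, t, d) <= 0.0:
            return False
    return True


example = ([1.0, 3.0], [4.0, 0.5], [2.0, 1.0])
print(triangle_gap(*example))  # about 3.58 > 0
print(random_check())
```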


2.2. On Structure of and Relations between Models of GOSs w.r.t. Divergences

In this section, some properties of selected divergences given in Proposition 2.1.9 are discussed. Throughout this work, divergences between distributions of the first $r\in\{1,\dots,n\}$ ordered random quantities are interpreted as divergences between the corresponding models themselves, although, strictly speaking, only marginal distributions are compared. The interpretation as a distance between two models seems justified by the fact that the specific joint distributions of $r$ ordered random quantities are directly connected to the corresponding models. For these purposes and for a simple notation, the abbreviations $OS_{\alpha_0}$, $SOS_{\boldsymbol\alpha}$, $RV_{\beta_0}$, $PRV_{\boldsymbol\beta}$, and $PC_{\boldsymbol R}$, which are already given in Table 1.1, are used in the following as arguments of the divergences for a fixed $r$. This is to be understood as explained here with the example of progressively Type-II censored order statistics: $PC_{\boldsymbol R}$ is a short notation for the parameter vector which corresponds to a progressively Type-II censored sample of size $r$ from $N$ units with censoring scheme $\boldsymbol R=(R_1,\dots,R_r)$, that is

$$\Big(N,\,N-1-R_1,\,\dots,\,N-r+1-\sum_{i=1}^{r-1}R_i\Big)'=\Big(r+\sum_{i=1}^rR_i,\,r-1+\sum_{i=2}^rR_i,\,\dots,\,1+R_r\Big)',$$
where $r+\sum_{i=1}^rR_i=N$. Analogously, for a given $n\in\mathbb N$, $SOS_{\boldsymbol\alpha}$ with $\boldsymbol\alpha=(\alpha_1,\dots,\alpha_r)'\in\mathbb R^r_+$ denotes the parameter vector
$$\big(\alpha_1n,\,\alpha_2(n-1),\,\dots,\,\alpha_r(n-r+1)\big)',$$

which yields the distribution of the first $r$ SOSs based on $\alpha_1,\dots,\alpha_r$. With this notation, we give an example that provides some alternative expressions for divergences and distances between models of OSs and RVs. The corresponding model parameters depend only on $r$ (RVs) or on $r$ and $n$ (OSs).
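The parameter vectors behind the short notations can be generated mechanically. The sketch below (hypothetical helper names) builds $PC_{\boldsymbol R}$, $SOS_{\boldsymbol\alpha}$, $OS$, and $RV$; it assumes, as in Example 2.2.1, that record values correspond to $\gamma_j=1$ for all $j$, and it checks both representations of $PC_{\boldsymbol R}$ given above.

```python
from typing import List, Sequence


def pc_params(R: Sequence[int]) -> List[int]:
    """PC_R: gamma_j = sum_{i=j}^{r} (R_i + 1) = N - j + 1 - sum_{i<j} R_i, N = r + sum_i R_i."""
    r = len(R)
    return [sum(R[i] + 1 for i in range(j, r)) for j in range(r)]


def sos_params(alpha: Sequence[float], n: int) -> List[float]:
    """SOS_alpha: (alpha_1 n, alpha_2 (n-1), ..., alpha_r (n-r+1))'."""
    if len(alpha) > n:
        raise ValueError("r must not exceed n")
    return [a * (n - j) for j, a in enumerate(alpha)]  # 0-based j, so n - j = n - (j+1) + 1


def os_params(n: int, r: int) -> List[float]:
    """OSs are SOSs with alpha = (1, ..., 1)'."""
    return sos_params([1.0] * r, n)


def rv_params(r: int) -> List[float]:
    """Record values: gamma_j = 1 for all j (assumed, consistent with Example 2.2.1)."""
    return [1.0] * r


print(pc_params([2, 0, 1]), sos_params([1.0, 2.0, 0.5], 5))
```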

Example 2.2.1. For the Kullback-Leibler divergence we find
$$D_{KL}(RV,OS)=\sum_{j=1}^r\Big(n-j+1+\ln\frac1{n-j+1}-1\Big)=\frac r2(2n-r-1)-\ln\frac{n!}{(n-r)!}$$
and
$$\begin{aligned}D_{KL}(OS,RV) &= \sum_{j=1}^r\Big(\frac1{n-j+1}+\ln(n-j+1)-1\Big)=\sum_{j=1}^n\frac1j-\sum_{j=1}^{n-r}\frac1j+\ln\frac{n!}{(n-r)!}-r\\ &= \psi(n+1)-\psi(n-r+1)+\ln\frac{n!}{(n-r)!}-r,\end{aligned}$$
where $\gamma$ is the Euler constant and $\psi$ is the digamma function. The latter is given by $\psi(x)=\frac{d}{dx}\ln\Gamma(x)$, and the equality $\psi(n)+\gamma=\sum_{k=1}^{n-1}k^{-1}$, $n\in\mathbb N$, holds (see, e.g., Abramowitz and Stegun, 1964, p. 258). Moreover, we derive the distances

$$D_J(OS,RV)=D_{KL}(OS,RV)+D_{KL}(RV,OS)=\psi(n+1)-\psi(n-r+1)+\frac r2(2n-r-3)$$
and
$$D_{H,2}(OS,RV)=\Big(2-2\prod_{j=1}^r\frac{2\sqrt{n-j+1}}{n-j+2}\Big)^{1/2}=\Big(2-2^{r+1}\Big(\frac{n!}{(n-r)!}\Big)^{1/2}\frac{(n-r+1)!}{(n+1)!}\Big)^{1/2}=\Big(2-2^{r+1}\Big(\frac{(n-r)!}{n!}\Big)^{1/2}\frac{n-r+1}{n+1}\Big)^{1/2}.$$
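The closed forms of Example 2.2.1 can be cross-checked against the general formulas of Proposition 2.1.9. Python's `math` module has no digamma function, so the sketch below (names hypothetical) evaluates $\psi(n+1)-\psi(n-r+1)$ as $\sum_{k=n-r+1}^{n}1/k$, which follows from the identity $\psi(n)+\gamma=\sum_{k=1}^{n-1}k^{-1}$ quoted above, and evaluates $\ln\frac{n!}{(n-r)!}$ via `math.lgamma`.

```python
import math
from typing import Dict, Sequence


def kl_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    return sum(t / g + math.log(g / t) - 1.0 for g, t in zip(gamma, tau))


def hellinger2_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    bc = math.prod(2.0 * math.sqrt(g * t) / (g + t) for g, t in zip(gamma, tau))
    return math.sqrt(max(2.0 - 2.0 * bc, 0.0))


def harmonic(n: int) -> float:
    return sum(1.0 / k for k in range(1, n + 1))


def example_221(n: int, r: int) -> Dict[str, float]:
    """Closed forms of Example 2.2.1; psi(n+1) - psi(n-r+1) = H_n - H_{n-r}."""
    log_ff = math.lgamma(n + 1) - math.lgamma(n - r + 1)   # ln(n!/(n-r)!)
    psi_diff = harmonic(n) - harmonic(n - r)
    h2_sq = 2.0 - 2.0 ** (r + 1) * math.exp(-0.5 * log_ff) * (n - r + 1) / (n + 1)
    return {
        "KL(RV,OS)": r * (2 * n - r - 1) / 2 - log_ff,
        "KL(OS,RV)": psi_diff + log_ff - r,
        "J(OS,RV)": psi_diff + r * (2 * n - r - 3) / 2,
        "H2(OS,RV)": math.sqrt(max(h2_sq, 0.0)),
    }


print(example_221(10, 4))
```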

Tables C.1 to C.4 (on pp. 138 ff.) in the appendix contain computed values of these divergences and distances between OSs and RVs. In particular, for large $r$ and $n$ many values of the Hellinger distance are close to $\sqrt2$, the upper bound of $D_{H,2}$. This may be understood as a disadvantage of the Hellinger distance, which is the only considered divergence with metric properties in this work: its very similar values cannot reveal relative differences of distances between models in as much detail as the values of unbounded measures can.

For the Jeffreys and Hellinger distances, this example of OSs and RVs is given in a preprint of Vuong et al. (2012), along with some other special cases which yield explicit alternative expressions.

2.2.1. First Results

No dependence on baseline distribution

The first interesting common property of the divergences between GOSs is that they do not depend on the baseline cumulative distribution function $F$; that is, the divergences between two models of GOSs are invariant under the particular choice of a common baseline distribution function. This is due to the fact that $F$ is not involved in $\kappa$. For example, the divergence between record values and ordinary order statistics is always the same for fixed $r$ and $n$, regardless of the absolutely continuous distribution function they are based on, provided it is the same for both.
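The distribution-freeness can be illustrated by simulation. The sketch below (an illustration under stated assumptions, with hypothetical names) estimates $D_{KL}(\boldsymbol\gamma,\boldsymbol\tau)$ as the sample mean of the log-likelihood ratio $\sum_j(\gamma_j-\tau_j)T_j(\boldsymbol x)-\kappa(\boldsymbol\gamma)+\kappa(\boldsymbol\tau)$ for two different baselines, a standard exponential and a Weibull distribution with shape 2, both chosen arbitrarily. GOSs are generated by the product representation $1-F(X_j^*)=\prod_{i\le j}V_i^{1/\gamma_i}$ with iid $V_i\sim U(0,1]$, which is assumed here. Since the log-likelihood ratio depends on $\boldsymbol x$ only through $F(x_1),\dots,F(x_r)$, both baselines give the same estimate up to rounding, and both should be close to the closed form.

```python
import math
import random
from typing import Callable, Sequence


def kl_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    return sum(t / g + math.log(g / t) - 1.0 for g, t in zip(gamma, tau))


# Illustrative baselines, given by the survival function S = 1 - F and its inverse.
def exp_surv(x: float) -> float:
    return math.exp(-x)


def exp_surv_inv(s: float) -> float:
    return -math.log(s)


def weib_surv(x: float) -> float:
    return math.exp(-x * x)          # Weibull with shape 2


def weib_surv_inv(s: float) -> float:
    return math.sqrt(-math.log(s))


def kl_monte_carlo(gamma: Sequence[float], tau: Sequence[float],
                   surv: Callable[[float], float], surv_inv: Callable[[float], float],
                   n_samples: int = 50_000, seed: int = 3) -> float:
    """Mean of ln f_gamma(X)/f_tau(X) over X ~ P_gamma; h(x) cancels in the ratio."""
    rng = random.Random(seed)
    const = sum(math.log(g) - math.log(t) for g, t in zip(gamma, tau))  # -kappa(gamma) + kappa(tau)
    total = 0.0
    for _ in range(n_samples):
        s, prev_surv, llr = 1.0, 1.0, const    # prev_surv = 1 - F(x_0) = 1
        for g, t in zip(gamma, tau):
            s *= (1.0 - rng.random()) ** (1.0 / g)
            x = surv_inv(s)                    # GOS on the original scale of the baseline
            s_x = surv(x)                      # 1 - F(x_j), recomputed from the observation
            llr += (g - t) * math.log(s_x / prev_surv)   # (gamma_j - tau_j) * T_j(x)
            prev_surv = s_x
        total += llr
    return total / n_samples


gamma, tau = [4.0, 2.5, 1.0], [2.0, 3.0, 1.5]
est_exp = kl_monte_carlo(gamma, tau, exp_surv, exp_surv_inv)
est_weib = kl_monte_carlo(gamma, tau, weib_surv, weib_surv_inv)
print(est_exp, est_weib, kl_gos(gamma, tau))
```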

Cramer and Bagh (2011) used the Kullback-Leibler divergence as an information measure (see Kullback and Leibler, 1951; Kullback, 1959) to determine optimal censoring schemes for progressively Type-II censored order statistics. They compare progressively Type-II censored OSs with an iid sample of the same size. In this way, they establish optimal censoring schemes in the sense of minimum or maximum information for any continuous distribution function, because the divergence is distribution-free in this case, too. Moreover, Cramer and Bagh considered an $I_\alpha$-information, which is closely related to the Rényi divergence as considered in this work.

Dependence on parameter ratios

Another property directly visible from the formulas in Proposition 2.1.9 is that all measures may be rewritten as functions of the ratios $\gamma_j/\tau_j$ or of their reciprocals. For the Kullback-Leibler divergence and the Jeffreys J-distance this is obvious from the expressions given in Proposition 2.1.9. The equation

$$\prod_{j=1}^r\frac{\gamma_j^p\tau_j^q}{p\gamma_j+q\tau_j}=\prod_{j=1}^r\frac{(\gamma_j/\tau_j)^p}{p(\gamma_j/\tau_j)+q}$$

for $p,q\in(0,1)$ with $p+q=1$ reveals the dependence of the other considered measures on the parameter ratios. Consequently, for every considered divergence measure, common factors of the $j$th components ($j\in\{1,\dots,r\}$) of the two parameter vectors cancel out. An example occurs in the case of two models of SOSs. For the $j$th component of the parameter vectors, the factor $(n-j+1)$ cancels, since the fractions

$$\frac{\tilde\alpha_j(n-j+1)}{\alpha_j(n-j+1)}=\frac{\tilde\alpha_j}{\alpha_j},\quad j=1,\dots,r,$$

will lead to equal divergences. Note that this yields a kind of independence of the divergences from the total number $n$ of random variables, similar to the distribution-freeness mentioned above. For a comparison of two models for the first $r$ ($\le n$) SOSs determined by their model parameter vectors $\boldsymbol\alpha,\tilde{\boldsymbol\alpha}$, the crucial quantity for determining the divergence is the vector of parameter quotients $(\alpha_1/\tilde\alpha_1,\dots,\alpha_r/\tilde\alpha_r)'\in\mathbb R^r_+$.

Table 2.1 illustrates some distances between representatives of the different sets of models $OS_{\alpha_0}$, $SOS_{\boldsymbol\alpha}$, $RV_{\beta_0}$, $PRV_{\boldsymbol\beta}$, $PC_{\boldsymbol R}$. The Jeffreys distance and the squared Hellinger distance for $m=2$ are specified; since the main product (the Bhattacharyya coefficient) in the Hellinger distance is also the crucial term in the Bhattacharyya distance, the table can be read as one for the latter as well. Comparing the values in Table 2.1, it is noticeable that the entries in the block $OS_{\alpha_0}$/$SOS_{\boldsymbol\alpha}$ have the same structure as those in the block $RV_{\beta_0}$/$PRV_{\boldsymbol\beta}$. This similarity is a consequence of the exclusive dependence on the ratios, and it is illustrated in Figure 2.1.

A closer look at the two well-known models OSs and RVs provides further insight into this property. These two models are also interesting for a comparison because they do not have any further parameters (cf. Example 2.2.1); that is, they can be viewed as $SOS_{\boldsymbol\alpha}$ models with fixed choices of $\boldsymbol\alpha$.
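The ratio property and the block structure of Table 2.1 can be checked directly. The minimal Python sketch below (names hypothetical; $n$ and $\boldsymbol\alpha$ are arbitrary) uses the parameter vectors $SOS_{\boldsymbol\alpha}=(\alpha_j(n-j+1))_j$ and $PRV_{\boldsymbol\alpha}=(\alpha_j)_j$, so that the pairs $OS/SOS_{\boldsymbol\alpha}$ and $RV/PRV_{\boldsymbol\alpha}$ share the componentwise ratios $1/\alpha_j$, while $RV/OS$ and $PRV_{\boldsymbol\alpha}/SOS_{\boldsymbol\alpha}$ share the ratios $1/(n-j+1)$. Equal ratios should give equal divergences, and a common rescaling of both vectors should change nothing.

```python
import math
from typing import List, Sequence


def kl_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    return sum(t / g + math.log(g / t) - 1.0 for g, t in zip(gamma, tau))


def j_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    return sum((t - g) ** 2 / (t * g) for g, t in zip(gamma, tau))


def hellinger2_gos(gamma: Sequence[float], tau: Sequence[float]) -> float:
    bc = math.prod(2.0 * math.sqrt(g * t) / (g + t) for g, t in zip(gamma, tau))
    return math.sqrt(max(2.0 - 2.0 * bc, 0.0))


def sos_params(alpha: Sequence[float], n: int) -> List[float]:
    return [a * (n - j) for j, a in enumerate(alpha)]  # alpha_j (n - j + 1) with 0-based j


n = 7
alpha = [0.5, 2.0, 1.3]
r = len(alpha)
os_, rv = sos_params([1.0] * r, n), [1.0] * r
prv_alpha, sos_alpha = list(alpha), sos_params(alpha, n)

for div in (kl_gos, j_gos, hellinger2_gos):
    print(div.__name__,
          div(os_, sos_alpha), div(rv, prv_alpha),       # ratios 1/alpha_j
          div(rv, os_), div(prv_alpha, sos_alpha))       # ratios 1/(n-j+1)
```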

Example 2.2.2 (distance scheme around record values). Let $r$ and $n$ be fixed, and let $D_\star$ denote any divergence measure from Proposition 2.1.9. Then $d=D_\star(RV,OS)>0$ is a fixed number. Hence, a question may arise: Which choices for the parameter $\boldsymbol\alpha$ provide

$$D_\star(RV,SOS_{\boldsymbol\alpha})=d\,?\tag{2.2.1}$$

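Table 2.1.: J-distance (upper right entries) and squared Hellinger distance (lower left entries) for different chosen models included in GOSs. The short notation '$\sum a_j$' stands for the sum $\sum_{j=1}^r a_j$, and analogously '$\prod a_j$' for the product $\prod_{j=1}^r a_j$. Further, we set $\boldsymbol R^r_j:=\sum_{i=j}^rR_i$ and $k_j:=n-j+1$ for this table.

$$\begin{array}{lll}
\text{Row} & \text{Column} & D_J(\cdot,\cdot)\\ \hline
OS_{\tilde\alpha_0} & OS_{\alpha_0} & r\big(\frac{\alpha_0}{\tilde\alpha_0}+\frac{\tilde\alpha_0}{\alpha_0}-2\big)\\
OS_{\tilde\alpha_0} & SOS_{\boldsymbol\alpha} & \sum\big(\frac{\alpha_j}{\tilde\alpha_0}+\frac{\tilde\alpha_0}{\alpha_j}-2\big)\\
OS_{\tilde\alpha_0} & RV_{\beta_0} & \sum\big(\frac{\beta_0}{\tilde\alpha_0k_j}+\frac{\tilde\alpha_0k_j}{\beta_0}-2\big)\\
OS_{\tilde\alpha_0} & PRV_{\boldsymbol\beta} & \sum\big(\frac{\beta_j}{\tilde\alpha_0k_j}+\frac{\tilde\alpha_0k_j}{\beta_j}-2\big)\\
OS_{\tilde\alpha_0} & PC_{\boldsymbol R} & \sum\big(\frac{k_j+\boldsymbol R^r_j}{\tilde\alpha_0k_j}+\frac{\tilde\alpha_0k_j}{k_j+\boldsymbol R^r_j}-2\big)\\
SOS_{\tilde{\boldsymbol\alpha}} & SOS_{\boldsymbol\alpha} & \sum\big(\frac{\alpha_j}{\tilde\alpha_j}+\frac{\tilde\alpha_j}{\alpha_j}-2\big)\\
SOS_{\tilde{\boldsymbol\alpha}} & RV_{\beta_0} & \sum\big(\frac{\beta_0}{\tilde\alpha_jk_j}+\frac{\tilde\alpha_jk_j}{\beta_0}-2\big)\\
SOS_{\tilde{\boldsymbol\alpha}} & PRV_{\boldsymbol\beta} & \sum\big(\frac{\beta_j}{\tilde\alpha_jk_j}+\frac{\tilde\alpha_jk_j}{\beta_j}-2\big)\\
SOS_{\tilde{\boldsymbol\alpha}} & PC_{\boldsymbol R} & \sum\big(\frac{k_j+\boldsymbol R^r_j}{\tilde\alpha_jk_j}+\frac{\tilde\alpha_jk_j}{k_j+\boldsymbol R^r_j}-2\big)\\
RV_{\tilde\beta_0} & RV_{\beta_0} & r\big(\frac{\beta_0}{\tilde\beta_0}+\frac{\tilde\beta_0}{\beta_0}-2\big)\\
RV_{\tilde\beta_0} & PRV_{\boldsymbol\beta} & \sum\big(\frac{\beta_j}{\tilde\beta_0}+\frac{\tilde\beta_0}{\beta_j}-2\big)\\
RV_{\tilde\beta_0} & PC_{\boldsymbol R} & \sum\big(\frac{k_j+\boldsymbol R^r_j}{\tilde\beta_0}+\frac{\tilde\beta_0}{k_j+\boldsymbol R^r_j}-2\big)\\
PRV_{\tilde{\boldsymbol\beta}} & PRV_{\boldsymbol\beta} & \sum\big(\frac{\beta_j}{\tilde\beta_j}+\frac{\tilde\beta_j}{\beta_j}-2\big)\\
PRV_{\tilde{\boldsymbol\beta}} & PC_{\boldsymbol R} & \sum\big(\frac{k_j+\boldsymbol R^r_j}{\tilde\beta_j}+\frac{\tilde\beta_j}{k_j+\boldsymbol R^r_j}-2\big)\\
PC_{\tilde{\boldsymbol R}} & PC_{\boldsymbol R} & \sum\big(\frac{k_j+\boldsymbol R^r_j}{k_j+\tilde{\boldsymbol R}^r_j}+\frac{k_j+\tilde{\boldsymbol R}^r_j}{k_j+\boldsymbol R^r_j}-2\big)
\end{array}$$

$$\begin{array}{lll}
\text{Row} & \text{Column} & D_{H,2}(\cdot,\cdot)^2\\ \hline
OS_{\tilde\alpha_0} & OS_{\alpha_0} & 2-2^{r+1}\big(\frac{\sqrt{\tilde\alpha_0\alpha_0}}{\tilde\alpha_0+\alpha_0}\big)^r\\
SOS_{\tilde{\boldsymbol\alpha}} & OS_{\alpha_0} & 2-2\prod\frac{2\sqrt{\tilde\alpha_j\alpha_0}}{\tilde\alpha_j+\alpha_0}\\
SOS_{\tilde{\boldsymbol\alpha}} & SOS_{\boldsymbol\alpha} & 2-2\prod\frac{2\sqrt{\tilde\alpha_j\alpha_j}}{\tilde\alpha_j+\alpha_j}\\
RV_{\tilde\beta_0} & OS_{\alpha_0} & 2-2\prod\frac{2\sqrt{\tilde\beta_0\alpha_0k_j}}{\tilde\beta_0+\alpha_0k_j}\\
RV_{\tilde\beta_0} & SOS_{\boldsymbol\alpha} & 2-2\prod\frac{2\sqrt{\tilde\beta_0\alpha_jk_j}}{\tilde\beta_0+\alpha_jk_j}\\
RV_{\tilde\beta_0} & RV_{\beta_0} & 2-2^{r+1}\big(\frac{\sqrt{\tilde\beta_0\beta_0}}{\tilde\beta_0+\beta_0}\big)^r\\
PRV_{\tilde{\boldsymbol\beta}} & OS_{\alpha_0} & 2-2\prod\frac{2\sqrt{\tilde\beta_j\alpha_0k_j}}{\tilde\beta_j+\alpha_0k_j}\\
PRV_{\tilde{\boldsymbol\beta}} & SOS_{\boldsymbol\alpha} & 2-2\prod\frac{2\sqrt{\tilde\beta_j\alpha_jk_j}}{\tilde\beta_j+\alpha_jk_j}\\
PRV_{\tilde{\boldsymbol\beta}} & RV_{\beta_0} & 2-2\prod\frac{2\sqrt{\tilde\beta_j\beta_0}}{\tilde\beta_j+\beta_0}\\
PRV_{\tilde{\boldsymbol\beta}} & PRV_{\boldsymbol\beta} & 2-2\prod\frac{2\sqrt{\tilde\beta_j\beta_j}}{\tilde\beta_j+\beta_j}\\
PC_{\tilde{\boldsymbol R}} & OS_{\alpha_0} & 2-2\prod\frac{2\sqrt{(k_j+\tilde{\boldsymbol R}^r_j)\alpha_0k_j}}{\tilde{\boldsymbol R}^r_j+(\alpha_0+1)k_j}\\
PC_{\tilde{\boldsymbol R}} & SOS_{\boldsymbol\alpha} & 2-2\prod\frac{2\sqrt{(k_j+\tilde{\boldsymbol R}^r_j)\alpha_jk_j}}{\tilde{\boldsymbol R}^r_j+(\alpha_j+1)k_j}\\
PC_{\tilde{\boldsymbol R}} & RV_{\beta_0} & 2-2\prod\frac{2\sqrt{(k_j+\tilde{\boldsymbol R}^r_j)\beta_0}}{k_j+\tilde{\boldsymbol R}^r_j+\beta_0}\\
PC_{\tilde{\boldsymbol R}} & PRV_{\boldsymbol\beta} & 2-2\prod\frac{2\sqrt{(k_j+\tilde{\boldsymbol R}^r_j)\beta_j}}{k_j+\tilde{\boldsymbol R}^r_j+\beta_j}\\
PC_{\tilde{\boldsymbol R}} & PC_{\boldsymbol R} & 2-2\prod\frac{2\sqrt{(k_j+\tilde{\boldsymbol R}^r_j)(k_j+\boldsymbol R^r_j)}}{2k_j+\tilde{\boldsymbol R}^r_j+\boldsymbol R^r_j}
\end{array}$$

[Figure 2.1: diagram connecting the models $RV$, $OS$, $SOS_{\boldsymbol\alpha}$, $PRV_{\boldsymbol\beta}$, $SOS_{\boldsymbol\beta}$, and $PRV_{\boldsymbol\alpha}$, with labelled distances $d=D_\star(RV,PRV_{\boldsymbol\beta})$, $d'=D_\star(PRV_{\boldsymbol\alpha},PRV_{\boldsymbol\beta})$, and $d''=D_\star(RV,PRV_{\boldsymbol\alpha})$.]

Figure 2.1.: Equidistant models of (Pfeifer) record values and (sequential) order statistics with respect to divergences for arbitrary parameters $\boldsymbol\alpha,\boldsymbol\beta\in\mathbb R^r_+$, illustrated by the number of connecting lines.

Obviously, $\boldsymbol\alpha=(1,\dots,1)'=:\boldsymbol1\in\mathbb R^r$ is a possible choice, since $SOS_{\boldsymbol1}=OS$, but it can be seen that there are infinitely many different possible choices for $\boldsymbol\alpha$ (for $r\ge2$). Recursive equations are obtainable, although closed-form solutions of equation (2.2.1) are not available; see also Subsection 2.2.3 on this topic. For clarity, let $\boldsymbol\alpha^*$ denote such a solution of equation (2.2.1). A similar question arises: Which choices for the parameter $\boldsymbol\beta$ yield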

$$D_\star(RV,PRV_{\boldsymbol\beta})=d\,?\tag{2.2.2}$$

Let $\boldsymbol\beta^*$ denote a solution of this equation. Because of the property of direct dependence on the parameter ratios mentioned above, it is immediate that

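$D_\star(OS,SOS_{\boldsymbol\beta^*})=d$ (follows from the definition of $\boldsymbol\beta^*$, since $\frac{n-j+1}{\beta^*_j(n-j+1)}=\frac1{\beta^*_j}$),

$D_\star(PRV_{\boldsymbol\alpha^*},SOS_{\boldsymbol\alpha^*})=d$ (follows from the definition of $d$, since $\frac{\alpha^*_j}{\alpha^*_j(n-j+1)}=\frac1{n-j+1}$),

$D_\star(PRV_{\boldsymbol\beta^*},SOS_{\boldsymbol\beta^*})=d$ (follows from the definition of $d$, since $\frac{\beta^*_j}{\beta^*_j(n-j+1)}=\frac1{n-j+1}$).

It is worth mentioning that we have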

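$D_\star(RV,OS)=D_\star(PRV_{\boldsymbol\alpha},SOS_{\boldsymbol\alpha})=D_\star(PRV_{\boldsymbol\beta},SOS_{\boldsymbol\beta})$ for all $\boldsymbol\alpha,\boldsymbol\beta\in\mathbb R^r$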
