Divergence Measures
for
Generalized Order Statistics
Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences
of RWTH Aachen University for the award of the academic degree
of Doctor of Natural Sciences
submitted by
Diplom-Mathematiker
Quan Nhon Vuong
from Aachen
Reviewers: Universitätsprofessor Dr. Udo Kamps
Universitätsprofessor Dr. Erhard Cramer
Date of the oral examination: 12 July 2012
I really appreciate the opportunity to thank the people whose support in recent years was important to me.
First of all, I would like to thank my supervisor Professor Udo Kamps for giving me the opportunity to work in the interesting field of models of ordered random variables. Professor Kamps has a way of always finding the right and cheerful words, free of any doubt, to strengthen my endurance and motivation. I would not have been able to complete this doctoral thesis without his unwavering support.
Furthermore, I would like to thank Professor Erhard Cramer, my co-referee. The very positive experiences that I had while working on my diploma thesis, which was supervised by him, laid the foundation for my decision to return to RWTH Aachen University after graduating.
Sincere thanks are given to all of my colleagues, who made work at the Institute of Statistics an enjoyable time. Since the interesting discussions with Dr. Stefan Bedbur were very helpful during the last months, I thank him in particular for his support. Among my friends who usually have no truck with my work I have to thank Minh Tam Le and Huy Truong, who gave me worthwhile advice on linguistic issues concerning this thesis. Their helpfulness is really outstanding and very much appreciated.
Moreover, I would like to give warm thanks to my family. My parents, Thi My Dung Nguyen and Co Nguyen Vuong, have always provided a comfortable background for my personal development.
Finally, I am deeply grateful to Miriam Tamm, who is not only the most important person in my life, but also the greatest support and largest enrichment in my life.
Contents
1. Introduction
1.1. Models of Ordered Random Variables
1.2. Several Divergence Measures
1.3. Outline
2. Framework
2.1. Exponential Families
2.1.1. Explicit Forms of Divergences
2.1.2. GOSs and Exponential Families in Model Parameters
2.1.3. Explicit Forms of Divergences for GOSs
2.2. On Structure of and Relations between Models of GOSs w.r.t. Divergences
2.2.1. First Results
2.2.2. Closest Model of OSs to a given Model of SOSs
2.2.3. Spheres and Balls
3. Applications
3.1. Preliminary Remarks
3.2. Confidence Sets
3.2.1. Graphical Comparison of two-dimensional Confidence Regions for Different Sample Sizes
3.2.2. Simulated Confidence Regions for Different Dimensions
3.3. Homogeneity Tests
3.3.1. Setting
3.3.2. Asymptotic Normality of MLE
3.3.3. Asymptotic Distribution of the Average J-distance for the MLE
3.3.4. Statistical Applications
3.3.5. Application to SOSs
3.3.6. Comments on Original Work
3.4. Estimator based on MLE and Pre-Information via Distance Constraints
3.4.1. Equal Distance Estimation
3.4.2. SOS (univariate case)
3.4.3. SOSs (multivariate case)
4. Summary
A. Selected Proofs
A.1. Concerning Chapter 2
A.1.1. Proof of Remark 2.1.11
A.1.2. Proof of Lemma 2.2.5
A.2. Concerning Chapter 3
A.2.1. Proof of Remark 3.1.3 (ii)
A.2.2. Proof of Remark 3.4.1
B. Miscellaneous Theorems and Results
B.1. Optimization Subject to Equality Constraints
B.2. Taylor's Expansion and Continuous Mapping Theorem
B.3. Asymptotic Normality of MLEs when Sampling from Associated Populations
C. Tables and Notations
C.1. Tables
C.1.1. Tables for Divergences between OSs and RVs
C.1.2. Quantile Tables concerning Section 3.2
C.1.3. Tables of (approximated) areas/volumes of simulated confidence regions for Subsection 3.2.2
C.2. Abbreviations and Notations
C.2.1. List of Symbols
C.2.2. Abbreviations
1. Introduction
1.1. Models of Ordered Random Variables
Models of ordered random variables occur in a wide range of statistical issues dealing with ordered data sets. Different models provide a broad variety of interpretations. Kamps (1995a,b) developed a unified approach to many models of ordered random variables. Kamps introduced uniform generalized order statistics and used quantile transformations to define generalized order statistics (GOSs). Generalized order statistics which are based on an underlying absolutely continuous distribution function can be defined by their joint densities.
Definition 1.1.1 (Generalized order statistics (GOSs))
Let $F$ be an absolutely continuous distribution function with corresponding density function $f$. For $n \in \mathbb{N}$ and a vector of positive model parameters $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_n)' \in \mathbb{R}^n_+$, the ordered random variables $X^*_1, \dots, X^*_n$ are called generalized order statistics if they possess a joint density of the form
\[
f^{X^*_1,\dots,X^*_n}_{\boldsymbol{\gamma}}(x_1,\dots,x_n) = \Bigl(\prod_{j=1}^{n}\gamma_j\Bigr)\Bigl(\prod_{j=1}^{n-1}(1-F(x_j))^{\gamma_j-\gamma_{j+1}-1} f(x_j)\Bigr)\,(1-F(x_n))^{\gamma_n-1} f(x_n) \tag{1.1.1}
\]
on the cone $F^{-1}(0+) < x_1 \le \dots \le x_n < F^{-1}(1)$.
In this work, only such generalized order statistics are discussed. The convenient form of their joint densities allows for simple computation of several divergence measures. It is useful to note the following.
Remark 1.1.2. The joint density of the first $r \in \{1, \dots, n\}$ GOSs according to Definition 1.1.1 is given by (cf. Kamps, 1995b, p. 62)
\[
f^{X^*_1,\dots,X^*_r}_{\boldsymbol{\gamma}}(x_1,\dots,x_r) = \Bigl(\prod_{j=1}^{r}\gamma_j\Bigr)\Bigl(\prod_{j=1}^{r-1}(1-F(x_j))^{\gamma_j-\gamma_{j+1}-1} f(x_j)\Bigr)\,(1-F(x_r))^{\gamma_r-1} f(x_r) \tag{1.1.2}
\]
on the cone $F^{-1}(0+) < x_1 \le \dots \le x_r < F^{-1}(1)$.
The structure in equation (1.1.2) is the same as in equation (1.1.1). This observation allows for two points of view on the marginal densities that occur. The marginal densities of the first $r$ of $n$ ($r < n$) ordered quantities in an included model coincide with the corresponding marginal densities of GOSs with appropriate model parameters. Alternatively, the first $r$ random variables can be modeled as GOSs themselves with another parametrization (e.g., setting $\tilde{n} = r$ in place of $n$ in (1.1.1)).
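The density formula (1.1.2) translates directly into code. The following is a minimal sketch, not part of the original text; the function name `gos_log_density` and the standard exponential baseline (`F_exp`, `f_exp`) are assumptions chosen for illustration.

```python
import math


def gos_log_density(x, gamma, F, f):
    """Log of the joint density (1.1.2) of the first r GOSs at the point x.

    x     : observed values x_1, ..., x_r
    gamma : positive model parameters gamma_1, ..., gamma_r
    F, f  : baseline distribution function and its density (callables)
    """
    r = len(x)
    if r == 0 or len(gamma) != r:
        raise ValueError("x and gamma must be non-empty and of equal length")
    if any(g <= 0 for g in gamma):
        raise ValueError("all model parameters gamma_j must be positive")
    # Outside the cone x_1 <= ... <= x_r the density is zero.
    if any(x[j] > x[j + 1] for j in range(r - 1)):
        return -math.inf
    # The density also vanishes where f(x_j) = 0 or F(x_j) = 1.
    if any(f(t) <= 0 or F(t) >= 1 for t in x):
        return -math.inf
    log_density = sum(math.log(g) for g in gamma)
    for j in range(r - 1):
        exponent = gamma[j] - gamma[j + 1] - 1
        log_density += exponent * math.log(1 - F(x[j])) + math.log(f(x[j]))
    log_density += (gamma[r - 1] - 1) * math.log(1 - F(x[r - 1]))
    log_density += math.log(f(x[r - 1]))
    return log_density


# Assumed baseline for illustration: standard exponential distribution.
def F_exp(t):
    return 1.0 - math.exp(-t)


def f_exp(t):
    return math.exp(-t)
```

With $\gamma_j = n - j + 1$ all exponents in (1.1.2) vanish, so the sketch reduces to $\ln n! + \sum_j \ln f(x_j)$, the familiar log-density of ordinary order statistics.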
A variety of models of ordered random variables can be described with the presentation given in (1.1.1) using this concept of GOSs. The form of the densities will be used to determine exact expressions of the divergence measures considered in this work. For a more detailed discussion about underlying models of GOSs we refer to Kamps (1995b) and his references. Here, only some important examples of models covered by GOSs will be discussed briefly in the following (cf. Kamps (2006)).
In statistical modeling, (ordinary) order statistics (OSs) play a prominent role. For given random variables $X_1, \dots, X_n$, the corresponding quantities arranged in ascending order, $X_{1:n}, \dots, X_{n:n}$, are called order statistics. Throughout this work, the original random variables on which OSs are based are assumed to be independent and identically distributed (iid). Based on iid random variables $X_1, \dots, X_n$, where $X_1$ is distributed according to an absolutely continuous distribution function $F$, the joint density of the first $r$ OSs is obtained by setting $\gamma_j = n - j + 1$, $j = 1, \dots, r$, in equation (1.1.2). For an introduction to the topic of OSs we refer to David and Nagaraja (2003). An interesting application of OSs arises when modeling $(n-r+1)$-out-of-$n$ systems. Such a system consists of $n$ identical components, which start working at the same time. The system keeps working as long as at least $n-r+1$ components are running. Thus, if random variables $X_1, \dots, X_n$ model the components' failure times, then the $r$th OS $X_{r:n}$ represents the life length of the system.
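The relation between an $(n-r+1)$-out-of-$n$ system and the $r$th order statistic can be illustrated with a few lines of code. This is only a sketch; the function name `system_lifetime` is hypothetical.

```python
def system_lifetime(failure_times, r):
    """Life length of an (n-r+1)-out-of-n system.

    The system fails at the r-th component failure, i.e. at the r-th
    smallest failure time X_{r:n}.
    """
    n = len(failure_times)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    return sorted(failure_times)[r - 1]
```

For example, `r = 1` describes a series system (all $n$ components are needed) and `r = n` a parallel system (one working component suffices).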
A more realistic model in many practical situations is obtained if the failure of a component may influence the life lengths of the components remaining at work. This more flexible model can take into account that the failure of a component damages the remaining ones, or that the remaining components have to bear an increased workload after a failure. Starting with iid random variables $X_1^{(1)}, \dots, X_n^{(1)}$ distributed according to a distribution function $F_1$, which model the life lengths of the $n$ components of the system, the first (ordinary) OS $X_{1:n}^{(1)}$ describes the first failure time. Given a corresponding realization $x_{1:n}^{(1)}$, the next failure time is modeled as the minimum of iid random variables $X_1^{(2)}, \dots, X_{n-1}^{(2)}$ distributed according to a possibly different distribution function $F_2$ truncated on the left at $x_{1:n}^{(1)}$, that is, $X_1^{(2)} \sim (F_2 - F_2(x_{1:n}^{(1)}))/(1 - F_2(x_{1:n}^{(1)}))$. Proceeding in this way leads to the structure of sequential order statistics (SOSs), which allows for more flexible modeling as mentioned above. The model of SOSs can be viewed as an extension of the model of OSs. In this work, we restrict ourselves to the particular choice of the distribution functions
\[
F_j = 1 - (1 - F)^{\alpha_j}, \quad j = 1, \dots, n, \tag{1.1.3}
\]
with a distribution function $F$ and positive model parameters $\alpha_1, \dots, \alpha_n$. Such SOSs are called sequential order statistics based on $F$ and $\alpha_1, \dots, \alpha_n$ in the following. If $F$ is an absolutely continuous distribution function, the joint density of the first $r \in \{1, \dots, n\}$ SOSs is obtained by setting $\gamma_j = \alpha_j(n - j + 1)$, $j = 1, \dots, r$, in equation (1.1.2). The particular choice of the baseline distributions $F_1, \dots, F_n$ given in (1.1.3) leads to the hazard function $\alpha_{j+1} f/(1 - F)$ of each component at work after the $j$th failure. This provides a simple interpretation of the parameters, since they act as factors for the respective failure rates. In most practical applications, especially when dealing with technical systems, it is plausible to have non-decreasing failure rates from step to step, i.e. $\alpha_1 \le \dots \le \alpha_n$.
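The sequential construction described above can be mimicked step by step in a simulation. The sketch below is an illustration under assumptions, not the author's procedure: it takes a standard exponential baseline $F$ by default, draws the remaining lifetimes from $F_j$ truncated at the last failure via inverse transform sampling, and uses the hypothetical name `simulate_sos`.

```python
import math
import random


def simulate_sos(alpha, surv_inv=lambda s: -math.log(s), rng=None):
    """Simulate SOSs based on F and alpha_1, ..., alpha_n with F_j = 1-(1-F)^alpha_j.

    surv_inv : inverse of the baseline survival function 1 - F
               (default: standard exponential, an assumption for illustration)
    """
    rng = rng or random.Random()
    n = len(alpha)
    failures = []
    s_prev = 1.0  # value of 1 - F at the last failure time (1 before any failure)
    for j in range(1, n + 1):
        a = alpha[j - 1]
        # For each of the n-j+1 components still at work, draw a lifetime from
        # F_j truncated on the left at the last failure time. On the survival
        # scale, 1 - F of such a lifetime equals s_prev * U^(1/alpha_j).
        s_values = [s_prev * (1.0 - rng.random()) ** (1.0 / a)
                    for _ in range(n - j + 1)]
        # The largest survival value corresponds to the smallest lifetime.
        s_prev = max(s_values)
        failures.append(surv_inv(s_prev))
    return failures
```

Because every draw is truncated at the previous failure time, the returned values are non-decreasing by construction.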
Another well-known and widely used model of ordered quantities, besides that of order statistics, is the model of record values (RVs) (cf., e.g., Arnold et al., 1998). They describe successive largest values in a sequence of iid random variables $(X_j)_{j \in \mathbb{N}}$, where each random variable is distributed according to a continuous distribution function $F$. With record times $L(1) = 1$ and $L(i+1) = \min\{j > L(i) : X_j > X_{L(i)}\}$, $i \in \mathbb{N}$, the RVs are given by $X_{L(i)}$, $i \in \mathbb{N}$. The joint density of the first $r$ record values is obtained by setting $\gamma_j = 1$, $j = 1, \dots, r$, in equation (1.1.2).
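The definition of record times and record values can be followed literally for a finite observed sequence. The function name `record_values` below is hypothetical and the snippet is only an illustrative sketch.

```python
def record_values(sequence):
    """Return (record times, record values) of the upper records in a sequence.

    Record times are 1-based with L(1) = 1; a new record occurs whenever an
    observation strictly exceeds the current record value.
    """
    times, values = [], []
    for j, x in enumerate(sequence, start=1):
        if not values or x > values[-1]:
            times.append(j)
            values.append(x)
    return times, values
```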
Just as SOSs are an extension of (ordinary) OSs, Pfeifer record values (PRVs) extend the model of RVs in a similar way. In the model of PRVs, the respective distributions are allowed to change after each observed record. More precisely, PRVs are based on a double sequence of independent random variables $(X_j^{(i)})_{i,j \in \mathbb{N}}$ with $X_j^{(i)} \sim F_i$ for $i, j \in \mathbb{N}$. With inter-record times $\Delta_1 = 1$ and $\Delta_{i+1} = \min\{j \in \mathbb{N} : X_j^{(i+1)} > X_{\Delta_i}^{(i)}\}$, $i \in \mathbb{N}$, Pfeifer record values are given by $X_{\Delta_i}^{(i)}$, $i \in \mathbb{N}$. In this work, we restrict ourselves to the particular choice of the distribution functions (cf. (1.1.3))
\[
F_j = 1 - (1 - F)^{\beta_j}, \quad j = 1, \dots, r,
\]
with an absolutely continuous distribution function $F$ and positive parameters $\beta_j$, $j = 1, \dots, r$. The joint density of the first $r$ PRVs is obtained by setting $\gamma_j = \beta_j$, $j = 1, \dots, r$, in equation (1.1.2).
Progressively Type-II censored order statistics (PC OSs) are another kind of ordered quantities. They originate from a quite different motivation and are also included in GOSs. When a life test is started with $N$ units, only $n < N$ failure times are designated for observation. In terms of ordinary OSs, such a scenario is obtained if the experiment is stopped after the $n$th of $N$ possible failures. In this case $X_{1:N}, \dots, X_{n:N}$ would be the ordered quantities of interest, whereas the life times of censored items are only known to be larger than $X_{n:N}$ or its realization. The described procedure is called Type-II censoring on the right. The number of failures is pre-fixed and the duration of the experiment is random (in contrast to Type-I censoring). Progressive Type-II censoring is an extension of this kind of censoring in the sense that not all censored units are necessarily removed after the $n$th failure, but a pre-fixed number $0 \le R_j \le N - n$ of items still at work can be removed at random after the $j$th failure for each $j = 1, \dots, n$. The tuple
\[
\boldsymbol{R} = (R_1, \dots, R_n)
\]
is called the censoring scheme. Obviously, $n$ is the number of observed failure times and $\sum_{i=1}^{n} R_i$ is the total number of removed objects. This yields $N = n + \sum_{i=1}^{n} R_i$. For insights into the topic of progressive censoring we refer to the overview article Balakrishnan (2007) and the monograph Balakrishnan and Aggarwala (2000). The joint density of the first $r$ progressively Type-II censored OSs is obtained by setting $\gamma_j = n - j + 1 + \sum_{i=j}^{n} R_i$, $j = 1, \dots, r$, in equation (1.1.2). Note that the first $r$ PC OSs based on a censoring scheme $(R_1, \dots, R_n)$ form a progressively Type-II censored sample of size $r$ from $N$ units with censoring scheme $(R_1, \dots, R_{r-1}, N - r - \sum_{j=1}^{r-1} R_j)$ (see, e.g., Balakrishnan and Aggarwala, 2000, Thm. 2.3, p. 12).
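The bookkeeping behind a censoring scheme is easy to check numerically. The following sketch uses hypothetical helper names (`pc_gamma`, `total_units`, `reduced_scheme`) and simply evaluates the formulas stated above.

```python
def pc_gamma(R):
    """Model parameters gamma_j = n - j + 1 + sum_{i=j}^n R_i for a scheme R."""
    n = len(R)
    return [n - j + 1 + sum(R[j - 1:]) for j in range(1, n + 1)]


def total_units(R):
    """Total number N = n + sum_i R_i of units put on test."""
    return len(R) + sum(R)


def reduced_scheme(R, r):
    """Censoring scheme of the first r PC OSs:
    (R_1, ..., R_{r-1}, N - r - sum_{j<r} R_j)."""
    if not 1 <= r <= len(R):
        raise ValueError("r must satisfy 1 <= r <= n")
    head = list(R[:r - 1])
    return head + [total_units(R) - r - sum(head)]
```

The model parameters of the reduced scheme agree with the first $r$ parameters of the original scheme, which reflects the statement on the first $r$ PC OSs above.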
Remark 1.1.3. It can be seen from the joint density given in (1.1.1) that GOSs based on a distribution function $F$ and a parameter vector $c \cdot \boldsymbol{\gamma} \in \mathbb{R}^n_+$ can also be interpreted as GOSs based on $1 - (1 - F)^c$ and $\boldsymbol{\gamma} \in \mathbb{R}^n_+$. Analogously, SOSs based on constant parameters $\alpha_1 = \dots = \alpha_n$ may be viewed as OSs based on a different distribution function, namely $1 - (1 - F)^{\alpha_1}$. Similarly, PRVs with constant parameters $\beta_1 = \dots = \beta_n$ can be interpreted as RVs based on $1 - (1 - F)^{\beta_1}$. Clearly, $\alpha_1 = 1$ and $\beta_1 = 1$ yield ordinary OSs and RVs based on $F$, respectively.
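Table 1.1 provides an overview of different choices of $\boldsymbol{\gamma} \in \mathbb{R}^n_+$ and the corresponding models.

| Model | $\gamma_j > 0$ ($1 \le j \le n$) | Abbreviation |
|---|---|---|
| order statistics based on $\tilde{F} = 1 - (1 - F)^{\alpha_0}$ | $(n - j + 1)\alpha_0$ | $OS_{\alpha_0}$ ($OS = OS_1$) |
| sequential order statistics with $F_j = 1 - (1 - F)^{\alpha_j}$, $1 \le j \le n$ | $(n - j + 1)\alpha_j$ | $SOS_{\boldsymbol{\alpha}}$ |
| record values based on $\tilde{F} = 1 - (1 - F)^{\beta_0}$ | $\beta_0$ | $RV_{\beta_0}$ ($RV = RV_1$) |
| Pfeifer record values with $F_j = 1 - (1 - F)^{\beta_j}$, $1 \le j \le n$ | $\beta_j$ | $PRV_{\boldsymbol{\beta}}$ |
| progressively Type-II censored order statistics | $n - j + 1 + \sum_{i=j}^{n} R_i$ | $PC_{\boldsymbol{R}}$ |

Table 1.1: Models of ordered random variables included in the model of generalized order statistics by appropriate choice of $\gamma_1, \dots, \gamma_n$ ($\alpha_0, \beta_0 \in \mathbb{R}_+$, $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n)' \in \mathbb{R}^n_+$, $\boldsymbol{\beta} = (\beta_1, \dots, \beta_n)' \in \mathbb{R}^n_+$, $\boldsymbol{R} = (R_1, \dots, R_n)' \in \mathbb{N}_0^n$)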
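The parameter choices collected in Table 1.1 can be written as small helper functions. All function names below are hypothetical; the sketch only restates the table and the special cases noted in Remark 1.1.3.

```python
def gamma_os(n, alpha0=1.0):
    """Order statistics based on 1 - (1 - F)^alpha0: gamma_j = (n - j + 1) * alpha0."""
    return [(n - j + 1) * alpha0 for j in range(1, n + 1)]


def gamma_sos(alpha):
    """Sequential order statistics: gamma_j = (n - j + 1) * alpha_j."""
    n = len(alpha)
    return [(n - j + 1) * alpha[j - 1] for j in range(1, n + 1)]


def gamma_rv(n, beta0=1.0):
    """Record values based on 1 - (1 - F)^beta0: gamma_j = beta0."""
    return [beta0] * n


def gamma_prv(beta):
    """Pfeifer record values: gamma_j = beta_j."""
    return list(beta)


def gamma_pc(R):
    """Progressively Type-II censored OSs: gamma_j = n - j + 1 + sum_{i=j}^n R_i."""
    n = len(R)
    return [n - j + 1 + sum(R[j - 1:]) for j in range(1, n + 1)]
```

Constant parameters reproduce the simpler models: `gamma_sos([a] * n)` coincides with `gamma_os(n, a)`, and a censoring scheme of zeros gives ordinary order statistics.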
1.2. Several Divergence Measures
Divergence measures are coefficients that quantify the dissimilarity of two probability distributions. Divergence measure coefficients are 0 if and only if the distributions are the same. The higher the coefficient, the "further away from each other" the probability distributions are, and the smaller the coefficient, the "closer to each other" the distributions are. Although the measures do not necessarily satisfy all metric properties, this description conveys the idea that the divergence measures are interpreted as distances between probability distributions. In fact, most of the divergence measures considered in this work fail to satisfy the triangle inequality. Some measures are not even symmetric, but they satisfy at least part of the requirements in the following definition.
Definition 1.2.1
Let $\mathfrak{P}$ be a set. A function $D : \mathfrak{P} \times \mathfrak{P} \to \mathbb{R}$ is called a distance (or distance measure) on $\mathfrak{P}$ if for all $P, Q \in \mathfrak{P}$ there holds:
(i) $D(P, Q) \ge 0$ (non-negativity).
(ii) $D(P, Q) = D(Q, P)$ (symmetry).
(iii) $D(P, P) = 0$ (reflexivity).
If $D$ has the first and the third property but not the second, it is called a quasi-distance (cf., e.g., Deza and Deza, 2009, pp. 3/4) or divergence instead.
The measures considered in this work additionally fulfill $D(P, Q) = 0 \Rightarrow P = Q$. Hence, the main difference between a metric and the considered distance (i.e., symmetric divergence) measures is the validity of the triangle inequality
\[
D(P_1, P_2) \le D(P_1, Q) + D(Q, P_2), \quad P_1, P_2, Q \in \mathfrak{P}.
\]
Although the origins of the considered measures are not the same, they are all referred to as divergence measures in this work. The usage of the unified term "divergence measure" is adopted from Pardo (2006), who gives a systematic overview of divergence measures and their use in statistical inference. A wide class of divergence measures is given by the concept of $\Phi$-divergences introduced by Csiszár (1963) and Ali and Silvey (1966). For our purposes it is not useful to consider such a wide class, since the coefficients cannot be given in closed form for GOSs due to the flexibility of the function $\Phi$. The same applies to the even more general class of $(h, \Phi)$-divergences (Menéndez et al., 1995). In the following, let $(\mathfrak{X}, \mathfrak{B})$ be a measurable space and $\mathfrak{P} = \{P_\vartheta \mid \vartheta \in \Theta\}$ a family of equivalent probability measures on $(\mathfrak{X}, \mathfrak{B})$ with $\Theta \ne \emptyset$. Further, let $P_\vartheta$, $\vartheta \in \Theta$, be absolutely continuous with respect to a $\sigma$-finite measure $\mu$ on $(\mathfrak{X}, \mathfrak{B})$ with $\mu$-densities
\[
f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x), \quad x \in \mathfrak{X}.
\]
An important and well-known divergence measure introduced by Kullback and Leibler (1951) as mean information for discrimination between two distributions (or hypotheses) is the (directed) Kullback-Leibler divergence
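\[
D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) = \int_{\mathfrak{X}} f_{\vartheta_1}(x) \ln\frac{f_{\vartheta_1}(x)}{f_{\vartheta_2}(x)}\, d\mu(x) = E_{\vartheta_1}\Bigl[\ln\frac{f_{\vartheta_1}(X)}{f_{\vartheta_2}(X)}\Bigr]. \tag{1.2.1}
\]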
$D_{KL}$ is a function with domain $\{f_\vartheta \mid \vartheta \in \Theta\} \times \{f_\vartheta \mid \vartheta \in \Theta\}$, and so are the other divergences in this section. Since we assume that the $\mu$-densities $f_\vartheta$ may be identified with their corresponding parameter $\vartheta$ and vice versa, we write $D_{KL}(\vartheta_1, \vartheta_2)$ instead of $D_{KL}(f_{\vartheta_1}, f_{\vartheta_2})$ throughout this work whenever it is convenient. Strictly speaking, considering the divergence measures as functions with domain $\{P_\vartheta \mid \vartheta \in \Theta\} \times \{P_\vartheta \mid \vartheta \in \Theta\}$ would be the most accurate point of view.
Kullback and Leibler (1951) and Kullback (1959) introduced and studied $D_{KL}$ as a measure of information for general probability measures. Some other measures originated from information theory too, but as mentioned earlier in this section, all the coefficients will simply be considered as divergences, where larger values indicate larger discrepancies between probability distributions.
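A closely related divergence measure was introduced by Jeffreys (1946, 1948). It can be seen as a symmetric version of the Kullback-Leibler divergence, namely
\[
D_J(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) + D_{KL}(f_{\vartheta_2}, f_{\vartheta_1}) = \int_{\mathfrak{X}} (f_{\vartheta_1}(x) - f_{\vartheta_2}(x)) \ln\frac{f_{\vartheta_1}(x)}{f_{\vartheta_2}(x)}\, d\mu(x). \tag{1.2.2}
\]
To emphasize the symmetry property with respect to its arguments, we refer to Jeffreys' J-divergence as J-distance in the following. Note that $D_{KL}$ is not symmetric with respect to its arguments.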
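For distributions on a finite set with the counting measure as $\mu$, the Kullback-Leibler divergence and the J-distance reduce to finite sums. The following sketch (function names are hypothetical) illustrates the asymmetry of $D_{KL}$ and the symmetry of the J-distance; it assumes strictly positive probability vectors, in line with the equivalence of the measures in $\mathfrak{P}$.

```python
import math


def kl_divergence(p, q):
    """D_KL(p, q) = sum_x p(x) * ln(p(x) / q(x)) for strictly positive p, q."""
    if any(a <= 0 or b <= 0 for a, b in zip(p, q)):
        raise ValueError("all probabilities must be strictly positive")
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def j_distance(p, q):
    """J-distance: sum_x (p(x) - q(x)) * ln(p(x) / q(x)), symmetric in p and q."""
    if any(a <= 0 or b <= 0 for a, b in zip(p, q)):
        raise ValueError("all probabilities must be strictly positive")
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

Swapping the arguments generally changes `kl_divergence` but leaves `j_distance` unchanged, and `j_distance(p, q)` equals `kl_divergence(p, q) + kl_divergence(q, p)`.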
Rényi (1961) introduced generalized probability distributions and a system of postulates for an information measure. In the discrete case he obtained Rényi's information of order $\alpha$ for $\alpha \in \mathbb{R}_+ \setminus \{1\}$, a general analogue of which was extended by Liese and Vajda (1987) with a factor $1/\alpha$ to
\[
D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = \frac{1}{\alpha(\alpha - 1)} \ln \int_{\mathfrak{X}} f_{\vartheta_1}(x)^{\alpha} f_{\vartheta_2}(x)^{1-\alpha}\, d\mu(x). \tag{1.2.3}
\]
In this work, $D_{R,\alpha}$ from equation (1.2.3) will be referred to as Rényi divergence (of order $\alpha$). The additional factor $1/\alpha$ yields the symmetry
\[
D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{R,1-\alpha}(f_{\vartheta_2}, f_{\vartheta_1}).
\]
Furthermore, there is a relationship to the Kullback-Leibler divergence given by the equations (see Liese and Vajda, 1987, pp. 35 ff.)
\[
\lim_{\alpha \to 1} D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_1}, f_{\vartheta_2}) \quad \text{and} \quad \lim_{\alpha \to 0} D_{R,\alpha}(f_{\vartheta_1}, f_{\vartheta_2}) = D_{KL}(f_{\vartheta_2}, f_{\vartheta_1}), \tag{1.2.4}
\]
where the first equation is also valid for the original Rényi divergence, whereas the second is not if the factor $1/\alpha$ is omitted. Another related measure traces back to Bhattacharyya (1943), who extended his divergence between two multinomial populations to continuous distributions as
\[
D_B(f_{\vartheta_1}, f_{\vartheta_2}) = -\ln \int_{\mathfrak{X}} \sqrt{f_{\vartheta_1}(x) f_{\vartheta_2}(x)}\, d\mu(x) = \frac{1}{4} D_{R,\frac{1}{2}}(f_{\vartheta_1}, f_{\vartheta_2}). \tag{1.2.5}
\]
The Bhattacharyya distance $D_B$ is symmetric with respect to the two distributions concerned, and $\alpha = 1/2$ is the only special case for which $D_{R,\alpha}$ is symmetric. The next measure is also related to the Bhattacharyya distance and, moreover, is the only measure considered in this work that is a metric. To emphasize this special role among the other divergence measures, we refer to it as the Hellinger distance or Hellinger metric. For $m = 2$ it is given by
\begin{align*}
D_{H,2}(f_{\vartheta_1}, f_{\vartheta_2}) &= \Bigl(\int_{\mathfrak{X}} \bigl|\sqrt{f_{\vartheta_1}(x)} - \sqrt{f_{\vartheta_2}(x)}\bigr|^2\, d\mu(x)\Bigr)^{\frac{1}{2}} \tag{1.2.6} \\
&= \bigl(2 - 2\exp(-D_B(f_{\vartheta_1}, f_{\vartheta_2}))\bigr)^{\frac{1}{2}}.
\end{align*}
It is named after Hellinger (1909), since it can be defined in terms of the Hellinger integral. Matusita studied properties of the Hellinger distance in several works, for example Matusita (1964) and the references given there, with a main focus on statistical decisions. As a generalization for $m \ge 1$ we also refer to
\[
D_{H,m}(f_{\vartheta_1}, f_{\vartheta_2}) = \Bigl(\int_{\mathfrak{X}} \bigl|f_{\vartheta_1}(x)^{\frac{1}{m}} - f_{\vartheta_2}(x)^{\frac{1}{m}}\bigr|^m\, d\mu(x)\Bigr)^{\frac{1}{m}} \tag{1.2.7}
\]
as Hellinger distance. Nevertheless, when speaking of the Hellinger distance, the measure $D_{H,2}$ is meant in this work unless explicitly stated otherwise. The expression $(D_{H,m}(f_{\vartheta_1}, f_{\vartheta_2}))^m$ coincides with a second invariant introduced by Jeffreys (1946, 1948) simultaneously with the J-distance given in (1.2.2). It is worth mentioning that the Hellinger distance is bounded (by $\sqrt[m]{2}$) in contrast to the other divergences.
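The relations (1.2.3) to (1.2.7) can be checked numerically for distributions on a finite set, with the counting measure playing the role of $\mu$. The sketch below is an illustration only; the function names are hypothetical and strictly positive probability vectors are assumed.

```python
import math


def renyi(p, q, alpha):
    """Renyi divergence (1.2.3) with the additional factor 1/alpha."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    integral = sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q))
    return math.log(integral) / (alpha * (alpha - 1))


def bhattacharyya(p, q):
    """Bhattacharyya distance (1.2.5)."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))


def hellinger(p, q, m=2):
    """Hellinger distance (1.2.7); m = 2 gives the Hellinger metric (1.2.6)."""
    total = sum(abs(a ** (1 / m) - b ** (1 / m)) ** m for a, b in zip(p, q))
    return total ** (1 / m)
```

For example, `bhattacharyya(p, q)` should agree with `renyi(p, q, 0.5) / 4`, and `renyi(p, q, a)` with `renyi(q, p, 1 - a)`, up to rounding.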
𝐷𝐾𝐿 and 𝐷𝐽 are included in the class of Φ-divergences; 𝐷𝐵 and 𝐷𝑅,𝛼 are not, but they belong to the class of (ℎ, Φ)-divergences (see, e.g., Pardo, 2006, pp. 6/8).
All measures considered in this work are divergences. The measures of Bhattacharyya, Jeffreys, and Rényi of order 1/2 are distances, and the Hellinger distance is even a metric.
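To see why a symmetric divergence need not be a metric, consider the following small worked example. It is not taken from the original text; the family of normal distributions $N(\theta, 1)$ with unit variance is an assumption chosen because its Kullback-Leibler divergence has a simple and well-known closed form.

```latex
% Assumed example family: P_\theta = N(\theta, 1), \theta \in \mathbb{R}.
\begin{align*}
D_{KL}(\theta_1, \theta_2) &= \tfrac{1}{2}(\theta_1 - \theta_2)^2, \\
D_J(\theta_1, \theta_2) &= D_{KL}(\theta_1, \theta_2) + D_{KL}(\theta_2, \theta_1) = (\theta_1 - \theta_2)^2, \\
D_J(0, 2) = 4 &> 2 = D_J(0, 1) + D_J(1, 2).
\end{align*}
```

In this family the J-distance is symmetric and vanishes only for identical parameters, yet it violates the triangle inequality.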
1.3. Outline
In Chapter 2, the divergences from Section 1.2 are determined for models of GOSs from Section 1.1. It is a natural idea to learn more about the structure possessed by the model of GOSs by comparing different included models using these divergences as coefficients of discrepancies.
In order to obtain the explicit forms of the different divergence measures between GOSs based on the same underlying distribution function $F$, exponential families are defined in Section 2.1, and explicit forms of the divergence measures for the latter are derived. The explicit forms of divergences for GOSs can be determined by exploiting the exponential family structure of GOSs.
In Section 2.2, the first results concerning the formulas of the divergences are stated, and the different models, identified with their model parameters, are considered and illustrated as points in the Euclidean space $\mathbb{R}^r$. In this setting, closest models and spheres with respect to different divergences are discussed.
Chapter 3 is concerned with applications (to SOSs) based on the considered divergence measures.
In Section 3.1 some general properties of maximum likelihood estimators are noted, and Section 3.2 directly takes up the results of Chapter 2. Multivariate confidence regions for the model parameters are considered. They are given implicitly by inequalities concerning divergence measures. A comparison of these confidence sets and rectangular confidence regions is given for simulated data sets.
Section 3.3 deals with the results of Menéndez et al. (1997). They considered a divergence measure for $t > 2$ populations in case of populations belonging to an exponential family, and derived its asymptotic distribution in order to construct statistical tests for homogeneity within the $t$ populations. In presenting their results, a few mistakes are corrected, and a shorter notation for some equations is shown. Moreover, the results are applied to the case of sequential $(n-r+1)$-out-of-$n$ systems, finding that, for the considered example, the derived asymptotic test leads to type I error rates which distinctly exceed the nominal significance levels for reasonable sample sizes. That is, the asymptotic test is not applicable in many practical situations, since a small type I error rate is not assured.
In Section 3.4, a new general estimation approach using pre-information about the magnitude of the parameters to be estimated is first introduced in the framework of exponential families. This somewhat heuristic approach using divergence measures is designed for small sample sizes, for which the maximum likelihood estimator is known to perform poorly. Further, sequential $(n-r+1)$-out-of-$n$ systems are considered as an example. The results of the simulation study indicate high potential, especially for multiparameter cases.
Finally, Chapter 4 summarizes the contents of this work.
2. Framework
In the first section of this chapter, exponential families are introduced along with useful properties. For such families, exact forms of some divergence measures can be found in the literature. We refer to these results in order to obtain explicit expressions of the divergences for generalized order statistics. This is possible, since generalized order statistics form an exponential family in model parameters (see Bedbur et al., 2012). In Section 2.2, the explicit formulas are studied.
2.1. Exponential Families
Exponential families are parametric families of distributions which are characterized by the specific form of the corresponding density functions. The distributions of many important classes form exponential families, for example, normal distributions, gamma distributions, and beta distributions. Due to the convenient properties of exponential families, it is often useful to view questions involving particular classes of distributions from the perspective of exponential families (cf., e.g., Brown, 1986). This is the case for the purpose of deriving explicit forms of the divergences considered in this work. Therefore, exponential families are introduced in the following. Regarding the notation for exponential families, we closely follow the dissertation of Bedbur (2011).
Definition 2.1.1 (Exponential Family)
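Let $\Theta \ne \emptyset$ be a set of parameters and $\mathfrak{P} = \{P_\vartheta \mid \vartheta \in \Theta\}$ be a family of distributions on a measurable space $(\mathfrak{X}, \mathfrak{B})$. If there exist an integer $k \in \mathbb{N}$, a $\sigma$-finite measure $\mu$ on $(\mathfrak{X}, \mathfrak{B})$ dominating $\mathfrak{P}$, and functions
\begin{align*}
C &: \Theta \to \mathbb{R}, \\
\boldsymbol{\zeta} = (\zeta_1, \dots, \zeta_k)' &: \Theta \to \mathbb{R}^k, \\
\boldsymbol{T} = (T_1, \dots, T_k)' &: (\mathfrak{X}, \mathfrak{B}) \to (\mathbb{R}^k, \mathbb{B}^k), \\
h &: (\mathfrak{X}, \mathfrak{B}) \to (\mathbb{R}_{\ge 0}, \mathbb{B} \cap \mathbb{R}_{\ge 0})
\end{align*}
such that the $\mu$-densities of $P_\vartheta$, $\vartheta \in \Theta$, are given by
\[
f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x) = C(\vartheta) \exp\Bigl(\sum_{j=1}^{k} \zeta_j(\vartheta) T_j(x)\Bigr) h(x), \quad x \in \mathfrak{X}, \tag{2.1.1}
\]
then $\mathfrak{P}$ is called an exponential family in $\zeta_1, \dots, \zeta_k$ and $T_1, \dots, T_k$.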
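$C(\vartheta)$ is a normalizing constant, which it is sometimes useful to write inside the argument of the exponential function. Therefore, we additionally define the mapping
\[
\kappa : \Theta \to \mathbb{R}, \quad \vartheta \mapsto -\ln(C(\vartheta)). \tag{2.1.2}
\]
This yields another representation of equation (2.1.1):
\[
f_\vartheta(x) = \frac{dP_\vartheta}{d\mu}(x) = \exp\Bigl(\sum_{j=1}^{k} \zeta_j(\vartheta) T_j(x) - \kappa(\vartheta)\Bigr) h(x), \quad x \in \mathfrak{X}.
\]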
A more convenient form, which will be of our main interest, is given in dependence on natural parameters.
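Definition 2.1.2 (Natural parameter space; natural extension)
Let $\mathfrak{P}$ be an exponential family according to Def. 2.1.1. Then
\[
\Theta^* := \Bigl\{\boldsymbol{\zeta} = (\zeta_1, \dots, \zeta_k)' \in \mathbb{R}^k \Bigm| 0 < \int_{\mathfrak{X}} \exp\Bigl\{\sum_{j=1}^{k} \zeta_j T_j(x)\Bigr\} h(x)\, d\mu(x) < \infty\Bigr\}
\]
is called the natural parameter space of $\mathfrak{P}$. The family $\mathfrak{P}^* = \{P^*_{\boldsymbol{\zeta}} \mid \boldsymbol{\zeta} \in \Theta^*\}$ with
\begin{align*}
C^*(\boldsymbol{\zeta}) &= \Bigl(\int_{\mathfrak{X}} \exp\Bigl\{\sum_{j=1}^{k} \zeta_j T_j(x)\Bigr\} h(x)\, d\mu(x)\Bigr)^{-1}, \\
f^*_{\boldsymbol{\zeta}}(x) &= C^*(\boldsymbol{\zeta}) \exp\Bigl\{\sum_{j=1}^{k} \zeta_j T_j(x)\Bigr\} h(x), \quad x \in \mathfrak{X}, \\
P^*_{\boldsymbol{\zeta}} &= f^*_{\boldsymbol{\zeta}}\, \mu,
\end{align*}
is the natural extension of $\mathfrak{P}$.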
Throughout this work, we assume that the considered exponential families have open natural parameter spaces. The value $C^*(\boldsymbol{\zeta})$ is a normalizing constant. Analogously to (2.1.2), we additionally define a transformed mapping
\[
\kappa^* : \Theta^* \to \mathbb{R}, \quad \boldsymbol{\zeta} \mapsto -\ln(C^*(\boldsymbol{\zeta})). \tag{2.1.3}
\]
The mappings $\kappa$ and $\kappa^*$ are closely related. In fact, $\kappa^*(\boldsymbol{\zeta}(\vartheta)) = \kappa(\vartheta)$ holds for $\vartheta \in \Theta$. In the following subsection, both notations are distinguished strictly in order to clarify the difference between both cases for the computation of the divergences. Afterwards, for reasons of simplicity, only one notation (without "$*$") will be used if the meaning is clear from the context (natural parameter space or not). In case of GOSs, the natural parameter representation comes out directly.
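As a simple illustration of Definitions 2.1.1 and 2.1.2 and of the relation between $\kappa$ and $\kappa^*$, consider the following worked example. It is not part of the original text; the family of exponential distributions with rate $\vartheta > 0$ on $\mathfrak{X} = (0, \infty)$, with $\mu$ the Lebesgue measure, is an assumption chosen for its simplicity.

```latex
% Assumed example: exponential distributions with rate \vartheta > 0, k = 1.
\begin{align*}
f_\vartheta(x) &= \vartheta e^{-\vartheta x} = C(\vartheta)\exp(\zeta(\vartheta) T(x))\, h(x),
  & C(\vartheta) &= \vartheta,\ \zeta(\vartheta) = -\vartheta,\ T(x) = x,\ h \equiv 1, \\
\kappa(\vartheta) &= -\ln \vartheta, \\
\Theta^* &= \Bigl\{\zeta \in \mathbb{R} \Bigm| 0 < \int_0^\infty e^{\zeta x}\, dx < \infty\Bigr\} = (-\infty, 0),
  & C^*(\zeta) &= -\zeta, \\
\kappa^*(\zeta) &= -\ln(-\zeta),
  & \kappa^*(\zeta(\vartheta)) &= -\ln \vartheta = \kappa(\vartheta).
\end{align*}
```

Here the natural parameter space is open, in line with the general assumption made above.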
An exponential family is called strictly 𝑘-parametrical if the integer 𝑘 is minimal for such a representation of the densities, in the sense that there is no such representation with a smaller number of statistics. It is useful to have a characterization of this property. The following theorem (see, e.g., Witting, 1985, Thm. 1.153, p. 145) contains one using affine independence.
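Definition 2.1.3 (Affine independence)
Let $k \in \mathbb{N}$.
(i) Let $\zeta_1, \dots, \zeta_k$ be real-valued functions with domain $\Theta \ne \emptyset$. $\zeta_1, \dots, \zeta_k$ are called affinely independent if for $a_1, \dots, a_k, b \in \mathbb{R}$ it holds:
\[
\sum_{j=1}^{k} a_j \zeta_j(\vartheta) = b \quad \forall\, \vartheta \in \Theta \quad \Rightarrow \quad 0 = a_1 = \dots = a_k = b.
\]
(ii) Let $(\mathfrak{X}, \mathfrak{B})$ be a measurable space and $T_1, \dots, T_k$ be real-valued $\mathfrak{B}$-$\mathbb{B}$-measurable functions on $(\mathfrak{X}, \mathfrak{B})$. $T_1, \dots, T_k$ are called $P$-affinely independent for a probability measure $P$ on $(\mathfrak{X}, \mathfrak{B})$ if for $a_1, \dots, a_k, b \in \mathbb{R}$ it holds:
\[
\sum_{j=1}^{k} a_j T_j = b \quad P\text{-a.s.} \quad \Rightarrow \quad 0 = a_1 = \dots = a_k = b.
\]
For a set $\mathfrak{P}$ of probability measures, $T_1, \dots, T_k$ are called $\mathfrak{P}$-affinely independent if $T_1, \dots, T_k$ are $P$-affinely independent for every $P \in \mathfrak{P}$.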
Theorem 2.1.4
Let $\mathfrak{P}$ be an exponential family in $\zeta_1, \dots, \zeta_k$ and $T_1, \dots, T_k$ according to Def. 2.1.1. Then the following statements hold:
(i) $\mathfrak{P}$ is strictly $k$-parametrical if and only if the $\mu$-densities have a representation according to (2.1.1) with affinely independent functions $\zeta_1, \dots, \zeta_k$ and $\mathfrak{P}$-affinely independent statistics $T_1, \dots, T_k$.
(ii) $T_1, \dots, T_k$ are $\mathfrak{P}$-affinely independent statistics if and only if there exists $\vartheta \in \Theta$ such that $Cov_\vartheta(\boldsymbol{T}) > 0$ (i.e., $Cov_\vartheta(\boldsymbol{T})$ is positive definite).
The following theorem demonstrates some useful properties of exponential families (see, e.g., Witting, 1985, Thm. 1.164, pp. 152/153 and Thm. 1.170, p. 157).
Theorem 2.1.5
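Let $\mathfrak{P}$ be a $k$-parametrical exponential family in $\boldsymbol{\zeta}$ and $\boldsymbol{T}$ with natural parameter space $\Theta^*$ and $\mu$-densities of the form
\[
f^*_{\boldsymbol{\zeta}}(x) = \frac{dP^*_{\boldsymbol{\zeta}}}{d\mu}(x) = C^*(\boldsymbol{\zeta}) \exp\Bigl(\sum_{j=1}^{k} \zeta_j T_j(x)\Bigr) h(x) = \exp\Bigl(\sum_{j=1}^{k} \zeta_j T_j(x) - \kappa^*(\boldsymbol{\zeta})\Bigr) h(x), \quad x \in \mathfrak{X}. \tag{2.1.4}
\]
(i) The statistic $\boldsymbol{T} = (T_1, \dots, T_k)'$ has finite moments of any order. The functions $\boldsymbol{\zeta} \mapsto E_{\boldsymbol{\zeta}} T_1^{l_1} \cdots T_k^{l_k}$, $\kappa^*$ and $C^*$ are infinitely often differentiable in $\boldsymbol{\zeta}$, and
\begin{align*}
E_{\boldsymbol{\zeta}} \boldsymbol{T} &= \nabla \kappa^*(\boldsymbol{\zeta}) = -\nabla \ln C^*(\boldsymbol{\zeta}), \tag{2.1.5} \\
Cov_{\boldsymbol{\zeta}} \boldsymbol{T} &= H_{\kappa^*}(\boldsymbol{\zeta}) = -H_{\ln C^*}(\boldsymbol{\zeta}), \\
E_{\boldsymbol{\zeta}} T_1^{l_1} \cdots T_k^{l_k} &= C^*(\boldsymbol{\zeta})\, \nabla_1^{l_1} \cdots \nabla_k^{l_k} \int_{\mathfrak{X}} \exp\Bigl(\sum_{j=1}^{k} \zeta_j T_j(x)\Bigr) h(x)\, d\mu(x),
\end{align*}
where $\nabla \kappa^* = \bigl(\frac{\partial}{\partial \zeta_1} \kappa^*, \dots, \frac{\partial}{\partial \zeta_k} \kappa^*\bigr)'$ and $H_{\kappa^*} = \bigl(\frac{\partial^2}{\partial \zeta_i \partial \zeta_j} \kappa^*\bigr)_{1 \le i, j \le k}$ denote the gradient and the Hessian of $\kappa^*$, respectively.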
(ii) If 𝔓 is strictly 𝑘-parametrical, 𝐶𝑜𝑣𝜻𝑻 is positive definite.
(iii) The logarithmic derivative of the likelihood and the Fisher information matrix are given by
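\begin{align*}
\boldsymbol{U}_{\boldsymbol{\zeta}}(x) &:= \nabla_{\boldsymbol{\zeta}} (\ln f_{\boldsymbol{\zeta}})(x) = \boldsymbol{T}(x) - E_{\boldsymbol{\zeta}} \boldsymbol{T}, \\
I_f(\boldsymbol{\zeta}) &:= E_{\boldsymbol{\zeta}}\bigl(\boldsymbol{U}_{\boldsymbol{\zeta}}(X) \boldsymbol{U}_{\boldsymbol{\zeta}}(X)'\bigr) = Cov_{\boldsymbol{\zeta}}(\boldsymbol{T}).
\end{align*}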
In particular, the single entries of the Fisher information matrix can be expressed in terms of partial derivatives of 𝜅∗.
Corollary 2.1.6
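Given the situation of Theorem 2.1.5, for the $(i, j)$-element of $I_f(\boldsymbol{\zeta})$, $1 \le i, j \le k$, we have
\[
I_f(\boldsymbol{\zeta})_{ij} = \frac{\partial^2 \kappa^*(\boldsymbol{\zeta})}{\partial \zeta_i\, \partial \zeta_j}. \tag{2.1.6}
\]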
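In case of a natural parameter space $\Theta^*$, we further define the mapping
\[
\pi : \Theta^* \to \mathbb{R}^k, \quad \boldsymbol{\zeta} \mapsto E_{\boldsymbol{\zeta}}(\boldsymbol{T}) \overset{(2.1.5)}{=} \nabla \kappa^*(\boldsymbol{\zeta}) \tag{2.1.7}
\]
for shorter notation in the following.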
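The identities $E_{\boldsymbol{\zeta}} \boldsymbol{T} = \nabla \kappa^*(\boldsymbol{\zeta})$ and $Cov_{\boldsymbol{\zeta}} \boldsymbol{T} = H_{\kappa^*}(\boldsymbol{\zeta})$ can be checked numerically in a one-parameter case. The sketch below is an illustration under an assumption not made in the text: it uses the exponential distribution with rate $\vartheta$, for which $\zeta = -\vartheta$, $T(x) = x$ and $\kappa^*(\zeta) = -\ln(-\zeta)$, and approximates the derivatives by finite differences. All function names are hypothetical.

```python
import math


def kappa_star(zeta):
    """kappa*(zeta) = -ln(-zeta) for the exponential distribution (zeta < 0)."""
    if zeta >= 0:
        raise ValueError("zeta must lie in the natural parameter space (-inf, 0)")
    return -math.log(-zeta)


def derivative(func, z, h=1e-5):
    """Central finite-difference approximation of the first derivative."""
    return (func(z + h) - func(z - h)) / (2 * h)


def second_derivative(func, z, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(z + h) - 2 * func(z) + func(z - h)) / h ** 2
```

For rate $\vartheta = 2$, i.e. $\zeta = -2$, the first derivative is approximately $1/\vartheta = 0.5$ (the mean of $T$) and the second derivative approximately $1/\vartheta^2 = 0.25$ (the variance of $T$, which is also the Fisher information in the natural parameter).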
2.1.1. Explicit Forms of Divergences
Having introduced the basic notation for exponential families, we derive explicit forms of divergences in the following.
We begin with the J-distance. Jeffreys (1946, 1948) obtained exact forms of his two invariants for particular distributions, whereas Huzurbazar (1955) found that, for distributions admitting sufficient statistics, the exact forms of $(D_{H,m})^m$ ($m$ even) and $D_J$ are explicit functions of the parameters of the distributions. Huzurbazar considered the most general form of distributions admitting sufficient statistics as given by Koopman (1936), who was one of the originators of the concept of exponential families. In fact, the probability densities examined by Huzurbazar are of the form (2.1.1) for one variate, but he also described that for multivariate distributions this variate just has to be replaced by a set of variates. In the following, we reproduce the calculations of Huzurbazar using the notation of this work and general $\mu$-densities. To this end, let
\begin{align*}
f_\vartheta(x) &= \frac{dP_\vartheta}{d\mu}(x) = \exp\Bigl(\sum_{j=1}^{k} \zeta_j(\vartheta) T_j(x) - \kappa(\vartheta)\Bigr) h(x), \quad x \in \mathfrak{X}, \quad \text{and} \\
f_{\tilde{\vartheta}}(x) &= \frac{dP_{\tilde{\vartheta}}}{d\mu}(x) = \exp\Bigl(\sum_{j=1}^{k} \zeta_j(\tilde{\vartheta}) T_j(x) - \kappa(\tilde{\vartheta})\Bigr) h(x), \quad x \in \mathfrak{X},
\end{align*}
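be density functions belonging to probability distributions $P_{\vartheta}, P_{\tilde\vartheta} \in \mathfrak{P}$, respectively, where $\mathfrak{P}$ is an exponential family according to Definition 2.1.1. We begin with the computation of $D_J$:
\begin{align*}
D_J(\vartheta, \tilde\vartheta) &= D_{KL}(\vartheta, \tilde\vartheta) + D_{KL}(\tilde\vartheta, \vartheta) \\
&= \int_{\mathfrak{X}} (f_{\vartheta}(x) - f_{\tilde\vartheta}(x)) (\ln f_{\vartheta}(x) - \ln f_{\tilde\vartheta}(x))\, d\mu(x) \\
&= \int_{\mathfrak{X}} (f_{\vartheta}(x) - f_{\tilde\vartheta}(x)) \Big( \sum_{j=1}^{k} \big( \zeta_j(\vartheta) - \zeta_j(\tilde\vartheta) \big) T_j(x) - \kappa(\vartheta) + \kappa(\tilde\vartheta) \Big)\, d\mu(x) \\
&= \sum_{j=1}^{k} [\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)] \cdot \int_{\mathfrak{X}} (f_{\vartheta}(x) - f_{\tilde\vartheta}(x)) T_j(x)\, d\mu(x) \\
&= \sum_{j=1}^{k} [\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)] \cdot [E_{\vartheta}(T_j) - E_{\tilde\vartheta}(T_j)] \\
&= (\boldsymbol{\zeta}(\vartheta) - \boldsymbol{\zeta}(\tilde\vartheta))' (E_{\vartheta}(\boldsymbol{T}) - E_{\tilde\vartheta}(\boldsymbol{T})).
\end{align*}
The terms $\kappa(\vartheta)$ and $\kappa(\tilde\vartheta)$ drop out in the fourth line because both densities integrate to one.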
Huzurbazar (1955) explained how to express $[E_{\vartheta}(T_j) - E_{\tilde\vartheta}(T_j)]$ in terms of $\vartheta$ in general, but we are only interested in the case of a natural parameter space, where $E_{\vartheta}(\boldsymbol{T})$ is obtainable from (2.1.5).
As an intermediate result, we compute an explicit expression for
\[ \lambda_{p,q}(f_{\vartheta}, f_{\tilde\vartheta}) = \int_{\mathfrak{X}} f_{\vartheta}(x)^{p} f_{\tilde\vartheta}(x)^{q}\, d\mu(x). \]
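Huzurbazar calculated this coefficient for $p = \frac{m-j}{m}$ and $q = \frac{j}{m}$, $j = 1, \dots, m$, but here any $p, q \in [0,1]$ with $p + q = 1$ are allowed. Thus, $\lambda_{p,q}$ can be used for the exact form of the Rényi divergence $D_{R,p}$ for any $p \in (0,1)$. The steps of calculation are not affected by this. We find
\begin{align*}
\lambda_{p,q}(f_{\vartheta}, f_{\tilde\vartheta}) &= \int_{\mathfrak{X}} f_{\vartheta}(x)^{p} f_{\tilde\vartheta}(x)^{q}\, d\mu(x) \\
&= \exp(-p\kappa(\vartheta) - q\kappa(\tilde\vartheta)) \int_{\mathfrak{X}} \exp\Big( \sum_{j=1}^{k} (p\zeta_j(\vartheta) + q\zeta_j(\tilde\vartheta)) T_j(x) \Big) h(x)\, d\mu(x) \\
&= \int_{\mathfrak{X}} \exp\Big( \sum_{j=1}^{k} (p\zeta_j(\vartheta) + q\zeta_j(\tilde\vartheta)) T_j(x) - \kappa^*(p\boldsymbol{\zeta}(\vartheta) + q\boldsymbol{\zeta}(\tilde\vartheta)) \Big) h(x)\, d\mu(x) \\
&\quad \times \exp(\kappa^*(p\boldsymbol{\zeta}(\vartheta) + q\boldsymbol{\zeta}(\tilde\vartheta))) \exp(-p\kappa(\vartheta) - q\kappa(\tilde\vartheta)) \\
&\overset{(*)}{=} \exp\big[ \kappa^*(p\boldsymbol{\zeta}(\vartheta) + q\boldsymbol{\zeta}(\tilde\vartheta)) - (p\kappa(\vartheta) + q\kappa(\tilde\vartheta)) \big],
\end{align*}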
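where in $(*)$ it is used that $p\boldsymbol{\zeta}(\vartheta) + q\boldsymbol{\zeta}(\tilde\vartheta) \in \Theta^*$ if $\boldsymbol{\zeta}(\vartheta), \boldsymbol{\zeta}(\tilde\vartheta) \in \Theta^*$, because the natural parameter space $\Theta^*$ is convex (see, e.g., Lehmann and Romano, 2005, Lemma 2.7.1, p. 48); hence the remaining integrand is a density of the family and integrates to one. Now, for $m \in 2\mathbb{N}$, we can state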
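\begin{align*}
(D_{H,m}(\vartheta, \tilde\vartheta))^m &= \int_{\mathfrak{X}} \big| f_{\vartheta}(x)^{\frac{1}{m}} - f_{\tilde\vartheta}(x)^{\frac{1}{m}} \big|^m\, d\mu(x) \\
&= \int_{\mathfrak{X}} \big( f_{\vartheta}(x)^{\frac{1}{m}} - f_{\tilde\vartheta}(x)^{\frac{1}{m}} \big)^m\, d\mu(x) \quad \text{(since $m$ is even)} \\
&= \int_{\mathfrak{X}} \sum_{j=0}^{m} (-1)^j \binom{m}{j} f_{\vartheta}(x)^{\frac{j}{m}} f_{\tilde\vartheta}(x)^{\frac{m-j}{m}}\, d\mu(x) \\
&= \sum_{j=0}^{m} (-1)^j \binom{m}{j} \int_{\mathfrak{X}} f_{\vartheta}(x)^{\frac{j}{m}} f_{\tilde\vartheta}(x)^{\frac{m-j}{m}}\, d\mu(x) \\
&= \sum_{j=0}^{m} (-1)^j \binom{m}{j} \lambda_{\frac{j}{m}, \frac{m-j}{m}}(f_{\vartheta}, f_{\tilde\vartheta}),
\end{align*}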
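and this yields formulas for the Hellinger distance for $m \in 2\mathbb{N}$ in case of the two compared distributions belonging to the same exponential family:
\[ D_{H,m}(\vartheta, \tilde\vartheta) = \Big( \sum_{j=0}^{m} (-1)^j \binom{m}{j} \lambda_{\frac{j}{m}, \frac{m-j}{m}}(f_{\vartheta}, f_{\tilde\vartheta}) \Big)^{\frac{1}{m}}, \]
and as an important special case we find
\[ D_{H,2}(\vartheta, \tilde\vartheta) = \big( 2 - 2\lambda_{\frac{1}{2}, \frac{1}{2}}(f_{\vartheta}, f_{\tilde\vartheta}) \big)^{\frac{1}{2}} = \Big( 2 - 2\exp\Big[ \kappa^*\big( \tfrac{1}{2}(\boldsymbol{\zeta}(\vartheta) + \boldsymbol{\zeta}(\tilde\vartheta)) \big) - \tfrac{1}{2}(\kappa(\vartheta) + \kappa(\tilde\vartheta)) \Big] \Big)^{\frac{1}{2}}. \tag{2.1.8} \]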
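Noticing the equality
\[ D_{R,\alpha}(\vartheta, \tilde\vartheta) = \frac{1}{\alpha(\alpha - 1)} \ln\big( \lambda_{\alpha, 1-\alpha}(f_{\vartheta}, f_{\tilde\vartheta}) \big), \]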
the Rényi divergence for an exponential family is also obtainable directly from the results of Huzurbazar (1955). Applying the equation in (1.2.4) yields the missing Kullback-Leibler divergence. The latter can also be calculated directly through straightforward computation with the exponential family structure (see Kullback, 1959, Corollary 3.2, p. 45). The steps of calculation are:
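\begin{align*}
D_{KL}(\vartheta, \tilde\vartheta) &= \int_{\mathfrak{X}} f_{\vartheta}(x) \ln\frac{f_{\vartheta}(x)}{f_{\tilde\vartheta}(x)}\, d\mu(x) \\
&= \int_{\mathfrak{X}} f_{\vartheta}(x) \Big( \sum_{j=1}^{k} (\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)) T_j(x) - (\kappa(\vartheta) - \kappa(\tilde\vartheta)) \Big)\, d\mu(x) \\
&= -(\kappa(\vartheta) - \kappa(\tilde\vartheta)) + \sum_{j=1}^{k} (\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)) \int_{\mathfrak{X}} f_{\vartheta}(x) T_j(x)\, d\mu(x) \\
&= \kappa(\tilde\vartheta) - \kappa(\vartheta) + \sum_{j=1}^{k} (\zeta_j(\vartheta) - \zeta_j(\tilde\vartheta)) E_{\vartheta}(T_j) \\
&= \kappa(\tilde\vartheta) - \kappa(\vartheta) + (\boldsymbol{\zeta}(\vartheta) - \boldsymbol{\zeta}(\tilde\vartheta))' E_{\vartheta}(\boldsymbol{T}).
\end{align*}
Clearly, $D_{KL}(\vartheta, \tilde\vartheta) + D_{KL}(\tilde\vartheta, \vartheta) = D_J(\vartheta, \tilde\vartheta)$ provides a further way to obtain the J-distance. Summarizing the previous results, we obtain the following: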
Lemma 2.1.7 (Divergences in an exponential family)
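Let $f_{\vartheta}, f_{\tilde\vartheta}$ be $\mu$-densities from an exponential family according to Def. 2.1.1. Then the following formulas hold.
Kullback-Leibler divergence
\[ D_{KL}(\vartheta, \tilde\vartheta) = \kappa(\tilde\vartheta) - \kappa(\vartheta) + (\boldsymbol{\zeta}(\vartheta) - \boldsymbol{\zeta}(\tilde\vartheta))' E_{\vartheta}(\boldsymbol{T}) \]
Jeffreys J-distance
\[ D_J(\vartheta, \tilde\vartheta) = (\boldsymbol{\zeta}(\vartheta) - \boldsymbol{\zeta}(\tilde\vartheta))' (E_{\vartheta}(\boldsymbol{T}) - E_{\tilde\vartheta}(\boldsymbol{T})) \]
Rényi divergence for $\alpha \in (0,1)$
\[ D_{R,\alpha}(\vartheta, \tilde\vartheta) = \frac{1}{\alpha(\alpha - 1)} \big[ \kappa^*(\alpha\boldsymbol{\zeta}(\vartheta) + (1-\alpha)\boldsymbol{\zeta}(\tilde\vartheta)) - (\alpha\kappa(\vartheta) + (1-\alpha)\kappa(\tilde\vartheta)) \big] \]
Bhattacharyya distance
\[ D_B(\vartheta, \tilde\vartheta) = -\Big[ \kappa^*\big( \tfrac{1}{2}(\boldsymbol{\zeta}(\vartheta) + \boldsymbol{\zeta}(\tilde\vartheta)) \big) - \tfrac{1}{2}(\kappa(\vartheta) + \kappa(\tilde\vartheta)) \Big] \]
Hellinger distance for $m \in 2\mathbb{N}$
\[ D_{H,m}(\vartheta, \tilde\vartheta) = \Big( \sum_{j=0}^{m} (-1)^j \binom{m}{j} \lambda_{\frac{j}{m}, \frac{m-j}{m}}(f_{\vartheta}, f_{\tilde\vartheta}) \Big)^{\frac{1}{m}} \]
In particular: Hellinger distance for $m = 2$
\[ D_{H,2}(\vartheta, \tilde\vartheta) = \Big( 2 - 2\exp\Big[ \kappa^*\big( \tfrac{1}{2}(\boldsymbol{\zeta}(\vartheta) + \boldsymbol{\zeta}(\tilde\vartheta)) \big) - \tfrac{1}{2}(\kappa(\vartheta) + \kappa(\tilde\vartheta)) \Big] \Big)^{\frac{1}{2}} \]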
In case of natural parameters, the expressions are simplified.
Corollary 2.1.8 (Divergences in an exponential family with natural parameter space)
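Let $f_{\boldsymbol{\zeta}}, f_{\tilde{\boldsymbol{\zeta}}}$ be $\mu$-densities of the form (2.1.4) from an exponential family with natural parameter space according to Def. 2.1.1 and Def. 2.1.2. Then the following formulas are valid (for $\pi$ see (2.1.7)).
Kullback-Leibler divergence
\[ D_{KL}(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = \kappa^*(\tilde{\boldsymbol{\zeta}}) - \kappa^*(\boldsymbol{\zeta}) + (\boldsymbol{\zeta} - \tilde{\boldsymbol{\zeta}})' \pi(\boldsymbol{\zeta}) \]
Jeffreys J-distance
\[ D_J(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = (\boldsymbol{\zeta} - \tilde{\boldsymbol{\zeta}})' (\pi(\boldsymbol{\zeta}) - \pi(\tilde{\boldsymbol{\zeta}})) \]
Rényi divergence for $\alpha \in (0,1)$
\[ D_{R,\alpha}(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = \frac{1}{\alpha(\alpha - 1)} \big[ \kappa^*(\alpha\boldsymbol{\zeta} + (1-\alpha)\tilde{\boldsymbol{\zeta}}) - (\alpha\kappa^*(\boldsymbol{\zeta}) + (1-\alpha)\kappa^*(\tilde{\boldsymbol{\zeta}})) \big] \]
Bhattacharyya distance
\[ D_B(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = -\Big[ \kappa^*\big( \tfrac{1}{2}(\boldsymbol{\zeta} + \tilde{\boldsymbol{\zeta}}) \big) - \tfrac{1}{2}(\kappa^*(\boldsymbol{\zeta}) + \kappa^*(\tilde{\boldsymbol{\zeta}})) \Big] \]
Hellinger distance for $m \in 2\mathbb{N}$
\[ D_{H,m}(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = \Big( \sum_{j=0}^{m} (-1)^j \binom{m}{j} \lambda_{\frac{j}{m}, \frac{m-j}{m}}(f_{\boldsymbol{\zeta}}, f_{\tilde{\boldsymbol{\zeta}}}) \Big)^{\frac{1}{m}} \]
In particular: Hellinger distance for $m = 2$
\[ D_{H,2}(\boldsymbol{\zeta}, \tilde{\boldsymbol{\zeta}}) = \Big( 2 - 2\exp\Big[ \kappa^*\big( \tfrac{1}{2}(\boldsymbol{\zeta} + \tilde{\boldsymbol{\zeta}}) \big) - \tfrac{1}{2}(\kappa^*(\boldsymbol{\zeta}) + \kappa^*(\tilde{\boldsymbol{\zeta}})) \Big] \Big)^{\frac{1}{2}} \]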
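Proof. The results are obtained immediately by substituting the respective parameters in Lemma 2.1.7: $\boldsymbol{\zeta}(\vartheta) = \boldsymbol{\zeta}$, $\boldsymbol{\zeta}(\tilde\vartheta) = \tilde{\boldsymbol{\zeta}}$, and $\kappa^*$ instead of $\kappa$, where (2.1.7) is used for the equations of the Kullback-Leibler divergence and the J-distance.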
Liese and Vajda (1987) stated these results for the Rényi divergence and the Kullback-Leibler divergence. By Corollary 2.1.8, for natural parameters, all divergences considered in this work can be obtained directly by plugging in the mappings $\kappa^*$ and $\pi$, where $\pi$ is obtained immediately from the partial derivatives of $\kappa^*$ (see (2.1.7)). Hence, $\kappa^*$ is the function of interest.
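The plug-in character of Corollary 2.1.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the mappings are passed as ordinary callables on lists, and the test family (exponential distributions with rate $\lambda$, written with $T(x) = x$, natural parameter $\zeta = -\lambda$ and $\kappa^*(\zeta) = -\ln(-\zeta)$) is an example chosen here and not taken from the text.

```python
import math

def kl(z, zt, kappa, pi):
    """Kullback-Leibler divergence: kappa(zt) - kappa(z) + (z - zt)' pi(z)."""
    return kappa(zt) - kappa(z) + sum((a - b) * m for a, b, m in zip(z, zt, pi(z)))

def jeffreys(z, zt, pi):
    """J-distance: (z - zt)' (pi(z) - pi(zt))."""
    return sum((a - b) * (m - mt) for a, b, m, mt in zip(z, zt, pi(z), pi(zt)))

def log_lambda(z, zt, p, kappa):
    """ln lambda_{p,1-p} = kappa(p z + (1-p) zt) - (p kappa(z) + (1-p) kappa(zt))."""
    mix = [p * a + (1 - p) * b for a, b in zip(z, zt)]
    return kappa(mix) - (p * kappa(z) + (1 - p) * kappa(zt))

def renyi(z, zt, alpha, kappa):
    """Renyi divergence of order alpha in (0, 1)."""
    return log_lambda(z, zt, alpha, kappa) / (alpha * (alpha - 1))

def bhattacharyya(z, zt, kappa):
    return -log_lambda(z, zt, 0.5, kappa)

def hellinger2(z, zt, kappa):
    """Hellinger distance for m = 2."""
    return math.sqrt(2 - 2 * math.exp(log_lambda(z, zt, 0.5, kappa)))

# Illustrative family (an assumption of this sketch): exponential distributions
# with rate lam, natural parameter zeta = -lam < 0 and T(x) = x.
def kappa_exp(z):
    return -math.log(-z[0])

def pi_exp(z):
    return [-1.0 / z[0]]  # derivative of kappa_exp, i.e. the mean 1/lam

z, zt = [-1.0], [-2.0]
d_kl = kl(z, zt, kappa_exp, pi_exp)
d_j = jeffreys(z, zt, pi_exp)
d_b = bhattacharyya(z, zt, kappa_exp)
```

Only `kappa` and `pi` change from one family to another, which mirrors the remark that $\kappa^*$ is the function of interest.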
2.1.2. GOSs and Exponential Families in Model Parameters
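In this subsection, the mappings $\kappa$ and $\pi$ are derived for densities of generalized order statistics. Beginning with densities of the form given in (1.1.1) and setting $F(x_0) := 0$ in equation $(*)$, we conclude
\begin{align*}
f^{X^*_1, \dots, X^*_n}_{\boldsymbol{\gamma}}(x_1, \dots, x_n)
&= \Big( \prod_{j=1}^{n} \gamma_j \Big) \Big( \prod_{j=1}^{n-1} (1 - F(x_j))^{\gamma_j - \gamma_{j+1} - 1} f(x_j) \Big) (1 - F(x_n))^{\gamma_n - 1} f(x_n) \\
&= \Big( \prod_{j=1}^{n} \gamma_j \Big) \Big( \prod_{j=1}^{n} (1 - F(x_j))^{\gamma_j} \Big) \Big( \prod_{j=2}^{n} (1 - F(x_{j-1}))^{-\gamma_j} \Big) \Big( \prod_{j=1}^{n} (1 - F(x_j))^{-1} f(x_j) \Big) \\
&= \Big( \prod_{j=1}^{n} \gamma_j \Big) \exp\Big( \sum_{j=1}^{n} \gamma_j \ln(1 - F(x_j)) - \sum_{j=2}^{n} \gamma_j \ln(1 - F(x_{j-1})) \Big) \Big( \prod_{j=1}^{n} (1 - F(x_j))^{-1} f(x_j) \Big) \\
&\overset{(*)}{=} \underbrace{\Big( \prod_{j=1}^{n} \gamma_j \Big)}_{C(\boldsymbol{\gamma})} \exp\Big( \sum_{j=1}^{n} \gamma_j \underbrace{\ln\Big( \frac{1 - F(x_j)}{1 - F(x_{j-1})} \Big)}_{T_j(\boldsymbol{x})} \Big) \underbrace{\Big( \prod_{j=1}^{n} \frac{f(x_j)}{1 - F(x_j)} \Big)}_{h(\boldsymbol{x})} \\
&= \exp\Big( \sum_{j=1}^{n} \gamma_j \ln\Big( \frac{1 - F(x_j)}{1 - F(x_{j-1})} \Big) - \underbrace{\sum_{j=1}^{n} -\ln\gamma_j}_{\kappa(\boldsymbol{\gamma})} \Big) \Big( \prod_{j=1}^{n} \frac{f(x_j)}{1 - F(x_j)} \Big)
\end{align*}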
on the cone $F^{-1}(0+) < x_1 \le \dots \le x_n < F^{-1}(1)$. The exponential family structure with natural parameter $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_n)'$ is obvious. As previously mentioned, the superscript "$*$" indicating the natural parameter space is omitted in the following.
The first 𝑟 ∈ {1, . . . , 𝑛} (for fixed 𝑟) GOSs are considered as model for 𝑟 ordered random quantities. By this we have a natural method for quantifying distances between different models that are included in GOSs, due to the two points of view described in Remark 1.1.2, although, in reality, marginal distributions are compared. For example, if different distributions of random variables describing the failure times of a 4-out-of-5 system are considered, it is reasonable to consider only the first two SOSs. Accordingly, the divergence numbers between two marginal densities are interpreted as divergence
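numbers for the respective models. Let $\boldsymbol{X} = (X^*_1, \dots, X^*_r)'$ denote the vector of the first $r$ GOSs and
\[ \mathfrak{P}^{\boldsymbol{X}} = \big\{ P^{\boldsymbol{X}}_{\boldsymbol{\gamma}} = f^{\boldsymbol{X}}_{\boldsymbol{\gamma}} \lambda^r_{\mathbb{R}^r_<} : \boldsymbol{\gamma} \in \mathbb{R}^r_+ \big\} \]
be the family of associated distributions, where $\mathbb{R}^r_< = \{ x \in \mathbb{R}^r : F^{-1}(0) < x_1 < \dots < x_r < F^{-1}(1) \}$ denotes the cone of increasing real numbers and $\lambda^r_{\mathbb{R}^r_<}$ the $r$-dimensional Lebesgue measure on $(\mathbb{R}^r, \mathbb{B}^r)$ restricted to that cone. The $\lambda^r_{\mathbb{R}^r_<}$-densities $f^{\boldsymbol{X}}_{\boldsymbol{\gamma}}$ of the measures $P^{\boldsymbol{X}}_{\boldsymbol{\gamma}}$, $\boldsymbol{\gamma} \in \mathbb{R}^r_+$, are given by
\[ f^{\boldsymbol{X}}_{\boldsymbol{\gamma}}(\boldsymbol{x}) = \exp\Big( \Big( \sum_{j=1}^{r} \gamma_j T_j(\boldsymbol{x}) \Big) - \kappa(\boldsymbol{\gamma}) \Big) h(\boldsymbol{x}), \quad \boldsymbol{x} \in \mathbb{R}^r_<, \ \lambda^r_{\mathbb{R}^r_<}\text{-a.e.}, \tag{2.1.9} \]
with
\[ h(\boldsymbol{x}) = \prod_{j=1}^{r} \frac{f(x_j)}{1 - F(x_j)}, \quad \boldsymbol{x} = (x_1, \dots, x_r)' \in \mathbb{R}^r_<, \]
\[ T_1(\boldsymbol{x}) = \ln(1 - F(x_1)), \qquad T_j(\boldsymbol{x}) = \ln\Big( \frac{1 - F(x_j)}{1 - F(x_{j-1})} \Big), \quad \boldsymbol{x} \in \mathbb{R}^r_<, \ j = 2, \dots, r, \tag{2.1.10} \]
and
\[ \kappa(\boldsymbol{\gamma}) = -\sum_{j=1}^{r} \ln\gamma_j, \quad \boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_r)' \in \mathbb{R}^r_+. \tag{2.1.11} \]
Then $\mathfrak{P}^{\boldsymbol{X}}$ is an exponential family with natural parameter space $\mathbb{R}^r_+$. For $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_r)' \in \mathbb{R}^r_+$ Theorem 2.1.5 yields
\[ E_{\boldsymbol{\gamma}} \boldsymbol{T} = \pi(\boldsymbol{\gamma}) = \nabla\kappa(\boldsymbol{\gamma}) = \Big( -\frac{1}{\gamma_1}, \dots, -\frac{1}{\gamma_r} \Big)', \tag{2.1.12} \]
\[ I_f(\boldsymbol{\gamma}) = \mathrm{Cov}_{\boldsymbol{\gamma}}(\boldsymbol{T}) = H_{\kappa}(\boldsymbol{\gamma}) = \mathrm{diag}\Big( \frac{1}{\gamma_1^2}, \dots, \frac{1}{\gamma_r^2} \Big) > 0, \tag{2.1.13} \]
where $\mathrm{diag}(a_1, \dots, a_r)$ denotes the diagonal matrix in $\mathbb{R}^{r \times r}$ which has elements $a_1, \dots, a_r$ on the main diagonal (i.e. entry $(i,i)$ is $a_i$, $i = 1, \dots, r$) and zero elsewhere. In particular, from equation (2.1.13) and Theorem 2.1.4 it can be seen that $\mathfrak{P}^{\boldsymbol{X}}$ is strictly $r$-parametrical.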
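As a plausibility check of (2.1.12) and (2.1.13), the gradient and Hessian of $\kappa(\boldsymbol{\gamma}) = -\sum_j \ln\gamma_j$ can be approximated by central finite differences. The following Python sketch is an illustration only; the helper names, the step sizes and the test vector are assumptions made here.

```python
import math

def kappa(g):
    """kappa(gamma) = -sum_j ln(gamma_j), cf. (2.1.11)."""
    return -sum(math.log(x) for x in g)

def shifted(g, i, d):
    """Copy of g with component i moved by d."""
    out = list(g)
    out[i] += d
    return out

def gradient(f, g, h=1e-6):
    """Central-difference approximation of the gradient of f at g."""
    return [(f(shifted(g, i, h)) - f(shifted(g, i, -h))) / (2 * h)
            for i in range(len(g))]

def hessian(f, g, h=1e-4):
    """Central-difference approximation of the Hessian of f at g."""
    r = len(g)
    H = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for k in range(r):
            pp = f(shifted(shifted(g, i, h), k, h))
            pm = f(shifted(shifted(g, i, h), k, -h))
            mp = f(shifted(shifted(g, i, -h), k, h))
            mm = f(shifted(shifted(g, i, -h), k, -h))
            H[i][k] = (pp - pm - mp + mm) / (4 * h * h)
    return H

gamma = [3.0, 2.0, 0.5]          # arbitrary test point in the natural parameter space
grad = gradient(kappa, gamma)    # should be close to (-1/gamma_j)_j
hess = hessian(kappa, gamma)     # should be close to diag(1/gamma_j^2)
```

Up to discretization error, `grad` reproduces $(-1/\gamma_1, \dots, -1/\gamma_r)'$ and `hess` the diagonal matrix in (2.1.13), with off-diagonal entries near zero.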
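Moreover, we also note the exponential family structure for the special case of SOSs (i.e. $\gamma_j = \alpha_j(n - j + 1)$, $j = 1, \dots, r$) with natural parameter $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_r)$, since models of SOSs play an important role in Chapter 3. The representation follows immediately from the one of GOSs:
\begin{align*}
f^{X^*_1, \dots, X^*_r}_{\boldsymbol{\alpha}}(x_1, \dots, x_r)
&= \Big( \prod_{j=1}^{r} \alpha_j (n - j + 1) \Big) \Big( \prod_{j=1}^{r-1} (1 - F(x_j))^{\alpha_j(n-j+1) - \alpha_{j+1}(n-j) - 1} f(x_j) \Big) \\
&\quad \times (1 - F(x_r))^{\alpha_r(n-r+1) - 1} f(x_r) \\
&= \Big( \prod_{j=1}^{r} \alpha_j \Big) \exp\Big( \sum_{j=1}^{r} \alpha_j (n - j + 1) \ln\Big( \frac{1 - F(x_j)}{1 - F(x_{j-1})} \Big) \Big) \times \Big( \frac{n!}{(n-r)!} \prod_{j=1}^{r} \frac{f(x_j)}{1 - F(x_j)} \Big).
\end{align*}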
2.1.3. Explicit Forms of Divergences for GOSs
In the previous two subsections, the foundations were laid for obtaining all considered divergences for distributions of GOSs based on the same absolutely continuous distribution function $F$ (with corresponding density $f$) and on possibly different parameters $\boldsymbol{\gamma}, \boldsymbol{\tau} \in \mathbb{R}^r_+$. With corresponding densities $f^{\boldsymbol{X}}_{\boldsymbol{\gamma}}$ and $f^{\boldsymbol{X}}_{\boldsymbol{\tau}}$ according to (2.1.9) we find the following result.
Proposition 2.1.9 (Divergences for GOSs)
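Given the unique parameters $\boldsymbol{\gamma}$ and $\boldsymbol{\tau}$ of two joint (marginal) distributions of the first $r \in \{1, \dots, n\}$ GOSs, the divergence between them can be computed as follows.
Kullback-Leibler divergence
\[ D_{KL}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \sum_{j=1}^{r} \Big( \frac{\tau_j}{\gamma_j} + \ln\Big( \frac{\gamma_j}{\tau_j} \Big) - 1 \Big) \]
Jeffreys J-distance
\[ D_J(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \sum_{j=1}^{r} \Big( \frac{\tau_j}{\gamma_j} + \frac{\gamma_j}{\tau_j} - 2 \Big) = \sum_{j=1}^{r} \frac{(\tau_j - \gamma_j)^2}{\tau_j \gamma_j} \]
Rényi divergence for $\alpha \in (0,1)$
\[ D_{R,\alpha}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \frac{1}{\alpha(\alpha - 1)} \ln\Big( \prod_{j=1}^{r} \frac{\gamma_j^{\alpha} \tau_j^{1-\alpha}}{\alpha\gamma_j + (1-\alpha)\tau_j} \Big) \]
Bhattacharyya distance
\[ D_B(\boldsymbol{\gamma}, \boldsymbol{\tau}) = -\ln\Big( \prod_{j=1}^{r} \frac{2\sqrt{\gamma_j \tau_j}}{\gamma_j + \tau_j} \Big) \]
Hellinger distance for $m \in 2\mathbb{N}$
\[ D_{H,m}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \Bigg( \sum_{k=0}^{m} (-1)^k \binom{m}{k} \prod_{j=1}^{r} \Bigg( \frac{\gamma_j^{\frac{k}{m}} \tau_j^{\frac{m-k}{m}}}{\frac{k}{m}\gamma_j + \frac{m-k}{m}\tau_j} \Bigg) \Bigg)^{\frac{1}{m}} \]
In particular: Hellinger distance for $m = 2$
\[ D_{H,2}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \Big( 2 - 2\Big( \prod_{j=1}^{r} \frac{2\sqrt{\gamma_j \tau_j}}{\gamma_j + \tau_j} \Big) \Big)^{\frac{1}{2}} \]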
The expressions in this proposition can be viewed as the respective divergences and distances for uniform generalized order statistics, since the baseline distribution is not involved in the explicit forms. We will return to this property in Section 2.2.
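The closed forms of Proposition 2.1.9 translate directly into code. The following Python sketch is an illustration, not an implementation taken from this work; the function names are hypothetical and parameter vectors are plain lists of positive numbers of equal length.

```python
import math

def kl_gos(g, t):
    """Kullback-Leibler divergence D_KL(gamma, tau)."""
    return sum(b / a + math.log(a / b) - 1 for a, b in zip(g, t))

def j_gos(g, t):
    """Jeffreys J-distance D_J(gamma, tau)."""
    return sum((b - a) ** 2 / (a * b) for a, b in zip(g, t))

def lam_gos(g, t, p):
    """lambda_{p,1-p}: product of gamma^p tau^(1-p) / (p gamma + (1-p) tau)."""
    return math.prod(a ** p * b ** (1 - p) / (p * a + (1 - p) * b)
                     for a, b in zip(g, t))

def renyi_gos(g, t, alpha):
    """Renyi divergence of order alpha in (0, 1)."""
    return math.log(lam_gos(g, t, alpha)) / (alpha * (alpha - 1))

def bhattacharyya_gos(g, t):
    return -math.log(lam_gos(g, t, 0.5))

def hellinger_gos(g, t, m=2):
    """Hellinger distance for even m, via the alternating binomial sum."""
    s = sum((-1) ** k * math.comb(m, k) * lam_gos(g, t, k / m)
            for k in range(m + 1))
    return max(s, 0.0) ** (1 / m)  # guard against tiny negative rounding errors

# First r = 3 order statistics from a sample of size n = 3 versus record values.
os_par = [3.0, 2.0, 1.0]
rv_par = [1.0, 1.0, 1.0]
d_j = j_gos(os_par, rv_par)
d_h = hellinger_gos(os_par, rv_par)
```

Note that no baseline distribution function appears anywhere in these functions, in line with the remark following the proposition.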
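Proof. The equations follow from Corollary 2.1.8 and the representations of $\kappa$ and $\pi$ for GOSs according to (2.1.11) and (2.1.12), respectively.
For the Kullback-Leibler divergence we have
\begin{align*}
D_{KL}(\boldsymbol{\gamma}, \boldsymbol{\tau}) &= \kappa(\boldsymbol{\tau}) - \kappa(\boldsymbol{\gamma}) + (\boldsymbol{\gamma} - \boldsymbol{\tau})' \pi(\boldsymbol{\gamma}) \\
&= -\sum_{j=1}^{r} \ln(\tau_j) + \sum_{j=1}^{r} \ln(\gamma_j) + (\boldsymbol{\gamma} - \boldsymbol{\tau})' \Big( -\frac{1}{\gamma_1}, \dots, -\frac{1}{\gamma_r} \Big)' \\
&= \sum_{j=1}^{r} \ln\Big( \frac{\gamma_j}{\tau_j} \Big) - (\boldsymbol{\gamma} - \boldsymbol{\tau})' \Big( \frac{1}{\gamma_1}, \dots, \frac{1}{\gamma_r} \Big)' \\
&= \sum_{j=1}^{r} \Big( \ln\Big( \frac{\gamma_j}{\tau_j} \Big) - 1 + \frac{\tau_j}{\gamma_j} \Big) = \sum_{j=1}^{r} \Big( \frac{\tau_j}{\gamma_j} + \ln\Big( \frac{\gamma_j}{\tau_j} \Big) - 1 \Big).
\end{align*}
The Jeffreys distance follows immediately by
\[ D_J(\boldsymbol{\gamma}, \boldsymbol{\tau}) = D_{KL}(\boldsymbol{\gamma}, \boldsymbol{\tau}) + D_{KL}(\boldsymbol{\tau}, \boldsymbol{\gamma}) = \sum_{j=1}^{r} \Big( \frac{\tau_j}{\gamma_j} + \frac{\gamma_j}{\tau_j} + \underbrace{\ln\Big( \frac{\tau_j}{\gamma_j} \Big) + \ln\Big( \frac{\gamma_j}{\tau_j} \Big)}_{=0} - 2 \Big). \]
The Rényi divergence of order $\alpha \in (0,1)$ in case of GOSs is
\begin{align*}
D_{R,\alpha}(\boldsymbol{\gamma}, \boldsymbol{\tau}) &= \frac{1}{\alpha(\alpha - 1)} \big[ \kappa(\alpha\boldsymbol{\gamma} + (1-\alpha)\boldsymbol{\tau}) - (\alpha\kappa(\boldsymbol{\gamma}) + (1-\alpha)\kappa(\boldsymbol{\tau})) \big] \\
&= \frac{1}{\alpha(\alpha - 1)} \Big[ -\sum_{j=1}^{r} \ln(\alpha\gamma_j + (1-\alpha)\tau_j) + \sum_{j=1}^{r} \big( \alpha\ln(\gamma_j) + (1-\alpha)\ln(\tau_j) \big) \Big] \\
&= \frac{1}{\alpha(\alpha - 1)} \ln\Big( \prod_{j=1}^{r} \frac{\gamma_j^{\alpha} \tau_j^{1-\alpha}}{\alpha\gamma_j + (1-\alpha)\tau_j} \Big).
\end{align*}
Since this formula simplifies for $\alpha = \frac{1}{2} = 1 - \alpha$, the Bhattacharyya distance is
\[ D_B(\boldsymbol{\gamma}, \boldsymbol{\tau}) \overset{(1.2.5)}{=} \frac{1}{4} D_{R,\frac{1}{2}}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = -\ln\Big( \prod_{j=1}^{r} \frac{2\sqrt{\gamma_j \tau_j}}{\gamma_j + \tau_j} \Big) \]
and, as a transformation of the latter, the Hellinger distance ($m = 2$) is
\[ D_{H,2}(\boldsymbol{\gamma}, \boldsymbol{\tau}) \overset{(1.2.6)}{=} \big( 2 - 2\exp(-D_B(\boldsymbol{\gamma}, \boldsymbol{\tau})) \big)^{\frac{1}{2}} = \Big( 2 - 2\prod_{j=1}^{r} \frac{2\sqrt{\gamma_j \tau_j}}{\gamma_j + \tau_j} \Big)^{\frac{1}{2}}. \]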
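The Hellinger distance for $m \in 2\mathbb{N}$ is obtained from an explicit expression for $\lambda_{\frac{j}{m}, \frac{m-j}{m}}$. It can be computed analogously to the computation of the product for the Rényi divergence, since
\[ D_{R,\alpha}(\boldsymbol{\gamma}, \boldsymbol{\tau}) = \frac{1}{\alpha(\alpha - 1)} \ln\big( \lambda_{\alpha, 1-\alpha}(f_{\boldsymbol{\gamma}}, f_{\boldsymbol{\tau}}) \big). \]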
Note that the Bhattacharyya distance and the Hellinger metric are transformations of a similarity coefficient which is given by the product of the componentwise ratios of the geometric mean and the arithmetic mean of $\gamma_j$ and $\tau_j$. Each of these ratios is larger than zero and not larger than 1 (inequality of arithmetic and geometric means), and so is their product.
We consider a first example for the J-distance.
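Example 2.1.10. Let $r \in \mathbb{N}$, $\boldsymbol{\gamma} = (n, n-1, n-2, \dots, n-r+1)'$ (i.e. the model parameter for the first $r$ OSs based on a sample of size $n \in \mathbb{N}$), and $\boldsymbol{\tau}^{(k)} = (n+k, (n+k)-1, (n+k)-2, \dots, (n+k)-r+1)'$, $k \in \mathbb{N}$ (i.e. the model parameter for the first $r$ OSs based on a sample of size $n+k \in \mathbb{N}$). Then we find
\begin{align*}
D_J(\boldsymbol{\gamma}, \boldsymbol{\tau}^{(k)}) &= \sum_{j=1}^{r} \Big( \frac{n-j+1}{n-j+1+k} + \frac{n-j+1+k}{n-j+1} - 2 \Big) \\
&= \sum_{j=1}^{r} \Big( 1 - \frac{k}{n-j+1+k} + 1 + \frac{k}{n-j+1} - 2 \Big) \\
&= \sum_{j=1}^{r} \Big( -\frac{k}{n-j+1+k} + \frac{k}{n-j+1} \Big) \\
&= k \Big( \sum_{j=n-r+1+k}^{n+k} -\frac{1}{j} + \sum_{j=n-r+1}^{n} \frac{1}{j} \Big) = k \sum_{j=n-r+1}^{n} \Big( \frac{1}{j} - \frac{1}{j+k} \Big).
\end{align*}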
This distance tends to zero as $n$ tends to infinity if $k$ and $r$ are fixed. It tends to infinity as $k$ tends to infinity with $n$ and $r$ fixed. As a special case we note
\[ D_J(\boldsymbol{\gamma}, \boldsymbol{\tau}^{(1)}) = \frac{1}{n-r+1} - \frac{1}{n+1} = \frac{r}{(n-r+1)(n+1)}. \]
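The closed form of Example 2.1.10 can be checked in exact rational arithmetic. The Python sketch below is an illustration; the function names and the test values $n = 10$, $r = 4$, $k = 3$ are choices made here.

```python
from fractions import Fraction

def os_params(n, r):
    """Model parameter (n, n-1, ..., n-r+1) of the first r OSs from a sample of size n."""
    return [n - j + 1 for j in range(1, r + 1)]

def j_distance(g, t):
    """J-distance as the sum of tau_j/gamma_j + gamma_j/tau_j - 2 (integer parameters)."""
    return sum(Fraction(b, a) + Fraction(a, b) - 2 for a, b in zip(g, t))

def j_closed(n, r, k):
    """Closed form k * sum_{j=n-r+1}^{n} (1/j - 1/(j+k))."""
    return k * sum(Fraction(1, j) - Fraction(1, j + k)
                   for j in range(n - r + 1, n + 1))

n, r, k = 10, 4, 3
direct = j_distance(os_params(n, r), os_params(n + k, r))
closed = j_closed(n, r, k)
special = j_distance(os_params(n, r), os_params(n + 1, r))  # case k = 1
```

Because `Fraction` is exact, `direct == closed` holds without any tolerance, and `special` equals $r / ((n-r+1)(n+1))$.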
As mentioned before, the J-distance fails to be a metric in general because it does not satisfy the triangle inequality. Even for the class of densities given by the concept of GOSs, the J-distance still does not have this metric property.
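Remark 2.1.11. Let $\boldsymbol{\gamma}, \boldsymbol{\tau} \in \mathbb{R}^r_+$. It can be shown that for every $\boldsymbol{\delta} \in \mathbb{R}^r_+$ with
\[ \delta_j \in (\min\{\gamma_j, \tau_j\}, \max\{\gamma_j, \tau_j\}) \]
the inequality
\[ D_J(\boldsymbol{\gamma}, \boldsymbol{\tau}) > D_J(\boldsymbol{\gamma}, \boldsymbol{\delta}) + D_J(\boldsymbol{\delta}, \boldsymbol{\tau}) \]
holds (see Section A.1 in the appendix). That is, the triangle inequality is not satisfied. Note that the condition given here is only a sufficient one. There are further examples for which the triangle inequality does not hold.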
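The failure of the triangle inequality described in Remark 2.1.11 is easy to observe numerically. In the following Python sketch the parameter vectors and the interpolation weights are arbitrary illustrative choices; each $\boldsymbol{\delta}$ lies componentwise strictly between $\boldsymbol{\gamma}$ and $\boldsymbol{\tau}$.

```python
def j_distance(g, t):
    """J-distance: sum of (tau_j - gamma_j)^2 / (tau_j * gamma_j)."""
    return sum((b - a) ** 2 / (a * b) for a, b in zip(g, t))

gamma, tau = [1.0, 5.0], [4.0, 2.0]

violations = []
for s in [0.1, 0.25, 0.5, 0.75, 0.9]:
    # delta_j = gamma_j + s * (tau_j - gamma_j) lies strictly between gamma_j and tau_j
    delta = [a + s * (b - a) for a, b in zip(gamma, tau)]
    direct = j_distance(gamma, tau)
    detour = j_distance(gamma, delta) + j_distance(delta, tau)
    violations.append(direct > detour)
```

Every entry of `violations` is `True`: the direct J-distance exceeds the sum along the detour through $\boldsymbol{\delta}$.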
2.2. On Structure of and Relations between Models of GOSs w.r.t. Divergences
In this section, some properties of selected divergences given in Proposition 2.1.9 are discussed. Throughout this work, divergences between distributions of the first $r \in \{1, \dots, n\}$ ordered random quantities are interpreted as divergences between the corresponding models themselves, although, strictly speaking, only marginal distributions are compared. The interpretation as a distance between two models seems to be justified by the fact that the specific joint distributions of $r$ ordered random quantities are directly connected to the corresponding models. For these purposes and for simple notation, the abbreviations $OS_{\alpha_0}$, $SOS_{\boldsymbol{\alpha}}$, $RV_{\beta_0}$, $PRV_{\boldsymbol{\beta}}$, and $PC_{\boldsymbol{R}}$, which are already given in Table 1.1, are used as arguments of the divergences for a fixed $r$ in the following. This is to be understood as illustrated by the example of progressively Type-II censored order statistics: $PC_{\boldsymbol{R}}$ is a short notation for the parameter vector which corresponds to a progressively Type-II censored sample of size $r$ from $N$ units with censoring scheme $\boldsymbol{R} = (R_1, \dots, R_r)$, that is
\[ \Big( N, N - 1 - R_1, \dots, N - r + 1 - \sum_{i=1}^{r-1} R_i \Big)' = \Big( r + \sum_{i=1}^{r} R_i, \ r - 1 + \sum_{i=2}^{r} R_i, \ \dots, \ 1 + R_r \Big)', \]
where $r + \sum_{i=1}^{r} R_i = N$. Analogously, for a given $n \in \mathbb{N}$, $SOS_{\boldsymbol{\alpha}}$ with $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_r)' \in \mathbb{R}^r_+$ denotes the parameter vector
\[ (\alpha_1 n, \alpha_2 (n-1), \dots, \alpha_r (n-r+1))', \]
which yields the distribution of the first $r$ SOSs based on $\alpha_1, \dots, \alpha_r$. Having introduced this notation, we give an example that provides some alternative expressions for divergences and distances between models of OSs and RVs. The corresponding model parameters depend only on $r$ (RVs) or on $r$ and $n$ (OSs).
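The two parameter vectors just introduced can be generated programmatically. The following Python sketch is an illustration; the function names and the censoring scheme $\boldsymbol{R} = (2, 0, 1)$ are hypothetical choices.

```python
def pc_params(R):
    """PC_R via gamma_j = N - j + 1 - sum_{i<j} R_i with N = r + sum(R)."""
    r = len(R)
    N = r + sum(R)
    return [N - j + 1 - sum(R[:j - 1]) for j in range(1, r + 1)]

def pc_params_alt(R):
    """Equivalent form gamma_j = r - j + 1 + sum_{i>=j} R_i."""
    r = len(R)
    return [r - j + 1 + sum(R[j - 1:]) for j in range(1, r + 1)]

def sos_params(alpha, n):
    """SOS_alpha: (alpha_1 n, alpha_2 (n-1), ..., alpha_r (n-r+1))."""
    return [a * (n - j + 1) for j, a in enumerate(alpha, start=1)]

scheme = [2, 0, 1]            # r = 3 observed failures, N = 6 units in total
gam = pc_params(scheme)
gam_alt = pc_params_alt(scheme)
```

Both functions return the same vector, and the scheme without any censoring, `[0, 0, 0]`, reproduces the parameter of the first three ordinary order statistics from a sample of size three.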
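Example 2.2.1. For the Kullback-Leibler divergence we find
\[ D_{KL}(RV, OS) = \sum_{j=1}^{r} \Big( n - j + 1 + \ln\Big( \frac{1}{n-j+1} \Big) - 1 \Big) = \frac{r}{2}(2n - r - 1) - \ln\Big( \frac{n!}{(n-r)!} \Big) \]
and
\begin{align*}
D_{KL}(OS, RV) &= \sum_{j=1}^{r} \Big( \frac{1}{n-j+1} + \ln(n-j+1) - 1 \Big) \\
&= \sum_{j=1}^{n} \frac{1}{j} - \sum_{j=1}^{n-r} \frac{1}{j} + \ln\Big( \frac{n!}{(n-r)!} \Big) - r \\
&= \psi(n+1) - \psi(n-r+1) + \ln\Big( \frac{n!}{(n-r)!} \Big) - r,
\end{align*}
where $\gamma$ is the Euler constant and $\psi$ is the digamma function. The latter is given by $\psi(x) = \frac{d}{dx} \ln\Gamma(x)$, and the equality $\psi(n) + \gamma = \sum_{k=1}^{n-1} k^{-1}$, $n \in \mathbb{N}$, holds (see, e.g., Abramowitz and Stegun, 1964, p. 258). Moreover, we derive the distances
\[ D_J(OS, RV) = D_{KL}(OS, RV) + D_{KL}(RV, OS) = \psi(n+1) - \psi(n-r+1) + \frac{r}{2}(2n - r - 3) \]
and
\begin{align*}
D_{H,2}(OS, RV) &= \Big( 2 - 2\prod_{j=1}^{r} \frac{2\sqrt{n-j+1}}{n-j+2} \Big)^{\frac{1}{2}} \\
&= \Big( 2 - 2^{r+1} \Big( \frac{n!}{(n-r)!} \Big)^{\frac{1}{2}} \frac{(n-r+1)!}{(n+1)!} \Big)^{\frac{1}{2}} \\
&= \Big( 2 - 2^{r+1} \Big( \frac{(n-r)!}{n!} \Big)^{\frac{1}{2}} \frac{n-r+1}{n+1} \Big)^{\frac{1}{2}}.
\end{align*}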
Tables C.1 to C.4 (on pp. 138 ff.) in the appendix contain computed values of these derived divergences and distances between OSs and RVs. In particular, for large $r$ and $n$ many values of the Hellinger distance are close to $\sqrt{2}$ (the upper bound of $D_{H,2}$). This may be regarded as a disadvantage of the Hellinger distance, which is the only divergence considered in this work that has metric properties. The very similar values of the Hellinger distance cannot reveal relative differences of distances between models in as much detail as the values of unbounded measures can.
For Jeffreys and Hellinger distances, this example of OSs and RVs is given in a preprint of Vuong et al. (2012) along with some other special cases which yield explicit alternative expressions.
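The alternative expressions of Example 2.2.1 can be compared with the general sums from Proposition 2.1.9. In the Python sketch below (an illustration with hypothetical function names), the difference $\psi(n+1) - \psi(n-r+1)$ is evaluated as a difference of harmonic numbers, which follows from the stated identity for $\psi$, and $\ln(n!/(n-r)!)$ is computed with `math.lgamma`.

```python
import math

def kl_gos(g, t):
    """General form: sum of tau_j/gamma_j + ln(gamma_j/tau_j) - 1."""
    return sum(b / a + math.log(a / b) - 1 for a, b in zip(g, t))

def harmonic(n):
    """H_n = sum_{j=1}^{n} 1/j; psi(n+1) - psi(m+1) = H_n - H_m."""
    return sum(1.0 / j for j in range(1, n + 1))

def log_falling(n, r):
    """ln(n! / (n-r)!)"""
    return math.lgamma(n + 1) - math.lgamma(n - r + 1)

def kl_rv_os(n, r):
    return r / 2 * (2 * n - r - 1) - log_falling(n, r)

def kl_os_rv(n, r):
    return harmonic(n) - harmonic(n - r) + log_falling(n, r) - r

def j_os_rv(n, r):
    return harmonic(n) - harmonic(n - r) + r / 2 * (2 * n - r - 3)

def hellinger_os_rv(n, r):
    coeff = 2 ** (r + 1) * math.exp(-0.5 * log_falling(n, r)) * (n - r + 1) / (n + 1)
    return math.sqrt(2 - coeff)

n, r = 7, 4
os_par = [n - j + 1 for j in range(1, r + 1)]   # (n, n-1, ..., n-r+1)
rv_par = [1] * r                                # record values: all parameters equal 1
bc = math.prod(2 * math.sqrt(k) / (k + 1) for k in os_par)  # Bhattacharyya coefficient
```

For larger `n` and `r` the value of `hellinger_os_rv` quickly approaches $\sqrt{2}$, which illustrates the saturation effect discussed for Tables C.1 to C.4.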
2.2.1. First Results
No dependence on baseline distribution
The first interesting common property of the divergences between GOSs is that they are independent of the baseline cumulative distribution function $F$. That is, the divergences between two models of GOSs are invariant under the particular choice of a common baseline distribution function. This is due to the fact that $F$ is not involved in $\kappa$. For example, the divergence between record values and ordinary order statistics is always the same for fixed $r$ and $n$, regardless of which absolutely continuous distribution function they are based on, provided it is the same for both.
Cramer and Bagh (2011) used the Kullback-Leibler divergence as an information measure (see Kullback and Leibler, 1951; Kullback, 1959) to consider optimal censoring schemes for progressively Type-II censored order statistics. They compare progressively Type-II censored OSs with an iid sample of the same size. By this, they can establish optimal censoring schemes in the sense of minimum or maximum information for any continuous distribution function because the divergence is distribution free in this case, too. Moreover, Cramer and Bagh considered an $I_\alpha$-information, which is closely related to the Rényi divergence as it is considered in this work.
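The independence of the baseline distribution can be illustrated numerically in the simplest case $r = 1$, where the density of the first GOS is $\gamma (1 - F(x))^{\gamma - 1} f(x)$. The Python sketch below is an illustration under assumptions made here: a plain midpoint rule, the parameters $\gamma = 3$ and $\tau = 1.5$, and two baselines (standard exponential, truncated at 30, and uniform on $(0,1)$) passed through their survival functions $1 - F$.

```python
import math

def kl_numeric(p, q, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of p ln(p/q) over (a, b)."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        px, qx = p(x), q(x)
        if px > 0.0 and qx > 0.0:
            total += px * math.log(px / qx) * h
    return total

def gos1_density(gamma, survival, f):
    """Density gamma * (1 - F(x))^(gamma - 1) * f(x) of the first GOS (r = 1)."""
    return lambda x: gamma * survival(x) ** (gamma - 1.0) * f(x)

gam, tau = 3.0, 1.5
closed_form = tau / gam + math.log(gam / tau) - 1.0   # Proposition 2.1.9 with r = 1

# Baseline 1: standard exponential distribution, integrated over (0, 30).
exp_surv = lambda x: math.exp(-x)
exp_dens = lambda x: math.exp(-x)
kl_exp = kl_numeric(gos1_density(gam, exp_surv, exp_dens),
                    gos1_density(tau, exp_surv, exp_dens), 0.0, 30.0)

# Baseline 2: uniform distribution on (0, 1).
uni_surv = lambda x: 1.0 - x
uni_dens = lambda x: 1.0
kl_uni = kl_numeric(gos1_density(gam, uni_surv, uni_dens),
                    gos1_density(tau, uni_surv, uni_dens), 0.0, 1.0)
```

Both numerical values agree with `closed_form` up to quadrature error, although the two pairs of densities look entirely different.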
Dependence on parameter ratios
Another property that can be seen directly from the formulas in Proposition 2.1.9 is that all measures may be rewritten as functions of the ratios $\gamma_j/\tau_j$ or of their reciprocals. For the Kullback-Leibler divergence and the Jeffreys J-distance this is obvious from the expressions given in Proposition 2.1.9. The equation
\[ \prod_{j=1}^{r} \frac{\gamma_j^{p} \tau_j^{q}}{p\gamma_j + q\tau_j} = \prod_{j=1}^{r} \frac{(\gamma_j/\tau_j)^{p}}{p(\gamma_j/\tau_j) + q} \]
for $p, q \in (0,1)$ with $p + q = 1$ reveals the dependence on the parameter ratios of the other considered measures. Consequently, factors common to the $j$th components ($j \in \{1, \dots, r\}$) of two parameter vectors cancel out for every considered divergence measure. An example occurs in the case of two models of SOSs. For the $j$th component of the parameter vectors the factor $(n - j + 1)$ cancels, since
\[ \frac{\tilde\alpha_j (n-j+1)}{\alpha_j (n-j+1)} = \frac{\tilde\alpha_j}{\alpha_j}, \quad j = 1, \dots, r, \]
so that the divergences remain unchanged. Note that this yields a kind of independence of the divergences from the total number $n$ of random variables, similar to the distribution freeness mentioned above. For a comparison of two models for the first $r$ ($\le n$) SOSs determined by their model parameter vectors $\boldsymbol{\alpha}, \tilde{\boldsymbol{\alpha}}$, the crucial item determining the divergence is the vector of parameter quotients $(\alpha_1/\tilde\alpha_1, \dots, \alpha_r/\tilde\alpha_r)' \in \mathbb{R}^r_+$.
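The cancellation of the common factors $(n - j + 1)$ can be observed directly. In the following Python sketch (function names and parameter values are illustrative assumptions), the J-distance and the Bhattacharyya distance between two SOS models are evaluated for several sample sizes $n$ and compared with the values obtained from $\boldsymbol{\alpha}$ and $\tilde{\boldsymbol{\alpha}}$ alone.

```python
import math

def j_gos(g, t):
    """Jeffreys J-distance from Proposition 2.1.9."""
    return sum((b - a) ** 2 / (a * b) for a, b in zip(g, t))

def bhattacharyya_gos(g, t):
    """Bhattacharyya distance from Proposition 2.1.9."""
    return -sum(math.log(2 * math.sqrt(a * b) / (a + b)) for a, b in zip(g, t))

def sos_params(alpha, n):
    """SOS_alpha: (alpha_1 n, alpha_2 (n-1), ..., alpha_r (n-r+1))."""
    return [a * (n - j + 1) for j, a in enumerate(alpha, start=1)]

alpha = [1.0, 2.0, 0.5]
alpha_t = [1.5, 1.0, 2.0]

# Divergences for several total sample sizes n (all at least r = 3).
j_values = [j_gos(sos_params(alpha, n), sos_params(alpha_t, n)) for n in (3, 10, 100)]
b_values = [bhattacharyya_gos(sos_params(alpha, n), sos_params(alpha_t, n))
            for n in (3, 10, 100)]

# Reference values computed from the alpha vectors alone.
j_ref = j_gos(alpha, alpha_t)
b_ref = bhattacharyya_gos(alpha, alpha_t)
```

All entries of `j_values` coincide with `j_ref`, and all entries of `b_values` with `b_ref`, up to floating-point rounding.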
Table 2.1 lists some distances between representatives of the different sets of models $OS_{\alpha_0}$, $SOS_{\boldsymbol{\alpha}}$, $RV_{\beta_0}$, $PRV_{\boldsymbol{\beta}}$, $PC_{\boldsymbol{R}}$. The Jeffreys distance and the (squared) Hellinger distance for $m = 2$ are specified, but since the main product (Bhattacharyya coefficient) in the Hellinger distance is also the crucial term in the Bhattacharyya distance, the table can also be read as a table for the latter. Comparing the entries in Table 2.1, it is noticeable that the entries in the block of $OS_{\alpha_0}$/$SOS_{\boldsymbol{\alpha}}$ have the same structure as the ones in the block concerning $RV_{\beta_0}$/$PRV_{\boldsymbol{\beta}}$. This similarity is a consequence of the exclusive dependence on the ratios and is illustrated in Figure 2.1.
A closer look at the two well-known models OSs and RVs provides further insight into the mentioned property. These two models are also interesting for a comparison because they do not have any further parameters (cf. Example 2.2.1). That is, they can be viewed as $SOS_{\boldsymbol{\alpha}}$ models with fixed choices of $\boldsymbol{\alpha}$.
Example 2.2.2 (distance scheme around record values). Let $r$ and $n$ be fixed, and let $D_{\star}$ denote any divergence measure from Proposition 2.1.9. Then $d = D_{\star}(RV, OS) > 0$ is a fixed number. Hence, a question may arise: Which choices for the parameter $\boldsymbol{\alpha}$ provide
\[ D_{\star}(RV, SOS_{\boldsymbol{\alpha}}) = d\,? \tag{2.2.1} \]
Obviously, $\boldsymbol{\alpha} = (1, \dots, 1)' =: \boldsymbol{1} \in \mathbb{R}^r$ is a possible choice, since $SOS_{\boldsymbol{1}} = OS$, but it can be seen that there are infinitely many different possible choices for $\boldsymbol{\alpha}$ (for $r \ge 2$). Recursive
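Table 2.1.: J-distance (upper right entries) and squared Hellinger distance (lower left entries) for different chosen models included in GOSs. The short notation $\sum a_j$ stands for the sum $\sum_{j=1}^{r} a_j$, and analogously $\prod a_j$ for the product $\prod_{j=1}^{r} a_j$. Further, we set $\mathbf{R}^r_j := \sum_{i=j}^{r} R_i$ and $k_j := n - j + 1$ for this table. The entries are listed by pairs of models.

J-distance $D_J(\cdot, \cdot)$:
($OS_{\tilde\alpha_0}$, $OS_{\alpha_0}$): $r \big( \frac{\alpha_0}{\tilde\alpha_0} + \frac{\tilde\alpha_0}{\alpha_0} - 2 \big)$
($OS_{\tilde\alpha_0}$, $SOS_{\boldsymbol{\alpha}}$): $\sum \big( \frac{\alpha_j}{\tilde\alpha_0} + \frac{\tilde\alpha_0}{\alpha_j} - 2 \big)$
($OS_{\tilde\alpha_0}$, $RV_{\beta_0}$): $\sum \big( \frac{\beta_0}{\tilde\alpha_0 k_j} + \frac{\tilde\alpha_0 k_j}{\beta_0} - 2 \big)$
($OS_{\tilde\alpha_0}$, $PRV_{\boldsymbol{\beta}}$): $\sum \big( \frac{\beta_j}{\tilde\alpha_0 k_j} + \frac{\tilde\alpha_0 k_j}{\beta_j} - 2 \big)$
($OS_{\tilde\alpha_0}$, $PC_{\boldsymbol{R}}$): $\sum \big( \frac{k_j + \mathbf{R}^r_j}{\tilde\alpha_0 k_j} + \frac{\tilde\alpha_0 k_j}{k_j + \mathbf{R}^r_j} - 2 \big)$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $SOS_{\boldsymbol{\alpha}}$): $\sum \big( \frac{\alpha_j}{\tilde\alpha_j} + \frac{\tilde\alpha_j}{\alpha_j} - 2 \big)$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $RV_{\beta_0}$): $\sum \big( \frac{\beta_0}{\tilde\alpha_j k_j} + \frac{\tilde\alpha_j k_j}{\beta_0} - 2 \big)$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $PRV_{\boldsymbol{\beta}}$): $\sum \big( \frac{\beta_j}{\tilde\alpha_j k_j} + \frac{\tilde\alpha_j k_j}{\beta_j} - 2 \big)$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $PC_{\boldsymbol{R}}$): $\sum \big( \frac{k_j + \mathbf{R}^r_j}{\tilde\alpha_j k_j} + \frac{\tilde\alpha_j k_j}{k_j + \mathbf{R}^r_j} - 2 \big)$
($RV_{\tilde\beta_0}$, $RV_{\beta_0}$): $r \big( \frac{\beta_0}{\tilde\beta_0} + \frac{\tilde\beta_0}{\beta_0} - 2 \big)$
($RV_{\tilde\beta_0}$, $PRV_{\boldsymbol{\beta}}$): $\sum \big( \frac{\beta_j}{\tilde\beta_0} + \frac{\tilde\beta_0}{\beta_j} - 2 \big)$
($RV_{\tilde\beta_0}$, $PC_{\boldsymbol{R}}$): $\sum \big( \frac{k_j + \mathbf{R}^r_j}{\tilde\beta_0} + \frac{\tilde\beta_0}{k_j + \mathbf{R}^r_j} - 2 \big)$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $PRV_{\boldsymbol{\beta}}$): $\sum \big( \frac{\beta_j}{\tilde\beta_j} + \frac{\tilde\beta_j}{\beta_j} - 2 \big)$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $PC_{\boldsymbol{R}}$): $\sum \big( \frac{k_j + \mathbf{R}^r_j}{\tilde\beta_j} + \frac{\tilde\beta_j}{k_j + \mathbf{R}^r_j} - 2 \big)$
($PC_{\tilde{\boldsymbol{R}}}$, $PC_{\boldsymbol{R}}$): $\sum \big( \frac{k_j + \mathbf{R}^r_j}{k_j + \tilde{\mathbf{R}}^r_j} + \frac{k_j + \tilde{\mathbf{R}}^r_j}{k_j + \mathbf{R}^r_j} - 2 \big)$

Squared Hellinger distance $D_H(\cdot, \cdot)^2$:
($OS_{\tilde\alpha_0}$, $OS_{\alpha_0}$): $2 - 2^{r+1} \big( \frac{\sqrt{\tilde\alpha_0 \alpha_0}}{\tilde\alpha_0 + \alpha_0} \big)^r$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $OS_{\alpha_0}$): $2 - 2 \prod \frac{2\sqrt{\tilde\alpha_j \alpha_0}}{\tilde\alpha_j + \alpha_0}$
($SOS_{\tilde{\boldsymbol{\alpha}}}$, $SOS_{\boldsymbol{\alpha}}$): $2 - 2 \prod \frac{2\sqrt{\tilde\alpha_j \alpha_j}}{\tilde\alpha_j + \alpha_j}$
($RV_{\tilde\beta_0}$, $OS_{\alpha_0}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_0 \alpha_0 k_j}}{\tilde\beta_0 + \alpha_0 k_j}$
($RV_{\tilde\beta_0}$, $SOS_{\boldsymbol{\alpha}}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_0 \alpha_j k_j}}{\tilde\beta_0 + \alpha_j k_j}$
($RV_{\tilde\beta_0}$, $RV_{\beta_0}$): $2 - 2^{r+1} \big( \frac{\sqrt{\tilde\beta_0 \beta_0}}{\tilde\beta_0 + \beta_0} \big)^r$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $OS_{\alpha_0}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_j \alpha_0 k_j}}{\tilde\beta_j + \alpha_0 k_j}$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $SOS_{\boldsymbol{\alpha}}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_j \alpha_j k_j}}{\tilde\beta_j + \alpha_j k_j}$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $RV_{\beta_0}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_j \beta_0}}{\tilde\beta_j + \beta_0}$
($PRV_{\tilde{\boldsymbol{\beta}}}$, $PRV_{\boldsymbol{\beta}}$): $2 - 2 \prod \frac{2\sqrt{\tilde\beta_j \beta_j}}{\tilde\beta_j + \beta_j}$
($PC_{\tilde{\boldsymbol{R}}}$, $OS_{\alpha_0}$): $2 - 2 \prod \frac{2\sqrt{(k_j + \tilde{\mathbf{R}}^r_j) \alpha_0 k_j}}{\tilde{\mathbf{R}}^r_j + (\alpha_0 + 1) k_j}$
($PC_{\tilde{\boldsymbol{R}}}$, $SOS_{\boldsymbol{\alpha}}$): $2 - 2 \prod \frac{2\sqrt{(k_j + \tilde{\mathbf{R}}^r_j) k_j \alpha_j}}{\tilde{\mathbf{R}}^r_j + (\alpha_j + 1) k_j}$
($PC_{\tilde{\boldsymbol{R}}}$, $RV_{\beta_0}$): $2 - 2 \prod \frac{2\sqrt{(k_j + \tilde{\mathbf{R}}^r_j) \beta_0}}{k_j + \tilde{\mathbf{R}}^r_j + \beta_0}$
($PC_{\tilde{\boldsymbol{R}}}$, $PRV_{\boldsymbol{\beta}}$): $2 - 2 \prod \frac{2\sqrt{(k_j + \tilde{\mathbf{R}}^r_j) \beta_j}}{k_j + \tilde{\mathbf{R}}^r_j + \beta_j}$
($PC_{\tilde{\boldsymbol{R}}}$, $PC_{\boldsymbol{R}}$): $2 - 2 \prod \frac{2\sqrt{(k_j + \tilde{\mathbf{R}}^r_j)(k_j + \mathbf{R}^r_j)}}{2 k_j + \tilde{\mathbf{R}}^r_j + \mathbf{R}^r_j}$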
[Figure 2.1: diagram with the models $RV$, $OS$, $SOS_{\boldsymbol{\alpha}}$, $PRV_{\boldsymbol{\beta}}$, $SOS_{\boldsymbol{\beta}}$, $PRV_{\boldsymbol{\alpha}}$ as nodes joined by connecting lines; legend: $d' = D_{\star}(PRV_{\boldsymbol{\alpha}}, PRV_{\boldsymbol{\beta}})$, $d'' = D_{\star}(RV, PRV_{\boldsymbol{\alpha}})$, $d = D_{\star}(RV, PRV_{\boldsymbol{\beta}})$.]
Figure 2.1.: Equidistant models of (Pfeifer) record values and (sequential) order statistics with respect to divergences for arbitrary parameters $\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^r_+$, illustrated by the number of connecting lines.
equations are obtainable despite the unavailability of closed form solutions of equation (2.2.1). See also Subsection 2.2.3 on this topic. For clarity, let $\boldsymbol{\alpha}^*$ denote such a solution of equation (2.2.1). A similar question arises: Which choices for the parameter $\boldsymbol{\beta}$ yield
\[ D_{\star}(RV, PRV_{\boldsymbol{\beta}}) = d\,? \tag{2.2.2} \]
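Let $\boldsymbol{\beta}^*$ denote a solution of this equation. Because of the mentioned property of direct dependence on the parameter ratios, it is immediate that
\begin{align*}
D_{\star}(OS, SOS_{\boldsymbol{\beta}^*}) &= d \quad \Big( \text{follows from def. of } \boldsymbol{\beta}^*, \text{ since } \frac{n-j+1}{\beta^*_j (n-j+1)} = \frac{1}{\beta^*_j} \Big), \\
D_{\star}(PRV_{\boldsymbol{\alpha}^*}, SOS_{\boldsymbol{\alpha}^*}) &= d \quad \Big( \text{follows from def. of } d, \text{ since } \frac{\alpha^*_j}{\alpha^*_j (n-j+1)} = \frac{1}{n-j+1} \Big), \\
D_{\star}(PRV_{\boldsymbol{\beta}^*}, SOS_{\boldsymbol{\beta}^*}) &= d \quad \Big( \text{follows from def. of } d, \text{ since } \frac{\beta^*_j}{\beta^*_j (n-j+1)} = \frac{1}{n-j+1} \Big).
\end{align*}
It is worth mentioning that we have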
𝐷★(𝑅𝑉, 𝑆𝑂𝑆) = 𝐷★(𝑃 𝑅𝑉𝜶, 𝑆𝑂𝑆𝜶) = 𝐷★(𝑃 𝑅𝑉𝜷, 𝑆𝑂𝑆𝜷) for all 𝜶, 𝜷∈ ℝ𝑟