3.5 River flow data
4.2.4 Method 5: weighted extension of the regular variation approach
approach
We adapt Method 3 to include the weights δC,j by changing the estimation of the
components in the right hand side of equation (4.2.2), for each C ∈ 2D \ ∅. Firstly,
the probability Pr(W ∈ BC) is estimated by 1n
Pn
j=1δC,j, i.e., the average weight
associated with face BC. Recall that under assumption (4.2.3), replacing the slowly
varying function by a constant, for each face we have a model of the form
Pr(R > r |W ∈BC) =JCr−1/κC,
for r greater than some high thresholduC, κC ∈(0,1] and JC >0. Instead of using
the partitioning approach of Method 3, this model can be fitted by adapting the censored likelihood approach. In particular, for each C ∈2D\ ∅, we replace 1{rj>uC} byδC,j1{rj>uC}, and 1{rj≤uC} byδC,j1{rj≤uC}, to yield the function
n Y j=1 1−JCu −1/κC C δC,j1{rj≤uC}JC κC r−1−1/κC j δC,j1{rj >uC} ,
which should be maximized. This is no longer a true likelihood in the sense of corre- sponding to an underlying statistical model, yet the weights have clear roles in mod- ifying the estimators. The estimators that maximize this modified objective function are ˆ κC = n X j=1 δC,j1{rj>uC} !−1 n X j=1 δC,j1{rj>uC}log rj uC and ˆ JC = Pn j=1δC,j1{rj>uC} Pn j=1δC,j ! u1/ˆκC C .
In practice, we take the estimate of κC to be min(ˆκC,1), with a suitable update
to ˆJC, as previously. Having determined which weights to use, and computed ˆκC and
ˆ
JC, the remainder of the approach is analogous to Method 3. Moreover, Method 3
is a special case of the weighted approach, with δC,j = 1 if wj ∈ BC, the region of
the partition that approximatesBC, and δC,j = 0 otherwise, just as the approach of
Goix et al. (2016) is a special case of Method 4. In Method 3, we only carry out the estimation procedure on a face if there are at least m angular observations in the corresponding region of the partitioned simplex. The natural extension of this condition to the weighted approach is to have Pn
j=1δC,j being at least m; we again take the default choicem= 1.
4.2.5
A proposed weighting
We propose a weighting which takes into account the proximity of a point w = (w1, . . . , wd) to various faces of the unit simplex. We begin by considering the vertex
to which the point w is closest. This can be determined by finding S(1) ={i : w
i =
max(w1, . . . , wd)}. Then, for some k > 0, the weight assigned to the region BS(1),
corresponding to the face where only variable {Xi :i∈S(1)} is large, is
δS(1) = X i∈S(1) wi k =wkS(1).
Fors= 2, . . . , d, we then find the setS(s) with|S(s)|=ssuch that wlies closer to the regionBS(s) than any other set BS(s)0 with |S(s)
0
|=s. For this set, the associated weight is defined as δS(s) = X i∈S(s) wi k − X i∈S(s−1) wi k .
For all remaining sets of size s, the weight is set to zero. Note that for s = d the only choice is S(d) = D ={1, . . . , d}. In Appendix B.1, we verify that this choice of weights satisfies properties 1 and 2 from Section 4.2.2.
In practice, the data are transformed to Fr´echet margins using the rank transform; this can lead to components ofw being equal. We break such ties randomly, so that each point assigns non-zero weight to exactly d faces, one of each dimension. An al- ternative would be to divide the weight across faces of the same dimension which are equidistant from the pointw. The latter approach is more computationally complex, and is unlikely to significantly improve results.
Figure 4.2.3: An example of our proposed weighting in the trivariate case, withk = 10. The grey regions in the first two plots show where the weights are exactly 0.
Figure 4.2.3 shows an example of this weighting in the trivariate case. The first diagram illustrates the weight contributed by each point to the face where only vari- able X3 is large, based on its position in the simplex. The central diagram shows the weight assigned to the face with variables (X2, X3) being large simultaneously. In both these cases, illustrations of the weights assigned to the other faces of the same dimension are obtained by rotation. The final diagram shows the weights associated with the centre of the simplex, where all three variables are simultaneously large. Increasing the value of k assigns more weight to the interior of the simplex, while smaller values ofk assign greater weight to the vertices.
4.3
Simulation study
4.3.1
Max-mixture distribution
In this section, we assess the efficacy of Methods 3, 4 and 5 by considering the results of simulations based on the five-dimensional max-mixture distribution introduced in Section 3.4.2 of Chapter 3. We consider the asymmetric logistic model in Section 4.3.2. Once again, our metrics are the Hellinger distance; the area under the receiver oper- ating characteristic curve (now denoted AUC); and the area under the neighboured receiver operating characteristic curve (now denoted AUC*). This final metric was introduced in Chapter 3, and allows us to assess whether an estimated extremal de- pendence structure is close to the truth by taking into account whether or not faces adjacent to those truly having mass are detected.
The max-mixture model consists of two bivariate Gaussian copulas with correla- tion parameter ρ, as well as two trivariate and one five-dimensional logistic copula with dependence parameter α. In Method 3, we set = 0.05, each threshold uC
to be the 0.975 quantile of observed radial values over all sets C ∈ 2D \ ∅ and r in Pr(W ∈ BC | R > r) to be the largest observed radial value for that particular
sample. In Methods 4 and 5, we take the parameter controlling the weighting to be
k = 25. For Method 4, the radial threshold r0 is set to the 0.995 quantile of the observed radial values. For Method 5, the thresholduC in each region is taken to be
the 0.975 quantile over all the observed radial values, and the value of r for which (4.2.2) is evaluated is taken to be the maximum of the observed radial values. For all three methods, to obtain the Hellinger distances, we take the thresholdπ below which mass on each face is deemed negligible to be 0.001, as in the simulations of Chapter 3; this parameter is varied to obtain the AUC and AUC* values. Again, these tuning parameter values are not optimized for individual experiments, but parameter stabil- ity plots as introduced in Section 3.4.3 could be used to choose the tuning parameter,
and one should ensure that a feasible combination of faces has been determined as having mass. We take samples of sizen = 10,000 in each simulation, and repeat each setting 100 times. 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 ρ =0 α Hellinger distance 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 ρ =0.25 α Hellinger distance 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 ρ =0.5 α Hellinger distance 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 ρ =0.75 α Hellinger distance
Figure 4.3.1: Mean Hellinger distance over 100 simulations for the max-mixture dis- tribution. Method 1: purple; Method 2: green; Method 3: blue; Method 4: pink; Method 5: orange; Goix et al.: grey.
In Figure 4.3.1, we present results on the average Hellinger distance achieved by Methods 3, 4 and 5, comparing them to the previous results obtained for the approach of Goix et al. and Methods 1 and 2 in Section 3.4.2; the 0.05 and 0.95 quantiles are now omitted to improve readability. These results cover a range of model parameters;
ρ ∈ {0,0.25,0.5,0.75} and α ∈ {0.1,0.2, . . . ,0.9}. Method 3, based on the regular variation assumption and a partition of the angular simplex, is generally the most successful for ρ ∈ {0,0.25,0.5}, but is outperformed by Method 2 for ρ = 0.75 and
α ≤ 0.6. The weighted approach used in Methods 4 and 5 is consistently more suc- cessful than the approach of Goix et al., and these methods also provide competitive results compared to Method 1, particularly in the case whereρ= 0.75, corresponding to stronger dependence in the asymptotically independent Gaussian components, or higher values of α, which lead to weaker asymptotic dependence in the logistic parts of the model.
ρ= 0 ρ= 0.25 ρ= 0.5 ρ= 0.75 α 0.25 0.5 0.75 0.25 0.5 0.75 0.25 0.5 0.75 0.25 0.5 0.75 Method 3 100 (0.2) 99.7 (0.9) 96.3 (2.8) 99.9 (0.3) 99.6 (0.9) 96.0 (3.0) 99.3 (1.1) 98.7 (1.6) 94.4 (3.4) 93.9 (1.7) 92.5 (2.2) 87.0 (3.6) Method 4 100 (0.0) 100 (0.1) 99.1 (1.4) 99.9 (0.2) 99.9 (0.4) 98.7 (1.6) 97.3 (1.8) 96.8 (1.9) 94.6 (2.7) 92.1 (1.4) 91.0 (2.0) 88.6 (2.6) Method 5 100 (0.2) 99.8 (0.7) 97.9 (2.3) 99.8 (0.4) 99.7 (0.8) 97.6 (2.5) 96.0 (2.0) 95.6 (2.1) 93.1 (3.2) 92.0 (1.0) 91.4 (1.4) 87.7 (2.4) ρ= 0 ρ= 0.25 ρ= 0.5 ρ= 0.75 α 0.25 0.5 0.75 0.25 0.5 0.75 0.25 0.5 0.75 0.25 0.5 0.75 Method 3 100 (0.0) 99.8 (0.7) 96.5 (2.1) 100 (0.2) 99.8 (0.7) 96.3 (2.1) 100 (0.0) 99.7 (0.8) 96.3 (2.4) 100 (0.2) 99.7 (0.8) 97.6 (2.2) Method 4 100 (0.0) 100 (0.2) 98.6 (1.6) 100 (0.0) 100 (0.2) 98.6 (1.9) 100 (0.0) 100 (0.2) 99.1 (1.9) 100 (0.0) 100 (0.0) 99.7 (0.7) Method 5 100 (0.0) 100 (0.2) 98.5 (1.4) 100 (0.0) 100 (0.2) 98.4 (1.4) 100 (0.2) 100 (0.3) 98.9 (1.1) 100 (0.3) 99.9 (0.3) 98.7 (1.3)
Table 4.3.1: Average AUC (top) and AUC* (bottom) values, given as percentages, for 100 samples from a five-dimensional max-mixture model. Results in bold show where the methods are at least as successful as any of those studied in Chapter 3.
In Table 4.3.1, we present the mean AUC and AUC* values achieved by Meth- ods 3, 4 and 5 for this max-mixture distribution, with ρ ∈ {0,0.25,0.5,0.75} and
α∈ {0.25,0.5,0.75}; boxplots of these results are provided in Appendix B.2. As was the case when comparing Hellinger distances, the results are reasonably similar for Methods 4 and 5, although the former is often a slightly more successful classifier, and is at least as successful in terms of the average AUC as all methods we consider in six out of the twelve cases. Notably, for many cases where α= 0.75, with weaker asymptotic dependence in the logistic components, the weighted approaches provide the greatest improvement over the hard-thresholding methods, although Method 2 remains the most successful classifier forρ= 0.75.
Method 4, the weighted version of Goix et al. (2016) performs at least as well as all other methods in terms of the average AUC* values, with Method 5, our other weighted approach, giving only slightly worse results. This suggests that using weights to allow observations to provide information to more than one face is a successful strategy, allowing us to estimate extremal dependence structures that are close to the truth. For this max-mixture distribution, we note that a large proportion of the faces truly have extremal mass or have a neighbouring face that does, and the AUC* metric is perhaps more informative for the asymmetric logistic distribution studied in
Section 4.3.2.