Fixed margins - Simulation study - Semiparametric Bayesian Risk Estimation for Complex Extremes

6.6 Simulation study

6.6.2 Fixed margins

In this first setup, we consider replicates of the pair $(X_L, Y_L)$ in Laplace margins for all four simulation processes, so that marginal features are known and we can focus on the joint features. This setup also permits comparison with the original method of Heffernan and Tawn (2004), which assumes fixed margins. These comparisons must be treated with caution, as inference and extrapolation in our approaches are performed under the simplifying but restrictive assumption of normality of the residuals. As described in Section 6.6.1, the margins are approximately on the correct scale when using Algorithm 6.2; for the three other cases, namely the Gaussian, inverted logistic and logistic bivariate distributions, we transform the margins using the probability integral transform to obtain exponential margins above $u$, and we set $u$ to the 98% Laplace quantile.
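As a rough illustration of the threshold choice and the marginal transform, the sketch below assumes a standard Laplace distribution (location 0, scale 1), which the text does not state explicitly. The function names `laplace_quantile`, `laplace_cdf` and `to_laplace` are hypothetical and are not taken from the thesis.

```python
import math

def laplace_quantile(p):
    """Quantile function of the standard Laplace distribution (assumed scale 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return math.log(2.0 * p)
    return -math.log(2.0 * (1.0 - p))

def laplace_cdf(x):
    """Distribution function of the standard Laplace distribution."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def to_laplace(value, cdf):
    """Probability integral transform: map a value with known continuous
    distribution function `cdf` onto the standard Laplace scale."""
    return laplace_quantile(cdf(value))

# Threshold set to the 98% Laplace quantile, as in the text.
u = laplace_quantile(0.98)
print(round(u, 3))  # log(25), about 3.219
```

Under this assumption the excess of a Laplace variable above any non-negative threshold is unit exponential, which is why the margins are exponential above $u$.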

In each of the four cases, we sample 1,000 data sets with $\tilde{n} = 1{,}000$ uncensored data points. We tried different combinations of the parameters, with all 54 combinations of $\alpha = -0.6, 0, 0.4$, $\beta = 0, 0.3, 0.9$, $\mu = -1, 0, 0.5$ and $\psi^2 = 1, 3$ for the conditional model, $\rho = -0.8, -0.3, 0, 0.3, 0.8$ for the correlation of the Gaussian distribution, and $\gamma = 0.2, 0.5, 0.8$ for the inverted logistic and logistic dependence parameter.
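The count of 54 combinations for the conditional model follows from 3 × 3 × 3 × 2. A minimal sketch of the parameter grid, with hypothetical variable names, is:

```python
from itertools import product

# Parameter values quoted in the text for the conditional model.
alphas = (-0.6, 0.0, 0.4)
betas = (0.0, 0.3, 0.9)
mus = (-1.0, 0.0, 0.5)
psi2s = (1.0, 3.0)

# Every (alpha, beta, mu, psi^2) combination.
conditional_grid = list(product(alphas, betas, mus, psi2s))

# Dependence parameters for the other three processes.
rhos = (-0.8, -0.3, 0.0, 0.3, 0.8)   # Gaussian correlation
gammas = (0.2, 0.5, 0.8)             # inverted logistic and logistic

print(len(conditional_grid))  # 54
```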

Table 6.1 gives the bias and relative efficiency of $\hat{\alpha}$ and $\hat{\beta}$ for representative cases, namely $(\alpha, \beta, \mu, \psi^2) = (0.4, 0.3, 0.5, 1)$, $\rho = 0.3$, and $\gamma = 0.5$, where the relative efficiency is the ratio of root mean squared errors (RMSE), with one of our approaches in the numerator and the approach of Heffernan and Tawn (2004) in the denominator. Values smaller than 1 indicate a better performance of one of our approaches, but values larger than 1 may only indicate a bias due to assuming $H_{|x}(\cdot) \equiv H_{|y}(\cdot)$ to be Gaussian, which can be more restrictive in our joint models than in the setup of Heffernan and Tawn, in which (6.3) wrongly specifies the joint density. The Bayesian setup described in Chapter 7 could be used to remove the Gaussian assumption in our approaches. We use the penultimate approximations for $\alpha$ and $\beta$ developed in Chapter 4 in the Gaussian case, as convergence of (6.1) is particularly slow in this case. The fits producing Table 6.1 had no constraints implemented, i.e., $(\alpha, \beta) \in [-1, 1] \times [0, 1)$.

| Model | Parameter | Variance ×1000 (avg) | Variance ×1000 (spl) | Variance ×1000 (std) | Bias ×1000 (avg) | Bias ×1000 (spl) | Bias ×1000 (std) | Rel. efficiency (avg) | Rel. efficiency (spl) |
|---|---|---|---|---|---|---|---|---|---|
| conditional | α | 5.3 | 7.4 | 7.3 | 22.1 | 3.3 | 3.9 | 0.9 | 1.0 |
| conditional | β | 16.1 | 19.3 | 13.6 | −57.8 | 0.7 | −54.1 | 1.1 | 1.1 |
| Gaussian | α | 12.8 | 6.1 | 18.9 | 14.3 | −57.0 | −95.1 | 0.7 | 0.6 |
| Gaussian | β | 21.0 | 19.0 | 28.5 | −10.5 | −71.1 | −250.7 | 0.5 | 0.5 |
| inverted logistic | α | 17.6 | 17.6 | 5.7 | 341.1 | 200.7 | 170.1 | 2.0 | 1.3 |
| inverted logistic | β | 13.9 | 15.1 | 21.8 | 82.2 | 94.6 | −138.9 | 0.7 | 0.8 |
| logistic | α | 2.5 | 2.0 | 1.6 | −9.2 | −11.7 | −22.7 | 1.1 | 1.0 |
| logistic | β | 19.2 | 16.6 | 12.1 | 241.7 | 191.6 | 155.1 | 1.5 | 1.2 |

Table 6.1 – Bias ×1000, variance ×1000 and relative efficiency for $\hat{\alpha}$ and $\hat{\beta}$, with the average (avg) and split (spl) approach compared to the standard (std) approach of Heffernan and Tawn (2004). For the relative efficiency, values smaller than 1 indicate a better performance of one of our approaches. From top to bottom: conditional tail model with rejection, with $(\alpha, \beta, \mu, \psi^2) = (0.4, 0.3, 0.5, 1)$; Gaussian bivariate distribution with Laplace margins and correlation $\rho = 0.3$; inverted logistic bivariate distribution with Laplace margins and dependence parameter $\gamma = 0.5$; logistic bivariate distribution with Laplace margins and dependence parameter $\gamma = 0.5$.
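The relative efficiency used here is a ratio of RMSEs, and the RMSE can be recovered from the tabulated bias and variance through RMSE² = variance + bias². The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and the inputs are the Table 6.1 entries for $\alpha$ in the conditional case, rescaled from the ×1000 units.

```python
import math

def rmse(bias, variance):
    """Root mean squared error from bias and variance: sqrt(variance + bias^2)."""
    return math.sqrt(variance + bias ** 2)

def relative_efficiency(bias_new, var_new, bias_std, var_std):
    """RMSE of a new approach divided by the RMSE of the standard approach.
    Values below 1 favour the new approach."""
    return rmse(bias_new, var_new) / rmse(bias_std, var_std)

# Table 6.1, conditional model, alpha: average approach versus standard approach.
# Tabulated values are multiplied by 1000, so they are divided by 1000 here.
eff_avg = relative_efficiency(
    bias_new=22.1e-3, var_new=5.3e-3,
    bias_std=3.9e-3, var_std=7.3e-3,
)
print(round(eff_avg, 1))  # 0.9, matching the tabulated relative efficiency
```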

As expected, the model performs poorly on the data simulated from the inverted logistic and the logistic distributions, as assuming the residual distributions $H_{|x}(\cdot) \equiv H_{|y}(\cdot)$ to be Gaussian is a misspecification which causes badly biased estimates of $\alpha$ and $\beta$. In their inference procedure, Heffernan and Tawn (2004) also assume normality of the residual distributions to construct a likelihood function, but then compute the empirical residuals of the form $\hat{z} = (y - \hat{\alpha} x)/x^{\hat{\beta}}$, thus $\hat{H}_{|x}(\cdot)$ and $\hat{H}_{|y}(\cdot)$ are not Gaussian.
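The empirical residuals can be sketched as follows. This is a minimal illustration assuming NumPy, with a hypothetical function name and made-up toy values; it only applies the formula $\hat{z} = (y - \hat{\alpha} x)/x^{\hat{\beta}}$ to the observations whose conditioning variable exceeds the threshold $u$, and does not reproduce the fitting step.

```python
import numpy as np

def empirical_residuals(x, y, alpha_hat, beta_hat, u):
    """Residuals z = (y - alpha_hat * x) / x**beta_hat for pairs with x > u.

    x, y      : observations on the Laplace scale
    alpha_hat : fitted alpha
    beta_hat  : fitted beta
    u         : positive threshold on the conditioning variable
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = x > u  # only exceedances of the conditioning variable are used
    return (y[keep] - alpha_hat * x[keep]) / x[keep] ** beta_hat

# Toy values, purely illustrative.
z_hat = empirical_residuals(
    x=[1.0, 4.0, 9.0], y=[0.0, 3.0, 6.0],
    alpha_hat=0.5, beta_hat=0.5, u=2.0,
)
print(z_hat)  # [0.5 0.5]
```

Sampling from the empirical distribution of these residuals then amounts to resampling `z_hat` with replacement.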

For data simulated with Algorithm 6.2, the split approach has less bias than the average approach, as the rejection sampling of the algorithm mimics the structure of the split likelihood. In the Gaussian case, the average approach appears to perform similarly to the split approach, but for data with more correlation, the split approach performs better, i.e., it better captures the behaviour of jointly extreme events, corresponding to data points in $R_{11}$.

Figure 6.4 – Diagram of the method of proportions for computing joint tail probabilities of the type $\Pr(X > v_x, Y > v_y)$. Left panel: $v_x = v_y > u$; right panel: $v_x = u \ll v_y$. The dotted areas correspond to the probability to be estimated, and the models $M_{|x}$ and $M_{|y}$ are used to sample points in the regions shaded in blue and red respectively.

Assessing the performance of our new methods based solely on $\alpha$ and $\beta$ is insufficient, because the two parameters are interdependent, so it is more appropriate to consider tail probabilities.

We assess the performance of our approaches on two types of tail probabilities $\Pr(X > v_x, Y > v_y) = p$, with fixed $p$ and either $v_x = v_y$ or $v_x = u \ll v_y$. Here, we explain how we compute these joint probabilities using the approach of Heffernan and Tawn (2004), and we give details about how to tackle this problem using our approaches in Section 6.6.3. The procedure of Heffernan and Tawn considers the joint probability $\Pr(X > v_x, Y > v_y)$ as the sum

$$\Pr(X > Y, Y > v_y \mid X > v_x)\,\Pr(X > v_x) + \Pr(Y > X, X > v_x \mid Y > v_y)\,\Pr(Y > v_y), \tag{6.34}$$

where the conditional probabilities are estimated by computing the empirical estimates based on simulated data, and the marginal probabilities are Laplace. For example, the first probability in (6.34) is estimated by first independently sampling $R$ replicates of $X \mid X > v_x$ from an exponential distribution and $R$ replicates of $Z$ from the empirical distribution $\hat{H}_{|x}(\cdot)$, then computing the corresponding $Y$ replicates using the relation (6.2), in order to compute the proportion of the $R$ points $(X, Y)$, $X > v_x$, in the region $\{(x, y) \in \mathbb{R}^2 : x > y,\ y > v_y\}$. The proportions used in this procedure are illustrated in Figure 6.4, where the number of replicates of $(X, Y)$ in the dotted regions is divided by $R$, the number of replicates in the shaded regions.
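The method of proportions can be sketched as below, under several loudly stated assumptions. Relation (6.2) is not reproduced in this section, so the code assumes the form $Y = \alpha X + X^{\beta} Z$, which is consistent with the residual definition $z = (y - \alpha x)/x^{\beta}$. It also assumes standard Laplace margins and non-negative thresholds, so that the excess above a threshold is unit exponential. All function and argument names are hypothetical.

```python
import numpy as np

def laplace_survival(v):
    """Pr(X > v) for a standard Laplace variable; valid for v >= 0."""
    return 0.5 * np.exp(-v)

def conditional_term(v_cond, v_other, alpha, beta, residuals, R, rng):
    """One term of (6.34): Pr(C > O, O > v_other | C > v_cond) * Pr(C > v_cond),
    where C is the conditioning variable and O is the other variable."""
    # For v_cond >= 0, C | C > v_cond is v_cond plus a unit exponential.
    c = v_cond + rng.exponential(1.0, size=R)
    # Resample residuals from their empirical distribution.
    z = rng.choice(residuals, size=R, replace=True)
    # Assumed form of relation (6.2): O = alpha * C + C**beta * Z.
    o = alpha * c + c ** beta * z
    # Proportion of the R simulated points falling in the target region.
    proportion = np.mean((c > o) & (o > v_other))
    return proportion * laplace_survival(v_cond)

def joint_tail_probability(vx, vy, fit_given_x, fit_given_y, R=100_000, seed=0):
    """Estimate Pr(X > vx, Y > vy) as the sum in (6.34).
    Each fit is a tuple (alpha, beta, residuals)."""
    rng = np.random.default_rng(seed)
    first = conditional_term(vx, vy, *fit_given_x, R, rng)
    second = conditional_term(vy, vx, *fit_given_y, R, rng)
    return first + second
```

A call such as `joint_tail_probability(3.5, 3.5, (0.4, 0.3, z_x), (0.4, 0.3, z_y))`, with `z_x` and `z_y` arrays of fitted residuals, would return the estimate for the left panel of Figure 6.4. The Monte Carlo error decreases with $R$, so $R$ should be large relative to the inverse of the conditional proportion being estimated.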

For data sampled with Algorithm 6.2, the relative efficiency for estimating $\Pr(X > v_x, Y > v_y)$ is 81% when $v_x = v_y$ and 85% when $v_x = u \ll v_y$ for the split approach, and 107% and 162% respectively for the average approach. The better performance of the split approach is due to its model matching exactly the simulation process. For data sampled from the other three processes, our methods perform relatively poorly in estimating $\Pr(X > v_x, Y > v_y)$ compared to the method of Heffernan and Tawn (2004), but this is expected, as their residual distribution is an empirical estimate and is therefore more flexible. If we use a Gaussian distribution for extrapolation from the original method of Heffernan and Tawn, the relative efficiency is below 50% in all cases except for the Gaussian distribution when $v_x \ll v_y$, for which it is below 85%, and the split approach beats the average approach in all cases except for the logistic distribution when $v_x = v_y$. As emphasised in Section 6.1, the likelihood used by Heffernan and Tawn is incorrect and their approach does the marginal and joint fit in several steps. In the next section, we perform a simulation study where we use the average and split likelihoods to simultaneously fit the marginal and dependence features of processes with different asymptotic behaviours.
