
2.3 Extreme Value Theory

2.3.5 Extreme-value Mixture Models

The previous sections focused solely on modelling the tails of a distribution, considering either block maxima or threshold exceedances. In many applications, however, both the body of a distribution and its tails are of interest. Extreme-value mixture models embed the extreme value methodology and estimate the body and the tails simultaneously. These models are generally considered for a continuous random variable $X$ with unknown distribution function $F$, as in Sections 2.3.2 and 2.3.3.

In principle, the model consists of three components, each of which considers a different domain of $F$: the lower tail, the body and the upper tail. Defining these components via their density functions $f_L$, $f_B$ and $f_R$, the resulting density function $f$ can be formally expressed as

$$f(x) = P(X < u_L)\, f_L(x) + P(u_L \le X \le u_R)\, f_B(x) + P(X > u_R)\, f_R(x). \qquad (2.3.11)$$

Here the supports of $f_L$, $f_B$ and $f_R$ lie within $(-\infty, u_L)$, $[u_L, u_R]$ and $(u_R, \infty)$, respectively, and $F_L$, $F_B$ and $F_R$ are the associated distribution functions. Further, $u_L$ and $u_R$ denote the lower and upper thresholds, respectively. The unknown distribution functions $F_L$ and $F_R$ are commonly modelled by a GPD.
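The following is a minimal numerical sketch of (2.3.11). It assumes illustrative components that do not come from the text: an exponential lower tail below $u_L$, a uniform body on $[u_L, u_R]$, and a GPD for the excess over $u_R$. The thresholds, weights and parameters are all hypothetical. Each component is a proper density on its own domain, so the three probabilities act as mixture weights and $f$ integrates to one. The trapezoidal check simply confirms this numerically.

```python
import math

# Hypothetical thresholds and component probabilities, chosen only for illustration.
U_L, U_R = -1.0, 2.0
P_L, P_B, P_R = 0.1, 0.8, 0.1  # P(X < u_L), P(u_L <= X <= u_R), P(X > u_R); sum to 1


def f_L(x, scale=0.5):
    """Lower-tail density on (-inf, u_L): exponential decay below u_L (assumed form)."""
    return math.exp((x - U_L) / scale) / scale if x < U_L else 0.0


def f_B(x):
    """Body density on [u_L, u_R]: uniform (assumed form)."""
    return 1.0 / (U_R - U_L) if U_L <= x <= U_R else 0.0


def f_R(x, sigma=1.0, xi=0.2):
    """Upper-tail density on (u_R, inf): GPD for the excess x - u_R (assumes xi != 0)."""
    if x <= U_R:
        return 0.0
    return (1.0 / sigma) * (1.0 + xi * (x - U_R) / sigma) ** (-1.0 / xi - 1.0)


def f(x):
    """Spliced density of the form (2.3.11)."""
    return P_L * f_L(x) + P_B * f_B(x) + P_R * f_R(x)


def trapezoid(g, a, b, n=100_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))


# The mass outside [-30, 200] is negligible for these parameter values.
total = trapezoid(f, -30.0, 200.0)
print(f"integral of f over [-30, 200]: {total:.4f}")
```

Any other choice of proper component densities would work the same way. The weights alone determine how much probability each domain receives.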

In what follows, several approaches to estimating such models are described, some of which impose a continuity constraint on $f$ at the thresholds $u_L$ and $u_R$. For notational simplicity, the model is taken to consist of two components only, the body $f_B$ and the upper tail $f_R$.

Behrens et al. (2004) introduce a Bayesian modelling framework which treats the threshold as unknown and estimates it together with the remaining parameters. Specifically, the approach considers a parametric form for $F_B$, e.g. a Normal or Weibull distribution, and a GPD for $F_R$.

Hence, the distribution function $F(x)$ is formally given as

$$F(x) = \begin{cases} F_B(x), & x \le u, \\ F_B(u) + [1 - F_B(u)]\, F_R(x), & x > u. \end{cases} \qquad (2.3.12)$$
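The Python sketch below evaluates a model of the form (2.3.12) with a Normal body and a GPD tail. We read $F_R(x)$ as the GPD distribution function of the excess $x - u$; this is our interpretation. All parameter values are hypothetical. Differentiating (2.3.12) gives $f(x) = f_B(x)$ for $x \le u$ and $f(x) = [1 - F_B(u)]\, f_R(x)$ for $x > u$. Hence $F$ is continuous at $u$, but $f$ generally is not.

```python
import math


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def gpd_cdf(y, sigma, xi):
    """GPD distribution function of the excess y >= 0 (assumes xi != 0)."""
    return 1.0 - (1.0 + xi * y / sigma) ** (-1.0 / xi)


def gpd_pdf(y, sigma, xi):
    return (1.0 / sigma) * (1.0 + xi * y / sigma) ** (-1.0 / xi - 1.0)


def spliced_cdf(x, u, mu, sd, sigma, xi):
    """F(x) of the form (2.3.12): Normal body up to u, GPD-scaled tail above u."""
    if x <= u:
        return normal_cdf(x, mu, sd)
    tail_prob = 1.0 - normal_cdf(u, mu, sd)
    return normal_cdf(u, mu, sd) + tail_prob * gpd_cdf(x - u, sigma, xi)


def spliced_pdf(x, u, mu, sd, sigma, xi):
    """Derivative of spliced_cdf; it generally jumps at x = u."""
    if x <= u:
        return normal_pdf(x, mu, sd)
    return (1.0 - normal_cdf(u, mu, sd)) * gpd_pdf(x - u, sigma, xi)


params = dict(u=1.5, mu=0.0, sd=1.0, sigma=0.6, xi=0.1)  # hypothetical values
u, eps = params["u"], 1e-9
cdf_gap = spliced_cdf(u + eps, **params) - spliced_cdf(u, **params)
pdf_jump = spliced_pdf(u + eps, **params) - spliced_pdf(u, **params)
print(f"CDF gap at u: {cdf_gap:.2e}, density jump at u: {pdf_jump:.4f}")
```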

This model specification generally results in a discontinuity of $f$ at $u$. Behrens et al. (2004) state that the uncertainty in $u$ increases as the size of the discontinuity decreases. De Melo Mendes and Lopes (2004) consider a similar model specification, with $F_B$ being a Normal distribution and different models for the lower and upper tails. Instead of incorporating uncertainty in the thresholds, they optimize the proportion of data used to model the lower and upper tails. Carreau and Bengio (2009) propose a hybrid Pareto model which splices a Normal density with the density of a GPD while preserving the smoothness of $f(x)$ at the threshold.
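Smoothness constraints at the junction can fix the threshold and the GPD scale. To show this, the sketch below splices a Normal$(\mu, s^2)$ density on $(-\infty, \alpha]$ with an unweighted GPD density for the excess above $\alpha$, then normalises the result. The derivation is our own short one:

- Continuity of the density and its first derivative at $\alpha$ requires $z\, e^{z^2/2} = (1+\xi)/\sqrt{2\pi}$, with $z = (\alpha - \mu)/s$.
- The GPD scale then follows as $\beta = s(1+\xi)/z$.
- The normaliser is $\gamma = 1 + \Phi(z)$.

This follows the spirit of the hybrid Pareto model, but the parametrisation is an assumption, not a transcription of Carreau and Bengio (2009). The code assumes $\xi > 0$ and solves for $z$ by bisection to stay dependency-free.

```python
import math


def junction_z(xi):
    """Solve z * exp(z^2 / 2) = (1 + xi) / sqrt(2 pi) for z > 0 by bisection."""
    target = (1.0 + xi) / math.sqrt(2.0 * math.pi)
    lo, hi = 0.0, 1.0
    while hi * math.exp(hi * hi / 2.0) < target:  # the left-hand side increases in z
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid * mid / 2.0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


class HybridPareto:
    """Normal body spliced with a GPD tail so that the density is C^1 at alpha (xi > 0)."""

    def __init__(self, mu, s, xi):
        z = junction_z(xi)
        self.mu, self.s, self.xi = mu, s, xi
        self.alpha = mu + s * z                  # junction point (threshold)
        self.beta = s * (1.0 + xi) / z           # GPD scale implied by the C^1 conditions
        self.gamma = 1.0 + 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # 1 + Phi(z)

    def pdf(self, x):
        if x <= self.alpha:
            zz = (x - self.mu) / self.s
            body = math.exp(-0.5 * zz * zz) / (self.s * math.sqrt(2.0 * math.pi))
            return body / self.gamma
        y = x - self.alpha
        tail = (1.0 / self.beta) * (1.0 + self.xi * y / self.beta) ** (-1.0 / self.xi - 1.0)
        return tail / self.gamma

    def cdf(self, x):
        if x <= self.alpha:
            return 0.5 * (1.0 + math.erf((x - self.mu) / (self.s * math.sqrt(2.0)))) / self.gamma
        y = x - self.alpha
        tail_cdf = 1.0 - (1.0 + self.xi * y / self.beta) ** (-1.0 / self.xi)
        return (self.gamma - 1.0 + tail_cdf) / self.gamma  # body mass is gamma - 1


hp = HybridPareto(mu=0.0, s=1.0, xi=0.3)  # hypothetical parameter values
print(f"alpha = {hp.alpha:.4f}, beta = {hp.beta:.4f}, gamma = {hp.gamma:.4f}")
```

In this construction only $(\mu, s, \xi)$ are free. This is what removes the need to choose the threshold separately.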

While the preceding parametric approaches consider a strict partition as in (2.3.11), Frigessi et al. (2002) propose a dynamically weighted mixture model of a Weibull distribution and a GPD. In particular, this model does not require the estimation of a threshold. The density function is formally given by

$$f(x) = \frac{[1 - p(x)]\, f_B(x) + p(x)\, f_R(x)}{C},$$

where the mixing function $p(x)$ satisfies $p(x) \to 1$ as $x \to x_F$, and $f_B$ and $f_R$ denote the density functions of a Weibull distribution and a GPD, respectively. Further, $C$ is a normalising constant which depends on the parameters. This model formulation implies that the GPD dominates the tail behaviour.
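The sketch below evaluates such a dynamically weighted mixture and obtains $C$ by numerical integration. It makes the following assumptions:

- The mixing function is a Cauchy distribution function, $p(x) = 1/2 + \arctan\{(x - m)/\tau\}/\pi$. We understand this to be the choice of Frigessi et al. (2002), but treat it as an assumption here. It satisfies $p(x) \to 1$ as $x \to \infty$, which is the upper endpoint $x_F$ of a GPD with $\xi \ge 0$.
- Both components are placed on $[0, \infty)$.
- All parameter values are hypothetical.

```python
import math


def weibull_pdf(x, k=1.5, lam=1.0):
    """Weibull density with shape k and scale lam on [0, inf)."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))


def gpd_pdf(x, sigma=1.0, xi=0.2):
    """GPD density with threshold 0 on [0, inf) (assumes xi != 0)."""
    if x < 0:
        return 0.0
    return (1.0 / sigma) * (1.0 + xi * x / sigma) ** (-1.0 / xi - 1.0)


def mixing_weight(x, m=1.0, tau=0.5):
    """p(x): Cauchy distribution function, increasing to 1 as x grows (assumed choice)."""
    return 0.5 + math.atan((x - m) / tau) / math.pi


def unnormalised(x):
    p = mixing_weight(x)
    return (1.0 - p) * weibull_pdf(x) + p * gpd_pdf(x)


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))


# Normalising constant C; the upper limit 500 truncates a negligible GPD tail mass.
C = trapezoid(unnormalised, 0.0, 500.0, 200_000)


def dwm_pdf(x):
    """Dynamically weighted mixture density [(1 - p) f_B + p f_R] / C."""
    return unnormalised(x) / C


print(f"C = {C:.4f}")
```

Because $p(x)$ shifts weight smoothly from the Weibull to the GPD, no threshold appears among the parameters. The price is that $C$ must be recomputed whenever the parameters change.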

In addition to the parametric approaches, non-parametric estimation of $f_B$ has been considered in the literature. Tancredi et al. (2006) define $f_B$ as piece-wise constant with an unknown number of steps, and $u$ is also taken to be unknown. Estimates are then obtained via reversible jump Markov chain Monte Carlo.
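The sketch below evaluates a density with a piece-wise constant body and a GPD tail for a single, fixed configuration. The bin edges, bin heights, tail probability $P(X > u)$ and GPD parameters are hypothetical inputs. In the approach of Tancredi et al. (2006), the number of steps and $u$ would themselves be sampled; that sampling step is not shown. The body heights are rescaled so that the body carries mass $1 - P(X > u)$.

```python
import bisect


def make_step_gpd_density(breaks, heights, p_tail, sigma, xi):
    """Piece-wise constant body on [breaks[0], u] plus a GPD tail above u = breaks[-1].

    breaks:  increasing bin edges; the last edge plays the role of the threshold u
    heights: non-negative relative heights, one per bin (rescaled internally)
    p_tail:  P(X > u), the mass assigned to the GPD tail
    """
    u = breaks[-1]
    widths = [b - a for a, b in zip(breaks[:-1], breaks[1:])]
    body_mass = sum(h * w for h, w in zip(heights, widths))

    def density(x):
        if x < breaks[0]:
            return 0.0
        if x <= u:
            # index of the bin containing x; x == u falls in the last bin
            i = min(bisect.bisect_right(breaks, x) - 1, len(heights) - 1)
            return (1.0 - p_tail) * heights[i] / body_mass
        y = x - u  # excess over the threshold (assumes xi != 0)
        return p_tail * (1.0 / sigma) * (1.0 + xi * y / sigma) ** (-1.0 / xi - 1.0)

    return density


f = make_step_gpd_density(breaks=[0.0, 1.0, 2.0, 3.0], heights=[1.0, 2.0, 1.0],
                          p_tail=0.1, sigma=1.0, xi=0.2)
print([round(f(x), 4) for x in (0.5, 1.5, 2.5, 3.0, 3.5)])
```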

Modelling Insurance Claims by Spatially Varying Regression

3.1 Introduction

Eshita (1977) models the claim frequencies via a Binomial or Poisson model and, as outlined in Section 2.1, such approaches are also often applied in disease mapping. Scheel et al. (2013) argue, however, that these distributions are unsuitable for the insurance and weather data explored in Section 1.3, as they cannot capture the high frequency of zero claims. They hence propose a Bayesian Poisson hurdle (BPH) model; see Section 1.4 for details. In this chapter, a comparative study is performed to assess the degree of improvement obtained by the Poisson hurdle approach compared with one based on the Binomial distribution. This comparison is based upon all $K = 430$ municipalities.
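A generic Poisson hurdle distribution illustrates why the hurdle matters. It takes the value zero with probability $1 - \pi$ and otherwise follows a zero-truncated Poisson distribution. This is the textbook form, not necessarily the exact BPH specification of Scheel et al. (2013) given in Section 1.4. The number of policies, the Binomial comparison and all parameter values are hypothetical. Matching the means of the two distributions shows that the hurdle model can place considerably more mass at zero.

```python
import math


def hurdle_poisson_pmf(n, pi_pos, lam):
    """P(N = n): zero with prob. 1 - pi_pos, otherwise zero-truncated Poisson(lam)."""
    if n == 0:
        return 1.0 - pi_pos
    log_pois = n * math.log(lam) - lam - math.lgamma(n + 1)
    return pi_pos * math.exp(log_pois) / (1.0 - math.exp(-lam))


def binomial_pmf(n, trials, q):
    """Binomial(trials, q) pmf computed on the log scale to avoid overflow."""
    log_p = (math.lgamma(trials + 1) - math.lgamma(n + 1) - math.lgamma(trials - n + 1)
             + n * math.log(q) + (trials - n) * math.log1p(-q))
    return math.exp(log_p)


pi_pos, lam, policies = 0.05, 3.0, 2000          # hypothetical values
mean = pi_pos * lam / (1.0 - math.exp(-lam))      # mean of the hurdle model
q = mean / policies                               # Binomial with the same mean
print(f"P(N=0): hurdle {hurdle_poisson_pmf(0, pi_pos, lam):.3f}, "
      f"Binomial {binomial_pmf(0, policies, q):.3f}")
```

The hurdle separates the probability of any claim ($\pi$) from the size of the count given a claim ($\lambda$). The Binomial ties both to a single probability $q$.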

Scheel et al. (2013) estimate the baseline risk and covariate effects in the BPH model individually for each municipality, and spatial information is used for variable selection (Section 1.4). The modelling framework introduced in this chapter also estimates the covariate effects municipality-wise, but defines a dependence structure which, a priori, assumes the covariate effects of adjacent municipalities to be more similar. In particular, the approach is based on the geographically varying coefficient (GVC) model (Assunção, 2003; Congdon, 2003) described in Section 2.1. Hence, estimates are obtained in a Bayesian framework rather than via weighted least squares, as in the geographically weighted regression approach (Brunsdon et al., 1998; Fotheringham et al., 2002).

A spatially varying modelling approach is suitable for examining the dependence between the daily number of claims $N_{k,t}$ and the weather data. Firstly, Section 1.3 shows a spatially varying vulnerability to the covariates, e.g. the same amount of precipitation affects Oslo and Bergen differently. Secondly, adjacent municipalities exhibit strong similarities in the weather covariates and are, hence, presumably similarly vulnerable. The GVC modelling framework allows the explicit specification of a dependence model which shares statistical information between adjacent municipalities.
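One common way to encode such a dependence model is a pairwise-difference prior on the municipality-wise effects of a covariate, of the intrinsic CAR type. Its log-density penalises squared differences between adjacent municipalities. This is an illustrative assumption; the prior used in this chapter is specified in Section 3.2. Under this prior, the conditional mean of one municipality's effect, given all others, is the average of its neighbours' effects. That is how statistical information is shared. The map and effect values are hypothetical.

```python
def pairwise_difference_log_prior(beta, neighbours, tau):
    """Unnormalised log-density: -tau/2 * sum over adjacent pairs of (beta_k - beta_j)^2.

    beta:       dict, municipality id -> effect of one covariate
    neighbours: dict, municipality id -> set of adjacent ids (symmetric)
    tau:        precision; larger values pull neighbouring effects together
    """
    total = 0.0
    for k, adjacent in neighbours.items():
        for j in adjacent:
            if k < j:  # count each adjacent pair once
                total += (beta[k] - beta[j]) ** 2
    return -0.5 * tau * total


def conditional_mean(k, beta, neighbours):
    """Prior mean of beta_k given all other effects: the average over its neighbours."""
    adjacent = neighbours[k]
    return sum(beta[j] for j in adjacent) / len(adjacent)


# Hypothetical map: four municipalities in a chain 1 - 2 - 3 - 4.
neighbours = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
smooth = {1: 1.0, 2: 1.1, 3: 1.2, 4: 1.3}
rough = {1: 1.0, 2: -1.0, 3: 1.0, 4: -1.0}
print(pairwise_difference_log_prior(smooth, neighbours, tau=1.0),
      pairwise_difference_log_prior(rough, neighbours, tau=1.0))
```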

In addition to this model comparison, the potential adjustments to the covariates discussed in Section 1.3 are investigated. The exploratory data analysis indicates that the effects of rainfall and snowfall differ; in particular, the correlation between snowfall and claim numbers is close to zero. Conversely, $N_{k,t}$ and $R_{k,t}$ are positively correlated, conditional on a positive daily mean temperature, $C_{k,t} > 0$ °C. In other words, the analysis implies that rain affects the claim dynamics on the same day more strongly than snow. Hence, the original covariates $R_{k,t}$ and $R_{k,t-1}$ may be replaced by $\tilde{R}_{k,t}$ and $\tilde{R}_{k,t-1}$, which are defined as

$$\tilde{R}_{k,t} = R_{k,t}\, \mathbb{1}_{\{C_{k,t} > 0\}}. \qquad (3.1.1)$$

Consequently, the amount of snowfall on a given day has no effect on the claim dynamics of the same day.
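For a single municipality, (3.1.1) and its lagged counterpart can be computed from daily series as follows. The list-based layout and the values are hypothetical. We assume $\tilde{R}_{k,t-1}$ is (3.1.1) evaluated at day $t-1$. A day with a mean temperature of exactly 0 °C is treated as a non-rain day, because the indicator requires $C_{k,t} > 0$.

```python
def effective_rain(rain, temp):
    """tilde R_t = R_t * 1{C_t > 0}: precipitation counts only on days above 0 deg C."""
    return [r if c > 0 else 0.0 for r, c in zip(rain, temp)]


def lag_one(series):
    """Value of the previous day; the first day has no predecessor (None)."""
    return [None] + series[:-1]


rain = [5.0, 3.0, 0.0, 8.0, 2.5]    # hypothetical daily precipitation R_{k,t} (mm)
temp = [2.1, -1.5, 0.3, 0.0, 4.0]   # hypothetical daily mean temperature C_{k,t} (deg C)
r_tilde = effective_rain(rain, temp)
r_tilde_lag = lag_one(r_tilde)
print(r_tilde, r_tilde_lag)
```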

Section 1.3 also discusses the difference in snow-water equivalent, $\Delta S_{k,t}$. The exploratory data analysis indicates that $N_{k,t}$ and $\Delta S_{k,t}$ are positively correlated, conditional on $\Delta S_{k,t} > 0$, while little or no dependence is found otherwise. Similarly to $R_{k,t}$, the covariate $\Delta S_{k,t}$ may be replaced by $\widetilde{\Delta S}_{k,t}$, which is defined by

$$\widetilde{\Delta S}_{k,t} = \Delta S_{k,t}\, \mathbb{1}_{\{\Delta S_{k,t} > 0\}}. \qquad (3.1.2)$$

Thus, $\widetilde{\Delta S}_{k,t}$ corresponds to the amount of snow-melt rather than the difference in snow-water equivalent. In addition to these potential refinements of the covariates, Section 1.3 indicates that cities exhibit a higher vulnerability than rural areas. This potential difference is included via an additional binary factor $Z_k$, which takes the value $Z_k = 1$ if the average number of policies over the 10-year period exceeds 2,000 and $Z_k = 0$ otherwise. To derive the threshold of 2,000, the average claim rate per policy holder was first computed for the municipalities below and above each of a range of potential thresholds. Examining the ratio of these average claim rates then indicated that a threshold of around 2,000 yields the largest difference in the average claim rates.
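The threshold search can be sketched as below. Two assumptions are made:

- The "average claim rate" of a group is the pooled rate, that is, total claims divided by total policies.
- The "difference" is measured by the ratio of the rate above the threshold to the rate below it.

The municipality data and the candidate thresholds are hypothetical.

```python
import math


def pooled_rate(claims, policies):
    """Total claims divided by total policies for a group of municipalities."""
    return sum(claims) / sum(policies)


def best_urban_threshold(avg_policies, claims, policies, candidates):
    """Candidate threshold maximising the ratio of pooled claim rates above vs. below it."""
    best, best_ratio = None, -math.inf
    for thr in candidates:
        above = [k for k, a in enumerate(avg_policies) if a > thr]
        below = [k for k, a in enumerate(avg_policies) if a <= thr]
        if not above or not below:
            continue  # a split needs municipalities on both sides
        rate_above = pooled_rate([claims[k] for k in above], [policies[k] for k in above])
        rate_below = pooled_rate([claims[k] for k in below], [policies[k] for k in below])
        if rate_above / rate_below > best_ratio:
            best, best_ratio = thr, rate_above / rate_below
    return best, best_ratio


# Hypothetical municipalities: average number of policies, total policies and claims over 10 years.
avg_policies = [300, 800, 1500, 2500, 6000, 20000]
policies = [10 * a for a in avg_policies]
rates = [0.01, 0.01, 0.01, 0.02, 0.02, 0.02]
claims = [r * p for r, p in zip(rates, policies)]

threshold, ratio = best_urban_threshold(avg_policies, claims, policies, [500, 1000, 2000, 5000])
z = [int(a > threshold) for a in avg_policies]  # binary factor Z_k
print(threshold, round(ratio, 3), z)
```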

In summary, interest lies in the comparison of the Binomial and Poisson hurdle approaches, as well as in the examination of the performance of the proposed covariates $\tilde{R}_{k,t}$ and $\widetilde{\Delta S}_{k,t}$ in expressions (3.1.1) and (3.1.2), respectively. The drainage run-off $D_{k,t}$ and the snow-water equivalent $S_{k,t}$ are included in addition to the three covariates discussed above. In the following, three competing models are examined: (i) the Binomial distribution with the original covariates; (ii) the Binomial distribution with the proposed covariates; and (iii) the Poisson hurdle model with the proposed covariates. A fourth possible setting is a Poisson hurdle model with the original covariates, as used by Scheel et al. (2013). Instead of fitting this model, we compare the three settings with the original model by Scheel et al. (2013), which imposes less spatial structure on the covariate effects.

The remainder of this chapter is organized as follows. Section 3.2 details the modelling framework for both the Binomial and Poisson hurdle distributions and describes the estimation of the model parameters using Markov chain Monte Carlo (MCMC) techniques. Results for the three models are then compared and discussed in Section 3.3. As in Scheel et al. (2013), the predictive performance is assessed on a weekly basis. The chapter concludes with a discussion in Section 3.4.

In document Statistical methods for weather related insurance claims (Page 62-67)