2.3 Extreme Value Theory
2.3.5 Extreme-value Mixture Models
The previous sections focused solely on the modelling of the tails of a distribution by considering either block maxima or threshold exceedances. In many applications, however, both the body of a distribution and its tails are of interest. Extreme-value mixture models embed the extreme value methodology and consider the simultaneous estimation of the body and the tails. These
models are generally considered for a continuous random variable $X$ with unknown distribution
function $F$, as in Sections 2.3.2 and 2.3.3.
In principle, the model consists of three components, each of which covers a different
part of the domain of $F$: the lower tail, the body and the upper tail. Defining these components via their
density functions $f_L$, $f_B$ and $f_R$, the resulting density function $f$ can be formally expressed as
\[
f(x) = P(X < u_L)\, f_L(x) + P(u_L \le X \le u_R)\, f_B(x) + P(X > u_R)\, f_R(x). \tag{2.3.11}
\]
Here the supports of $f_L$, $f_B$ and $f_R$ lie within $(-\infty, u_L)$, $[u_L, u_R]$ and $(u_R, \infty)$, respectively, and
$F_L$, $F_B$ and $F_R$ are the associated distribution functions. Further, $u_L$ and $u_R$ denote the lower and upper
thresholds, respectively. The unknown distribution functions $F_L$ and $F_R$ are commonly
In what follows, several approaches to estimating such models are described; some of these impose
a continuity constraint on $f$ at the thresholds $u_L$ and $u_R$. For notational simplicity, the model
is taken to consist of two components only, the body $f_B$ and the upper tail $f_R$, separated by a single threshold $u$.
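As a minimal sketch of the three-component density in (2.3.11), the following code evaluates $f$ for one illustrative choice of components. The specific choices are assumptions made here for illustration only and are not prescribed by the text: a Normal body renormalised to $[u_L, u_R]$, GPD tails fitted to the excesses beyond each threshold, and tail probabilities `p_lo` and `p_hi` treated as free parameters. All function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def gpd_pdf(y, sigma, xi):
    """GPD density of an excess y >= 0 with scale sigma > 0 and shape xi."""
    if y < 0.0:
        return 0.0
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return math.exp(-y / sigma) / sigma
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the finite end point when xi < 0
        return 0.0
    return base ** (-1.0 / xi - 1.0) / sigma


def three_part_pdf(x, u_lo, u_hi, p_lo, p_hi, mu, sd, tail_lo, tail_hi):
    """Density of the form (2.3.11); tail_lo and tail_hi are (sigma, xi) pairs.

    p_lo plays the role of P(X < u_L) and p_hi that of P(X > u_R).
    """
    if x < u_lo:  # lower tail: GPD for the excess below u_lo (assumed form)
        return p_lo * gpd_pdf(u_lo - x, *tail_lo)
    if x > u_hi:  # upper tail: GPD for the excess above u_hi
        return p_hi * gpd_pdf(x - u_hi, *tail_hi)
    # body: Normal density renormalised to integrate to one on [u_lo, u_hi]
    body_mass = normal_cdf(u_hi, mu, sd) - normal_cdf(u_lo, mu, sd)
    return (1.0 - p_lo - p_hi) * normal_pdf(x, mu, sd) / body_mass
```

Because each component integrates to one on its own support, the weighted sum integrates to one by construction. Nothing in this sketch forces $f$ to be continuous at the thresholds.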
Behrens et al. (2004) introduce a Bayesian modelling framework which treats the threshold as unknown and estimates it together with the remaining parameters. Specifically, the approach
considers a parametric form of $F_B$, e.g. a Normal or Weibull distribution, and a GPD for $F_R$.
Hence, the distribution function $F(x)$ is formally given as
\[
F(x) =
\begin{cases}
F_B(x), & x \le u, \\
F_B(u) + [1 - F_B(u)]\, F_R(x), & x > u.
\end{cases} \tag{2.3.12}
\]
This model specification generally results in a discontinuity of $f$ at $u$. Behrens et al. (2004) state
that the uncertainty in $u$ increases as the size of the discontinuity decreases. De Melo Mendes and
Lopes (2004) consider a similar model specification with $F_B$ being a Normal distribution and
different models for the lower and upper tails. Instead of incorporating uncertainty in the thresholds, they optimize the proportion of data used to model the lower and upper tails. Carreau and Bengio (2009) propose a hybrid Pareto model which splices a Normal density with the density
of a GPD while preserving smoothness of $f(x)$ at the threshold.
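The distribution function in (2.3.12) and the resulting jump of $f$ at $u$ can be illustrated with a short sketch. It assumes a Normal body and a GPD for the excess $x - u$ with scale `sigma` and shape `xi`; the function names are hypothetical. Since the GPD density equals $1/\sigma$ at zero excess, the right-hand limit of $f$ at $u$ is $[1 - F_B(u)]/\sigma$, whereas the left-hand limit is $f_B(u)$. The helper `continuous_scale` returns the scale that equates the two; it only illustrates a continuity constraint and is not the hybrid Pareto construction of Carreau and Bengio (2009).

```python
import math


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def gpd_cdf(y, sigma, xi):
    """GPD distribution function of an excess y (scale sigma > 0, shape xi)."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the finite end point when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def body_tail_cdf(x, u, mu, sd, sigma, xi):
    """F(x) as in (2.3.12) with a Normal body and a GPD tail above u."""
    if x <= u:
        return normal_cdf(x, mu, sd)
    body_at_u = normal_cdf(u, mu, sd)
    return body_at_u + (1.0 - body_at_u) * gpd_cdf(x - u, sigma, xi)


def density_jump(u, mu, sd, sigma):
    """Right limit minus left limit of the density f at the threshold u."""
    return (1.0 - normal_cdf(u, mu, sd)) / sigma - normal_pdf(u, mu, sd)


def continuous_scale(u, mu, sd):
    """GPD scale for which the density has no jump at u (illustrative)."""
    return (1.0 - normal_cdf(u, mu, sd)) / normal_pdf(u, mu, sd)
```

The distribution function itself is always continuous at $u$; only the density jumps unless the scale is tied to the body parameters as in `continuous_scale`.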
While the former parametric approaches consider a strict partition as in (2.3.11), Frigessi et al. (2002) propose a dynamically weighted mixture model of a Weibull distribution and a GPD. In particular, the specified model does not require the estimation of a threshold. The density function is then formally given by
\[
f(x) = \frac{[1 - p(x)]\, f_B(x) + p(x)\, f_R(x)}{C},
\]
where the mixing function $p(x)$ satisfies $p(x) \to 1$ as $x \to x_F$, and $f_B$ and $f_R$ denote the density
functions of a Weibull distribution and a GPD, respectively. Further, $C$ is a normalising constant
which depends on the parameters. This model formulation implies that the GPD dominates the tail behaviour.
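The role of the normalising constant $C$ can be made concrete with a small numerical sketch. The mixing function is taken here to be a Cauchy distribution function with location `loc` and scale `tau`. This is one possible choice satisfying $p(x) \to 1$ and is an assumption of this example, as are the parameter values and all function names. The constant is approximated by a midpoint rule after mapping $(0, \infty)$ to $(0, 1)$.

```python
import math


def weibull_pdf(x, shape, scale):
    if x <= 0.0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def gpd_pdf(x, sigma, xi):
    """GPD density on x >= 0 (location zero), scale sigma > 0, shape xi > 0."""
    if x < 0.0:
        return 0.0
    return (1.0 + xi * x / sigma) ** (-1.0 / xi - 1.0) / sigma


def mixing_weight(x, loc, tau):
    """Cauchy distribution function: increases from 0 towards 1 in x."""
    return 0.5 + math.atan((x - loc) / tau) / math.pi


def unnormalised_pdf(x, par):
    """Numerator [1 - p(x)] f_B(x) + p(x) f_R(x) of the mixture density."""
    p = mixing_weight(x, par["loc"], par["tau"])
    body = weibull_pdf(x, par["shape"], par["scale"])
    tail = gpd_pdf(x, par["sigma"], par["xi"])
    return (1.0 - p) * body + p * tail


def integrate_half_line(g, n=20000):
    """Midpoint rule for the integral of g over (0, inf) using x = t / (1 - t)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += g(t / (1.0 - t)) / (1.0 - t) ** 2
    return total / n


def dynamic_mixture_pdf(x, par, const):
    return unnormalised_pdf(x, par) / const


# Hypothetical parameter values, chosen only to make the example run.
par = {"shape": 2.0, "scale": 1.0, "sigma": 1.0, "xi": 0.3, "loc": 1.5, "tau": 0.5}
C = integrate_half_line(lambda x: unnormalised_pdf(x, par))
```

Since $0 \le p(x) \le 1$ and both component densities integrate to one, $C$ lies strictly between 0 and 2. It has to be recomputed whenever any parameter changes, which is the computational price of avoiding an explicit threshold.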
In addition to the parametric approaches, the non-parametric estimation of $f_B$ is considered
in the literature. Tancredi et al. (2006) define $f_B$ as piecewise constant with an unknown
number of steps, and $u$ is taken to be unknown. Estimates are then obtained via in reversible
3 Modelling Insurance Claims by Spatially Varying Regression
3.1 Introduction
Eshita (1977) models the claim frequencies via a Binomial or Poisson model and, as outlined in Section 2.1, such approaches are often also applied in disease mapping. Scheel et al. (2013) argue, however, that these distributions are unsuitable for the insurance and weather data explored in Section 1.3 as these are incapable of capturing the high frequency of zero claims. They hence propose a Bayesian Poisson hurdle (BPH) model; see Section 1.4 for details. In this chapter, a comparative study is performed in order to assess the degree of improvement obtained by the Poisson hurdle model approach, as compared to one based on the Binomial distribution. Here,
this comparison is based upon all $K = 430$ municipalities.
Scheel et al. (2013) estimate the baseline risk and covariate effects in the BPH model individually for each municipality, and spatial information is used for variable selection (Section 1.4). The modelling framework introduced in this chapter also estimates the covariate effects municipality-wise but defines a dependence structure which, a priori, assumes covariate effects of adjacent municipalities to be more similar. In particular, the approach is based on the geographically varying coefficient (GVC) model (Assunção, 2003; Congdon, 2003) described in Section 2.1. Hence, estimates are obtained in a Bayesian framework rather than via weighted least squares methodology as in the geographically weighted regression approach (Brunsdon et al., 1998; Fotheringham et al., 2002).
A spatially varying modelling approach is suitable for examining the dependence between the
daily number of claims $N_{k,t}$ and the weather data. Firstly, Section 1.3 shows a spatially varying
vulnerability to the covariates, e.g. the same amount of precipitation affects Oslo and Bergen differently. Secondly, adjacent municipalities exhibit strong similarities in the weather covariates and, hence, presumably have a similar vulnerability. The GVC modelling framework allows the explicit specification of a dependence model which shares statistical information between adjacent municipalities.
In addition to this model comparison, the potential adjustments to the covariates discussed in Section 1.3 are investigated. The exploratory data analysis indicates that the effects of rainfall and snowfall differ; in particular, the correlation between snowfall and claim numbers is close
to zero. Conversely, $N_{k,t}$ and $R_{k,t}$ are positively correlated, conditional on a positive daily mean
temperature, $C_{k,t} > 0$ °C. In other words, the analysis implies that rain affects the same-day claim
dynamics more strongly than snow. Hence, the original covariates $R_{k,t}$ and $R_{k,t-1}$ may be
replaced by $\tilde{R}_{k,t}$ and $\tilde{R}_{k,t-1}$, which are defined as
\[
\tilde{R}_{k,t} = R_{k,t}\, \mathbf{1}_{\{C_{k,t} > 0\}}. \tag{3.1.1}
\]
Consequently, under this definition the amount of snowfall on a given day has no effect on the modelled claim dynamics on the same day.
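The adjustment in (3.1.1) amounts to zeroing the covariate on days without a positive mean temperature. A minimal sketch for one municipality is given below; it assumes the daily series are held as plain Python lists aligned by day, and the function names and data values are hypothetical.

```python
def rain_on_mild_days(rain, temp):
    """Apply (3.1.1): keep R_{k,t} when C_{k,t} > 0, otherwise set it to zero."""
    if len(rain) != len(temp):
        raise ValueError("rain and temp must have the same length")
    return [r if c > 0.0 else 0.0 for r, c in zip(rain, temp)]


def lag_one(series):
    """Previous-day values; the first day has no predecessor and gets None."""
    return [None] + list(series[:-1])


# Made-up values for four consecutive days in one municipality.
rain = [5.0, 3.0, 0.0, 2.0]    # R_{k,t}
temp = [2.1, -1.0, 4.0, 0.0]   # C_{k,t}, daily mean temperature

adjusted = rain_on_mild_days(rain, temp)   # [5.0, 0.0, 0.0, 0.0]
adjusted_lagged = lag_one(adjusted)        # [None, 5.0, 0.0, 0.0]
```

Note that a day with a mean temperature of exactly zero is treated like a frost day, because the indicator in (3.1.1) uses a strict inequality.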
Section 1.3 also discusses the difference in snow-water equivalent, $\Delta S_{k,t}$. The exploratory
data analysis indicates that $N_{k,t}$ and $\Delta S_{k,t}$ are positively correlated, conditional on $\Delta S_{k,t} > 0$,
while little or no dependence is found otherwise. Similarly to $R_{k,t}$, the covariate $\Delta S_{k,t}$ may be
replaced by $\widetilde{\Delta S}_{k,t}$, which is defined by
\[
\widetilde{\Delta S}_{k,t} = \Delta S_{k,t}\, \mathbf{1}_{\{\Delta S_{k,t} > 0\}}. \tag{3.1.2}
\]
Thus, $\widetilde{\Delta S}_{k,t}$ corresponds to the amount of snow-melt rather than the difference in
snow-water equivalent. In addition to these potential refinements of the covariates, Section 1.3 indicates that cities exhibit higher vulnerability than rural areas. This potential difference is
included via an additional binary factor $Z_k$, which takes the value $Z_k = 1$ if the average number of
policies over the 10-year period exceeds 2,000 and $Z_k = 0$ otherwise. To derive the threshold
of 2,000, the average claim rate per policy holder for municipalities below and above each of a range of potential thresholds was derived first. The examination of the ratio of these average claim rates then indicated that a threshold at around 2,000 yields the largest difference in the average claim
rates.
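The snow-melt covariate in (3.1.2) and the choice of the threshold for $Z_k$ can be sketched as follows. The text does not state how the average claim rate within each group is computed; this sketch assumes a pooled rate (total claims divided by total policies) and selects the candidate with the largest above-to-below ratio. All names and data values are hypothetical.

```python
def positive_part(values):
    """Apply (3.1.2): keep positive differences, set all others to zero."""
    return [v if v > 0.0 else 0.0 for v in values]


def claim_rate_ratio(policies, claims, threshold):
    """Pooled claim rate above the threshold divided by the rate below it."""
    above = [(p, c) for p, c in zip(policies, claims) if p > threshold]
    below = [(p, c) for p, c in zip(policies, claims) if p <= threshold]
    if not above or not below:
        return None  # one group is empty, so no ratio is defined
    rate_above = sum(c for _, c in above) / sum(p for p, _ in above)
    rate_below = sum(c for _, c in below) / sum(p for p, _ in below)
    if rate_below == 0.0:
        return None
    return rate_above / rate_below


def scan_thresholds(policies, claims, candidates):
    """Return the candidate with the largest ratio and all defined ratios."""
    ratios = {}
    for threshold in candidates:
        ratio = claim_rate_ratio(policies, claims, threshold)
        if ratio is not None:
            ratios[threshold] = ratio
    best = max(ratios, key=ratios.get)
    return best, ratios


def city_indicator(policies, threshold=2000):
    """Z_k = 1 if the average number of policies exceeds the threshold."""
    return [1 if p > threshold else 0 for p in policies]


# Made-up averages for four municipalities.
policies = [500, 1500, 2500, 8000]   # average number of policies
claims = [5, 15, 50, 160]            # total number of claims
best, ratios = scan_thresholds(policies, claims, [1000, 2000, 5000])
```

With these made-up numbers the scan selects 2,000, and `city_indicator(policies)` then flags the last two municipalities. Averaging per-municipality rates instead of pooling would be an equally plausible reading of the text and could lead to a different threshold.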
In summary, interest lies in the comparison of the Binomial and Poisson hurdle approaches,
as well as the examination of the performance of the proposed covariates $\tilde{R}_{k,t}$ and $\widetilde{\Delta S}_{k,t}$ in
expressions (3.1.1) and (3.1.2), respectively. The drainage run-off $D_{k,t}$ and the snow-water
equivalent $S_{k,t}$ are included in addition to the three covariates discussed above. In the following,
three competing models are examined: (i) Binomial distribution and original covariates, (ii) Binomial distribution and proposed covariates, and (iii) Poisson hurdle model and proposed covariates. A fourth possible setting is a Poisson hurdle model with the original covariates, as used by Scheel et al. (2013). Instead of fitting this model, we compare the three settings to the original model by Scheel et al. (2013), which imposes less spatial structure on the covariate effects.
The remainder of this chapter is organized as follows: Section 3.2 details the modelling framework for both the Binomial and Poisson hurdle distributions and describes the estimation of the model parameters using Markov chain Monte Carlo (MCMC) techniques. Results for the three models are then compared and discussed in Section 3.3. As in Scheel et al. (2013), the predictive performance is assessed on a weekly basis. The chapter concludes with a discussion in Section 3.4.