Compositional data - Telematics insurance data

4.3 Telematics insurance data

4.4.2 Compositional data

The divisions of the total driven distance in the different categories – road types (4), time slots (5) and week/weekend (2), see Table 4.1 – are highly correlated with and sum up to the total driven distance. Incorporating these divisions in

a predictor also containing the total distance leads to a perfect multicollinearity problem. Furthermore, the corresponding model parameter estimators are not in- variant to the ordering of the components: the statistical inference changes when permuting the components making interpretations misleading. The standard regression interpretation of a change in one of the components of the distance when the other components are held constant is not possible due to the sum constraint of adding up to the total distance.

The total distance in meters is used as a continuous predictor in the telematics models and its effect is modeled using a smooth function. Since the divisions of the distance only contribute additional relative information, we divide all components of each split by the total driven distance, see Figure 4.4. We obtain what is known as compositional data (Van den Boogaart and Tolosana-Delgado, 2013; Pawlowsky-Glahn et al., 2015). Such data are represented by real vectors with constant sum equal to one and positive components. The space of representations of compositions is called the simplex ofD parts, denotedSD_{, defined by}

SD= ( x= (x1, . . . , xD)t:xi>0, D X i=1 xi= 1 ) .

Only relative information is important, and multiplication of the vector of positive components by a positive constant does not change the ratios between the components. When data are considered compositional, classical statistics, that do not take the special geometry of the simplex into account, are not appropri- ate. Extending the current literature, we propose a new way of quantifying and interpreting the effect of the compositional explanatory variables on the outcome and propose an approach to deal with structural zeros.

The Aitchison geometry of the simplex

The vector space structure of the mathematical simplex was discovered by Aitchi- son (1986) who defined operations on compositional data leading to the Aitchison geometry of the simplex. Perturbation plays the role of addition on the simplex and is defined as a closed component-wise productx⊕y=C(x1y1, . . . , xDyD)t,

where the closing operationC ensures a total sum of one, i.e. the closure of xis

C(x) =x/PD

i=1xi. The product of a vector by a scalar is called powering and is

defined asαx=C(xα

1, . . . , xαD)

t_{, for} _α_∈

4.4. Model building and selection 93 compositions is hx,yia = 1 2D D X i=1 D X j=1 lnxi xj lnyi yj = D X i=1 ln(xi) ln(yi)−1 D D X i=1 ln(xi) !  D X j=1 ln(yj)  

and induces the following normkxka = p

hx,xia and distance da(x,y) =kx

yka, where represents the opposite operation of⊕, i.e. y=⊕((−1)y). The simplex along with these operations then forms a (D−1)-dimensional Euclidean vector space (SD_,_⊕_,_,_h·_,_·i_{a). Given this Euclidean structure, we can measure} distances and angles, and define related geometrical concepts. Elementary statistical notions involving the metrics of the sample space can be adapted to the Euclidean structure of the simplex.

Egozcue et al. (2003) constructed orthonormal bases for this Euclidean space and deduced corresponding isometries between SD _and

RD−1, called isometric logratio transformations (ilr). One possible ilr transformation maps a compositional data vectorxin a (D−1)-dimensional real vectorz = (z1, z2, . . . , zD−1)t

with components zi= ilri(x) = r D−i D−i+ 1ln xi D−qi QD j=i+1xj , i= 1, . . . , D−1. (4.3)

As the ilr transformation is isometric, all angles and distances are preserved. This means that, whenever compositions are transformed into coordinates, the metrics and operations in the Aitchison geometry of the simplex are translated into the ordinary Euclidean metrics and operations in real space. LetV be theD×(D−1) matrix with elements

Vij =_p D−j

(D−j+ 1)(D−j) fori=j,

−1

(D−j+ 1)(D−j) fori > j, and 0 otherwise, for which it holds thatVt_V ₌_ID

−1andV Vt=ID−(1/D)1D1tD, where ID is the identity matrix of dimension D and 1D a D-vector of ones (Egozcue et al., 2011). Then we can rewrite this ilr transform and its inverse in matrix notation as

z= ilr(x) =Vtlnx, and x= ilr−1(z) =C(exp(Vz)), (4.4) where the logarithmic and exponential function apply componentwise.

Even though the simplexSD_{is a subset of the real space}

RD, Aitchison (1986) showed that the geometry is clearly different. Ignoring this aspect in a statistical context can lead to incompatible or incoherent results. The compositional na- ture of the data must not be ignored. The principle of working on coordinates in statistics (Mateu-Figueras et al., 2011) is to first express the compositional data with respect to an orthonormal basis of the underlying vector space with Euclidean structure. Next, to apply standard statistical techniques to the vectors of coordinates and, finally, to back-transform and describe the results in terms of the simplex. Final results do not depend on the chosen basis.

A new interpretation for compositional predictors

In our setting, it is key to incorporate the compositional data arising from the divisions of the distances into different categories as predictors in the claim count regression models. Hron et al. (2012) propose to first apply the isometric log ratio transform (4.3) to map the compositions in theD-part Aitchison simplex to a (D−1) Euclidean space. Then, these terms are used as explanatory variables in a linear regression model. More generally, in any regression context involving a predictor, one can add a compositional predictor termηcomp using the ilr transformed variables, i.e.

ηcomp=β1z1+. . .+βD−1zD−1. (4.5)

The fitted model does not depend on the choice of the orthonormal ilr basis since the coordinates ofx with respect to different orthonormal bases are orthogonal transformations of each other. Using the ilr transformation the model parameters can be estimated without constraints and the ceteris paribus interpretation of altering oneziwithout altering any other becomes possible. Only the first regression parameter,β1, however has a comprehensible interpretation sincez1explains

relevant information aboutx1. The remaining coefficients are not straightforward

to interpret and hence Hron et al. (2012) suggest to permute the indices in formula (4.3) and construct D regression models, each time with a different component first for which we can interpret the corresponding coefficient. Having to refit the model multiple times is undesirable, especially in our case where we have more than one compositional predictor and each model fit is computationally intensive due to smooth continuous, spatial, and random effects. Hence, we develop a new strategy to include compositional predictors and interpret their effect.

4.4. Model building and selection 95

whereβ= (β1, . . . , βD−1)t, we can rewrite the compositional predictor as

ηcomp= D−1 X i=1 βizi= D−1 X i=1 ilri(b)ilri(x) =hb,xia,

since the ilr transform preserves the inner product (Van den Boogaart and Tolosana- Delgado, 2013; Pawlowsky-Glahn et al., 2015). The compositionb∈ SD _{can be} interpreted as the simplicial gradient of ηcomp _{with respect to} _x _(Barcel´_o-Vidal

et al., 2011) and is the compositional direction along which the predictor increases fastest. In particular, if we increasextox˜=x⊕ b

kbka, then the predictor becomes

˜ ηcomp=hb,x˜ia=hb,x⊕ b kbka ia=hb,xia+ 1 kbka hb,bia=ηcomp+kbka.

WhenD = 3, the estimated regression model can be visualized as a surface on a ternary diagram (Van den Boogaart and Tolosana-Delgado, 2013). ForD >3, a graphical representation is not straightforward.

In order to overcome this shortcoming in interpretation and to develop a graphical representation for compositional explanatory variables, we propose to perturb the composition in the direction of each component. This offers a new interpretation for the effect of altering the composition on the predictor. For example, a relative ratio change of α > 1 (increase) orα < 1 (decrease) in the first component of xwith constant ratios of the remaining components can be achieved by perturbing the compositionxby (α,1, . . . ,1)t. This leads to a change of the predictor given by hb,(α,1, . . . ,1)tia = ln(b1) ln(α)− 1 D D X i=1 ln(bi) ! ln(α) = clr1(b) ln(α), (4.6)

which is independent of the original compositionxand where

clri(b) = ln _b i gm(b) , gm(b) = D Y i=1 bi !1/D , i= 1, . . . , D denotes the centered log-ratio (clr) transform of b(Egozcue et al., 2011). The effect of a relative increase in any of the components can hence best be un- derstood by considering the clr transform of b, of which the elements sum to zero and indicate the positive or negative effect of each component on the predictor. A graphical representation of the effect of a compositional predictor

can be made by visualizing clr(b) and comparing the elements to zero. Since

β= ilr(b) =Vt_ln(_b_{) =}_Vt_clr(_b_{) and}_{V V}t₌_I

D−(1/D)1D1tD, the clr transform ofbcan be written as clr(b) =Vβ. Confidence bounds can thus be constructed using the corresponding covariance matrix VΣVb t where Σ is the estimated co-b variance matrix related to estimatingβ. To interpret the effect on the level of the expected outcome in the Poisson and NB models, we can transform these confidence intervals using the exponential function. The exponentiated clr transform of bhas to be compared to one and the effect of a relative ratio change ofα in componenti= 1, . . . , Dis given byαclri(b)_.

Dealing with structural zeros in compositional predictors

An additional difficulty when incorporating the compositional information as predictors in the analysis of the claim counts is the presence of proportions of a specific component that are exactly zero. In the division of the driven distance by road type, for instance, many insureds did not drive abroad during the observed policy period. Since compositional data are always analyzed by considering lo- gratios of the components (see Section 4.4.2), a workaround is necessary.

In the compositional data literature, different types of zeros are being dis- tinguished (Pawlowsky-Glahn et al., 2015). Rounded zeros occur when certain components may be unobserved because their true values are below the detec- tion limit (cfr. geochemical studies). Count zeros refer to zero values due to the limited size of the sample in compositional data arising from count data. In our setting, the zero values are truly zero and are not due to imprecise or insufficient measurements. Such kind of zeros are calledstructural zeros. The structural zeros patterns in the data set are listed in Appendix 4.7. The presence of zeros is most prominent for splitting distance by road types as 40% of the drivers did not go abroad. Zeros are most often dealt with using replacement strategies (see e.g. Mart´ın-Fern´andez et al., 2011, for an overview), which do not make sense for structural zeros. A general methodology is still to be developed (see e.g. Aitchison and Kay, 2003; Bacon Shone, 2003). In particular, there does not exist a method that deals with compositional data with structural zeros as predictor in regression models. Applying the ilr transform to the compositional data x and using the transformedz as explanatory variables in the predictor as discussed in Section 4.4.2 is no longer possible.

We propose to treat the structural zero patterns of the compositional predictors as different subgroups within the data and model the effect conditional on the zero pattern. In the most general situation, 2D₋_{1 possible zero patterns can}

4.4. Model building and selection 97

occur when dealing with compositional data with D components (a structural zero for every component being excluded). We introduce indicator variables for each zero pattern and use these in the compositional predictor termηcomp_{of the}

regression model to specify the effect on the outcome separately for each zero pattern. More specifically, we define the variables

d(i1,...,ik)=

  

1 if componentsi1, . . . , ik ofxare nonzero and all other are zero,

0 otherwise

for allk= 1, . . . , Dand 1₆i1< . . . < ik 6D. Conditional on the zero pattern

(i1, . . . , ik) of the compositional data vector x, the contribution to the predictor

is given by the Aitchison inner producthb(i1,...,ik),x(i1,...,ik)ia of the subcompo-

sitionx(i1,...,ik)existing of the nonzero components ofxand a subcompositional

simplicial gradientb(i1,...,ik), which is different for each zero pattern. In case of

only one nonzero component, the contribution is given by a simple categorical ef- fectb(i). Note that the subscript (i1, . . . , ik) has a different interpretation for the

dummy variable, simplicial gradient and compositional data vector. The proposed compositional predictor reads

ηcomp= D X i=1 d(i)b(i)+ D X k=2 X 16i1<...<ik6D

d(i1,...,ik)hb(i1,...,ik),x(i1,...,ik)ia.

Zero pattern specific intercepts can be added in the second term if deemed necessary.

In document Data analytics for insurance loss modelling, telematics pricing and claims reserving. (Page 103-109)