4.3 Telematics insurance data
4.4.2 Compositional data
The divisions of the total driven distance into the different categories, namely road types (4), time slots (5) and week/weekend (2) (see Table 4.1), are highly correlated with, and sum to, the total driven distance. Incorporating these divisions in a predictor that also contains the total distance leads to a perfect multicollinearity problem. Furthermore, the corresponding model parameter estimators are not invariant to the ordering of the components: the statistical inference changes when permuting the components, making interpretations misleading. The standard regression interpretation of a change in one of the components of the distance when the other components are held constant is not possible, because the components are constrained to add up to the total distance.
The total distance in meters is used as a continuous predictor in the telematics models and its effect is modeled using a smooth function. Since the divisions of the distance only contribute additional relative information, we divide all components of each split by the total driven distance, see Figure 4.4. We obtain what is known as compositional data (Van den Boogaart and Tolosana-Delgado, 2013; Pawlowsky-Glahn et al., 2015). Such data are represented by real vectors with positive components and constant sum equal to one. The space of representations of compositions is called the simplex of $D$ parts, denoted $\mathcal{S}^D$, defined by
$$\mathcal{S}^D = \left\{ \mathbf{x} = (x_1, \ldots, x_D)^t : x_i > 0, \; \sum_{i=1}^{D} x_i = 1 \right\}.$$
Only relative information is important: multiplying the vector of positive components by a positive constant does not change the ratios between the components. When data are considered compositional, classical statistical methods, which do not take the special geometry of the simplex into account, are not appropriate. Extending the current literature, we propose a new way of quantifying and interpreting the effect of compositional explanatory variables on the outcome, and an approach to deal with structural zeros.
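As a small illustration of how a distance split becomes a composition, the sketch below closes a vector of distances to unit sum and checks that rescaling does not change the result. The distances and the names `closure`, `road_meters` and `road_shares` are hypothetical and only serve the example.

```python
def closure(parts):
    """Rescale nonnegative parts so that they sum to one."""
    total = sum(parts)
    if total <= 0 or any(p < 0 for p in parts):
        raise ValueError("parts must be nonnegative with a positive sum")
    return [p / total for p in parts]


# Hypothetical distances in meters over four road types (illustrative only).
road_meters = [12000.0, 30000.0, 45000.0, 3000.0]

total_distance = sum(road_meters)   # kept as a separate continuous predictor
road_shares = closure(road_meters)  # carries the relative information only

# Multiplying all parts by a positive constant leaves the composition unchanged.
rescaled_shares = closure([2.5 * m for m in road_meters])
```

The total distance and the shares are kept apart on purpose: the former enters the predictor directly, the latter only through ratios between components.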
The Aitchison geometry of the simplex
The vector space structure of the simplex was discovered by Aitchison (1986), who defined operations on compositional data leading to the Aitchison geometry of the simplex. Perturbation plays the role of addition on the simplex and is defined as a closed component-wise product, $\mathbf{x} \oplus \mathbf{y} = \mathcal{C}(x_1 y_1, \ldots, x_D y_D)^t$, where the closing operation $\mathcal{C}$ ensures a total sum of one, i.e. the closure of $\mathbf{x}$ is $\mathcal{C}(\mathbf{x}) = \mathbf{x} / \sum_{i=1}^{D} x_i$. The product of a vector by a scalar is called powering and is defined as $\alpha \odot \mathbf{x} = \mathcal{C}(x_1^{\alpha}, \ldots, x_D^{\alpha})^t$, for $\alpha \in \mathbb{R}$. The Aitchison inner product of two compositions is
$$\langle \mathbf{x}, \mathbf{y} \rangle_a = \frac{1}{2D} \sum_{i=1}^{D} \sum_{j=1}^{D} \ln\frac{x_i}{x_j} \ln\frac{y_i}{y_j} = \sum_{i=1}^{D} \ln(x_i) \ln(y_i) - \frac{1}{D} \left( \sum_{i=1}^{D} \ln(x_i) \right) \left( \sum_{j=1}^{D} \ln(y_j) \right)$$
and induces the norm $\|\mathbf{x}\|_a = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle_a}$ and the distance $d_a(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} \ominus \mathbf{y}\|_a$, where $\ominus$ represents the opposite operation of $\oplus$, i.e. $\mathbf{x} \ominus \mathbf{y} = \mathbf{x} \oplus ((-1) \odot \mathbf{y})$. The simplex along with these operations then forms a $(D-1)$-dimensional Euclidean vector space $(\mathcal{S}^D, \oplus, \odot, \langle \cdot, \cdot \rangle_a)$. Given this Euclidean structure, we can measure distances and angles, and define related geometrical concepts. Elementary statistical notions involving the metrics of the sample space can be adapted to the Euclidean structure of the simplex.
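The operations above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the example compositions are assumptions made for the example. It checks numerically that the two expressions for the inner product agree, that perturbing by the opposite element gives the neutral composition, and that the Aitchison distance is unchanged when both arguments are perturbed by the same composition.

```python
import math


def closure(parts):
    total = sum(parts)
    return [p / total for p in parts]


def perturb(x, y):
    """Perturbation: closed component-wise product."""
    return closure([a * b for a, b in zip(x, y)])


def power(alpha, x):
    """Powering: closed component-wise power."""
    return closure([a ** alpha for a in x])


def inner_double_sum(x, y):
    """Aitchison inner product via the double sum over log-ratios."""
    D = len(x)
    return sum(math.log(x[i] / x[j]) * math.log(y[i] / y[j])
               for i in range(D) for j in range(D)) / (2 * D)


def inner(x, y):
    """Aitchison inner product via the equivalent single-sum expression."""
    D = len(x)
    lx = [math.log(a) for a in x]
    ly = [math.log(a) for a in y]
    return sum(a * b for a, b in zip(lx, ly)) - sum(lx) * sum(ly) / D


def norm(x):
    # max(..., 0.0) guards against tiny negative rounding errors.
    return math.sqrt(max(inner(x, x), 0.0))


def distance(x, y):
    return norm(perturb(x, power(-1.0, y)))


# Hypothetical three-part compositions.
x = [0.1, 0.3, 0.6]
y = [0.5, 0.25, 0.25]
p = [0.2, 0.2, 0.6]

ip_closed = inner(x, y)
ip_double = inner_double_sum(x, y)
neutral = perturb(x, power(-1.0, x))           # expected: (1/3, 1/3, 1/3)
d_xy = distance(x, y)
d_shifted = distance(perturb(x, p), perturb(y, p))
```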
Egozcue et al. (2003) constructed orthonormal bases for this Euclidean space and deduced corresponding isometries between $\mathcal{S}^D$ and $\mathbb{R}^{D-1}$, called isometric logratio (ilr) transformations. One possible ilr transformation maps a compositional data vector $\mathbf{x}$ to a $(D-1)$-dimensional real vector $\mathbf{z} = (z_1, z_2, \ldots, z_{D-1})^t$ with components
$$z_i = \mathrm{ilr}_i(\mathbf{x}) = \sqrt{\frac{D-i}{D-i+1}} \, \ln \frac{x_i}{\sqrt[D-i]{\prod_{j=i+1}^{D} x_j}}, \quad i = 1, \ldots, D-1. \tag{4.3}$$
As the ilr transformation is isometric, all angles and distances are preserved. This means that, whenever compositions are transformed into coordinates, the metrics and operations in the Aitchison geometry of the simplex are translated into the ordinary Euclidean metrics and operations in real space. Let $V$ be the $D \times (D-1)$ matrix with elements
$$V_{ij} = \begin{cases} \dfrac{D-j}{\sqrt{(D-j+1)(D-j)}} & \text{for } i = j, \\[2ex] \dfrac{-1}{\sqrt{(D-j+1)(D-j)}} & \text{for } i > j, \\[2ex] 0 & \text{otherwise,} \end{cases}$$
for which it holds that $V^t V = I_{D-1}$ and $V V^t = I_D - (1/D) \mathbf{1}_D \mathbf{1}_D^t$, where $I_D$ is the identity matrix of dimension $D$ and $\mathbf{1}_D$ a $D$-vector of ones (Egozcue et al., 2011). Then we can rewrite this ilr transform and its inverse in matrix notation as
$$\mathbf{z} = \mathrm{ilr}(\mathbf{x}) = V^t \ln \mathbf{x}, \quad \text{and} \quad \mathbf{x} = \mathrm{ilr}^{-1}(\mathbf{z}) = \mathcal{C}(\exp(V \mathbf{z})), \tag{4.4}$$
where the logarithmic and exponential functions apply componentwise.
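The matrix form (4.4) can be checked against the component-wise formula (4.3) numerically. The sketch below builds $V$ for $D = 4$, verifies the two identities for $V^t V$ and $V V^t$, and confirms that the inverse transform recovers the original composition. Function names and the example vector are assumptions made for the illustration.

```python
import math


def contrast_matrix(D):
    """D x (D-1) matrix V following the element-wise definition above."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):                      # 1-based column index
        scale = math.sqrt((D - j + 1) * (D - j))
        V[j - 1][j - 1] = (D - j) / scale      # entry for i = j
        for i in range(j + 1, D + 1):          # entries for i > j
            V[i - 1][j - 1] = -1.0 / scale
    return V


def ilr(x, V):
    """z = V^t ln(x)."""
    logs = [math.log(v) for v in x]
    return [sum(V[i][j] * logs[i] for i in range(len(x)))
            for j in range(len(x) - 1)]


def ilr_by_formula(x):
    """Component-wise version of formula (4.3)."""
    D = len(x)
    z = []
    for i in range(1, D):
        log_gm_rest = sum(math.log(v) for v in x[i:]) / (D - i)
        z.append(math.sqrt((D - i) / (D - i + 1))
                 * (math.log(x[i - 1]) - log_gm_rest))
    return z


def ilr_inverse(z, V):
    """x = C(exp(V z))."""
    raw = [math.exp(sum(V[i][j] * z[j] for j in range(len(z))))
           for i in range(len(V))]
    total = sum(raw)
    return [r / total for r in raw]


D = 4
V = contrast_matrix(D)
VtV = [[sum(V[i][a] * V[i][b] for i in range(D)) for b in range(D - 1)]
       for a in range(D - 1)]
VVt = [[sum(V[a][j] * V[b][j] for j in range(D - 1)) for b in range(D)]
       for a in range(D)]

x = [0.1, 0.2, 0.3, 0.4]    # hypothetical composition
z = ilr(x, V)
x_back = ilr_inverse(z, V)
```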
Even though the simplex $\mathcal{S}^D$ is a subset of the real space $\mathbb{R}^D$, Aitchison (1986) showed that its geometry is clearly different. Ignoring this aspect in a statistical context can lead to incompatible or incoherent results. The compositional nature of the data must not be ignored. The principle of working on coordinates in statistics (Mateu-Figueras et al., 2011) is to first express the compositional data with respect to an orthonormal basis of the underlying vector space with Euclidean structure, next to apply standard statistical techniques to the vectors of coordinates and, finally, to back-transform and describe the results in terms of the simplex. Final results do not depend on the chosen basis.
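A simple instance of the principle of working on coordinates is the computation of a compositional mean: transform to ilr coordinates, average there, and back-transform. The sketch below, with a hypothetical sample of three compositions, shows that this coincides with the closed vector of component-wise geometric means and differs from the naive arithmetic mean of the proportions. All names and values are assumptions for the illustration.

```python
import math


def contrast_matrix(D):
    """D x (D-1) matrix V with V_jj = (D-j)/s_j and V_ij = -1/s_j for i > j."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        scale = math.sqrt((D - j + 1) * (D - j))
        V[j - 1][j - 1] = (D - j) / scale
        for i in range(j + 1, D + 1):
            V[i - 1][j - 1] = -1.0 / scale
    return V


def ilr(x, V):
    logs = [math.log(v) for v in x]
    return [sum(V[i][j] * logs[i] for i in range(len(x)))
            for j in range(len(x) - 1)]


def ilr_inverse(z, V):
    raw = [math.exp(sum(V[i][j] * z[j] for j in range(len(z))))
           for i in range(len(V))]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical sample of three 3-part compositions.
sample = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
D, n = 3, len(sample)
V = contrast_matrix(D)

# Step 1: coordinates. Step 2: ordinary mean. Step 3: back-transform.
coords = [ilr(x, V) for x in sample]
mean_coords = [sum(c[j] for c in coords) / n for j in range(D - 1)]
center = ilr_inverse(mean_coords, V)

# Same result from the closed component-wise geometric means.
geo = [math.exp(sum(math.log(x[i]) for x in sample) / n) for i in range(D)]
center_direct = [g / sum(geo) for g in geo]

# Naive arithmetic mean of the proportions, for comparison only.
naive_mean = [sum(x[i] for x in sample) / n for i in range(D)]
```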
A new interpretation for compositional predictors
In our setting, it is key to incorporate the compositional data arising from the divisions of the distances into different categories as predictors in the claim count regression models. Hron et al. (2012) propose to first apply the isometric logratio transform (4.3) to map the compositions in the $D$-part Aitchison simplex to a $(D-1)$-dimensional Euclidean space. Then, these terms are used as explanatory variables in a linear regression model. More generally, in any regression context involving a predictor, one can add a compositional predictor term $\eta_{\text{comp}}$ using the ilr transformed variables, i.e.
$$\eta_{\text{comp}} = \beta_1 z_1 + \ldots + \beta_{D-1} z_{D-1}. \tag{4.5}$$
The fitted model does not depend on the choice of the orthonormal ilr basis since the coordinates of $\mathbf{x}$ with respect to different orthonormal bases are orthogonal transformations of each other. Using the ilr transformation, the model parameters can be estimated without constraints and the ceteris paribus interpretation of altering one $z_i$ without altering any other becomes possible. However, only the first regression parameter, $\beta_1$, has a comprehensible interpretation since $z_1$ explains relevant information about $x_1$. The remaining coefficients are not straightforward to interpret and hence Hron et al. (2012) suggest permuting the indices in formula (4.3) and constructing $D$ regression models, each time with a different component first, for which we can interpret the corresponding coefficient. Having to refit the model multiple times is undesirable, especially in our case where we have more than one compositional predictor and each model fit is computationally intensive due to smooth continuous, spatial, and random effects. Hence, we develop a new strategy to include compositional predictors and interpret their effect.
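Defining the composition $\mathbf{b} = \mathrm{ilr}^{-1}(\boldsymbol{\beta}) \in \mathcal{S}^D$, where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{D-1})^t$, we can rewrite the compositional predictor as
$$\eta_{\text{comp}} = \sum_{i=1}^{D-1} \beta_i z_i = \sum_{i=1}^{D-1} \mathrm{ilr}_i(\mathbf{b}) \, \mathrm{ilr}_i(\mathbf{x}) = \langle \mathbf{b}, \mathbf{x} \rangle_a,$$
since the ilr transform preserves the inner product (Van den Boogaart and Tolosana-Delgado, 2013; Pawlowsky-Glahn et al., 2015). The composition $\mathbf{b} \in \mathcal{S}^D$ can be interpreted as the simplicial gradient of $\eta_{\text{comp}}$ with respect to $\mathbf{x}$ (Barceló-Vidal et al., 2011) and is the compositional direction along which the predictor increases fastest. In particular, if we increase $\mathbf{x}$ to $\tilde{\mathbf{x}} = \mathbf{x} \oplus \frac{\mathbf{b}}{\|\mathbf{b}\|_a}$, then the predictor becomes
$$\tilde{\eta}_{\text{comp}} = \langle \mathbf{b}, \tilde{\mathbf{x}} \rangle_a = \left\langle \mathbf{b}, \mathbf{x} \oplus \frac{\mathbf{b}}{\|\mathbf{b}\|_a} \right\rangle_a = \langle \mathbf{b}, \mathbf{x} \rangle_a + \frac{1}{\|\mathbf{b}\|_a} \langle \mathbf{b}, \mathbf{b} \rangle_a = \eta_{\text{comp}} + \|\mathbf{b}\|_a.$$
When $D = 3$, the estimated regression model can be visualized as a surface on a ternary diagram (Van den Boogaart and Tolosana-Delgado, 2013). For $D > 3$, a graphical representation is not straightforward.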
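The identity between the linear predictor in ilr coordinates and the Aitchison inner product with the simplicial gradient can be verified numerically. The sketch below uses a hypothetical coefficient vector and composition, and reads the step $\mathbf{b} / \|\mathbf{b}\|_a$ as powering $\mathbf{b}$ by $1 / \|\mathbf{b}\|_a$, which is an interpretation made here for the code. It also checks that such a unit step raises the predictor by $\|\mathbf{b}\|_a$.

```python
import math


def closure(parts):
    total = sum(parts)
    return [p / total for p in parts]


def contrast_matrix(D):
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        scale = math.sqrt((D - j + 1) * (D - j))
        V[j - 1][j - 1] = (D - j) / scale
        for i in range(j + 1, D + 1):
            V[i - 1][j - 1] = -1.0 / scale
    return V


def ilr(x, V):
    logs = [math.log(v) for v in x]
    return [sum(V[i][j] * logs[i] for i in range(len(x)))
            for j in range(len(x) - 1)]


def ilr_inverse(z, V):
    return closure([math.exp(sum(V[i][j] * z[j] for j in range(len(z))))
                    for i in range(len(V))])


def inner(x, y):
    """Aitchison inner product."""
    lx = [math.log(a) for a in x]
    ly = [math.log(a) for a in y]
    return sum(a * b for a, b in zip(lx, ly)) - sum(lx) * sum(ly) / len(x)


beta = [0.4, -0.2, 0.1]        # hypothetical regression coefficients
x = [0.1, 0.2, 0.3, 0.4]       # hypothetical composition, D = 4
V = contrast_matrix(len(x))

b = ilr_inverse(beta, V)                               # simplicial gradient
eta = sum(bi * zi for bi, zi in zip(beta, ilr(x, V)))  # sum of beta_i * z_i
eta_inner = inner(b, x)                                # <b, x>_a

norm_b = math.sqrt(inner(b, b))
unit_b = closure([v ** (1.0 / norm_b) for v in b])     # (1/||b||_a) powering of b
x_tilde = closure([xi * ui for xi, ui in zip(x, unit_b)])
eta_tilde = inner(b, x_tilde)                          # expected: eta + norm_b
```

Because the ilr transform is an isometry, `norm_b` equals the ordinary Euclidean norm of `beta`.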
In order to overcome this shortcoming in interpretation and to develop a graphical representation for compositional explanatory variables, we propose to perturb the composition in the direction of each component. This offers a new interpretation for the effect of altering the composition on the predictor. For example, a relative ratio change of $\alpha > 1$ (increase) or $\alpha < 1$ (decrease) in the first component of $\mathbf{x}$ with constant ratios of the remaining components can be achieved by perturbing the composition $\mathbf{x}$ by $(\alpha, 1, \ldots, 1)^t$. This leads to a change of the predictor given by
$$\langle \mathbf{b}, (\alpha, 1, \ldots, 1)^t \rangle_a = \ln(b_1) \ln(\alpha) - \frac{1}{D} \left( \sum_{i=1}^{D} \ln(b_i) \right) \ln(\alpha) = \mathrm{clr}_1(\mathbf{b}) \ln(\alpha), \tag{4.6}$$
which is independent of the original composition $\mathbf{x}$ and where
$$\mathrm{clr}_i(\mathbf{b}) = \ln \frac{b_i}{\mathrm{gm}(\mathbf{b})}, \quad \mathrm{gm}(\mathbf{b}) = \left( \prod_{i=1}^{D} b_i \right)^{1/D}, \quad i = 1, \ldots, D$$
denotes the centered log-ratio (clr) transform of $\mathbf{b}$ (Egozcue et al., 2011). The effect of a relative increase in any of the components can hence best be understood by considering the clr transform of $\mathbf{b}$, of which the elements sum to zero and indicate the positive or negative effect of each component on the predictor. A graphical representation of the effect of a compositional predictor can be made by visualizing $\mathrm{clr}(\mathbf{b})$ and comparing the elements to zero. Since $\boldsymbol{\beta} = \mathrm{ilr}(\mathbf{b}) = V^t \ln(\mathbf{b}) = V^t \mathrm{clr}(\mathbf{b})$ and $V V^t = I_D - (1/D) \mathbf{1}_D \mathbf{1}_D^t$, the clr transform of $\mathbf{b}$ can be written as $\mathrm{clr}(\mathbf{b}) = V \boldsymbol{\beta}$. Confidence bounds can thus be constructed using the corresponding covariance matrix $V \hat{\Sigma} V^t$, where $\hat{\Sigma}$ is the estimated covariance matrix related to estimating $\boldsymbol{\beta}$. To interpret the effect on the level of the expected outcome in the Poisson and NB models, we can transform these confidence intervals using the exponential function. The exponentiated clr transform of $\mathbf{b}$ has to be compared to one and the effect of a relative ratio change of $\alpha$ in component $i = 1, \ldots, D$ is given by $\alpha^{\mathrm{clr}_i(\mathbf{b})}$.
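The computation of $\mathrm{clr}(\mathbf{b}) = V \boldsymbol{\beta}$ and its confidence bounds can be sketched as follows. The coefficient estimates, their covariance matrix and the use of Wald-type intervals with the normal quantile 1.96 are all assumptions made for this illustration; they are not taken from the fitted models. The last lines check (4.6) directly for a hypothetical composition.

```python
import math


def contrast_matrix(D):
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        scale = math.sqrt((D - j + 1) * (D - j))
        V[j - 1][j - 1] = (D - j) / scale
        for i in range(j + 1, D + 1):
            V[i - 1][j - 1] = -1.0 / scale
    return V


def inner(x, y):
    """Aitchison inner product (invariant to rescaling of x or y)."""
    lx = [math.log(a) for a in x]
    ly = [math.log(a) for a in y]
    return sum(a * b for a, b in zip(lx, ly)) - sum(lx) * sum(ly) / len(x)


D = 4
V = contrast_matrix(D)
beta_hat = [0.4, -0.2, 0.1]                 # hypothetical estimates
sigma_hat = [[0.010, 0.002, 0.000],         # hypothetical covariance of beta_hat
             [0.002, 0.020, 0.001],
             [0.000, 0.001, 0.015]]

# clr(b) = V beta and its covariance V Sigma V^t.
clr_b = [sum(V[i][j] * beta_hat[j] for j in range(D - 1)) for i in range(D)]
VS = [[sum(V[i][k] * sigma_hat[k][j] for k in range(D - 1))
       for j in range(D - 1)] for i in range(D)]
clr_cov = [[sum(VS[i][k] * V[j][k] for k in range(D - 1))
            for j in range(D)] for i in range(D)]

z_crit = 1.96                               # assumed Wald-type 95% bounds
se = [math.sqrt(clr_cov[i][i]) for i in range(D)]
lower = [c - z_crit * s for c, s in zip(clr_b, se)]
upper = [c + z_crit * s for c, s in zip(clr_b, se)]

# Multiplicative scale (log link): compare these values to one.
ratio = [math.exp(c) for c in clr_b]
ratio_bounds = [(math.exp(l), math.exp(u)) for l, u in zip(lower, upper)]

# Effect of a 10% relative ratio increase in each component.
alpha = 1.10
effects = [alpha ** c for c in clr_b]

# Direct check of (4.6) for a hypothetical composition x.
b = [math.exp(c) for c in clr_b]            # proportional to the gradient b
x = [0.1, 0.2, 0.3, 0.4]
x_perturbed = [alpha * x[0]] + x[1:]        # closure is not needed for inner()
change = inner(b, x_perturbed) - inner(b, x)
```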
Dealing with structural zeros in compositional predictors
An additional difficulty when incorporating the compositional information as predictors in the analysis of the claim counts is the presence of proportions of a specific component that are exactly zero. In the division of the driven distance by road type, for instance, many insureds did not drive abroad during the observed policy period. Since compositional data are always analyzed by considering logratios of the components (see Section 4.4.2), a workaround is necessary.
In the compositional data literature, different types of zeros are distinguished (Pawlowsky-Glahn et al., 2015). Rounded zeros occur when certain components may be unobserved because their true values are below the detection limit (cf. geochemical studies). Count zeros refer to zero values due to the limited size of the sample in compositional data arising from count data. In our setting, the zero values are truly zero and are not due to imprecise or insufficient measurements. Such zeros are called structural zeros. The structural zero patterns in the data set are listed in Appendix 4.7. The presence of zeros is most prominent for the split of distance by road type, as 40% of the drivers did not go abroad. Zeros are most often dealt with using replacement strategies (see e.g. Martín-Fernández et al., 2011, for an overview), which do not make sense for structural zeros. A general methodology is still to be developed (see e.g. Aitchison and Kay, 2003; Bacon-Shone, 2003). In particular, there does not exist a method that deals with compositional data with structural zeros as predictors in regression models. Applying the ilr transform to the compositional data $\mathbf{x}$ and using the transformed $\mathbf{z}$ as explanatory variables in the predictor as discussed in Section 4.4.2 is no longer possible.
We propose to treat the structural zero patterns of the compositional predictors as different subgroups within the data and model the effect conditional on the zero pattern. In the most general situation, $2^D - 1$ possible zero patterns can
occur when dealing with compositional data with $D$ components (a structural zero for every component being excluded). We introduce indicator variables for each zero pattern and use these in the compositional predictor term $\eta_{\text{comp}}$ of the regression model to specify the effect on the outcome separately for each zero pattern. More specifically, we define the variables
$$d_{(i_1, \ldots, i_k)} = \begin{cases} 1 & \text{if components } i_1, \ldots, i_k \text{ of } \mathbf{x} \text{ are nonzero and all others are zero,} \\ 0 & \text{otherwise} \end{cases}$$
for all $k = 1, \ldots, D$ and $1 \leq i_1 < \ldots < i_k \leq D$. Conditional on the zero pattern $(i_1, \ldots, i_k)$ of the compositional data vector $\mathbf{x}$, the contribution to the predictor is given by the Aitchison inner product $\langle \mathbf{b}_{(i_1, \ldots, i_k)}, \mathbf{x}_{(i_1, \ldots, i_k)} \rangle_a$ of the subcomposition $\mathbf{x}_{(i_1, \ldots, i_k)}$ consisting of the nonzero components of $\mathbf{x}$ and a subcompositional simplicial gradient $\mathbf{b}_{(i_1, \ldots, i_k)}$, which is different for each zero pattern. In case of only one nonzero component, the contribution is given by a simple categorical effect $b_{(i)}$. Note that the subscript $(i_1, \ldots, i_k)$ has a different interpretation for the dummy variable, simplicial gradient and compositional data vector. The proposed compositional predictor reads
$$\eta_{\text{comp}} = \sum_{i=1}^{D} d_{(i)} b_{(i)} + \sum_{k=2}^{D} \sum_{1 \leq i_1 < \ldots < i_k \leq D} d_{(i_1, \ldots, i_k)} \langle \mathbf{b}_{(i_1, \ldots, i_k)}, \mathbf{x}_{(i_1, \ldots, i_k)} \rangle_a.$$
Zero pattern specific intercepts can be added in the second term if deemed necessary.
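The zero pattern specific predictor can be sketched as a lookup on the set of nonzero components. In the minimal illustration below, the categorical effects, the subcompositional gradients and all function names are hypothetical, and component indices are 0-based rather than 1-based as in the text. For each observation exactly one indicator equals one, so evaluating the predictor reduces to selecting the matching term.

```python
import math
from itertools import combinations


def closure(parts):
    total = sum(parts)
    return [p / total for p in parts]


def inner(b, x):
    """Aitchison inner product of two compositions with the same parts."""
    lb = [math.log(v) for v in b]
    lx = [math.log(v) for v in x]
    return sum(p * q for p, q in zip(lb, lx)) - sum(lb) * sum(lx) / len(x)


def zero_pattern(x):
    """0-based indices of the nonzero components of x."""
    return tuple(i for i, v in enumerate(x) if v > 0)


def eta_comp(x, single_effects, gradients):
    """Predictor contribution conditional on the zero pattern of x."""
    pattern = zero_pattern(x)
    if not pattern:
        raise ValueError("at least one component must be nonzero")
    if len(pattern) == 1:
        return single_effects[pattern[0]]        # categorical effect b_(i)
    sub = closure([x[i] for i in pattern])       # subcomposition of nonzero parts
    return inner(gradients[pattern], sub)        # <b_pattern, x_pattern>_a


D = 3
# All zero patterns: every nonempty subset of components may be the nonzero set.
patterns = [c for k in range(1, D + 1) for c in combinations(range(D), k)]

# Hypothetical parameters, one set per zero pattern.
single_effects = {0: 0.05, 1: -0.10, 2: 0.20}
gradients = {(0, 1): [0.6, 0.4], (0, 2): [0.5, 0.5],
             (1, 2): [0.3, 0.7], (0, 1, 2): [0.5, 0.3, 0.2]}

eta_full = eta_comp([0.2, 0.5, 0.3], single_effects, gradients)
eta_no_third = eta_comp([0.3, 0.7, 0.0], single_effects, gradients)
eta_only_second = eta_comp([0.0, 1.0, 0.0], single_effects, gradients)
```

For a two-part subcomposition the inner product reduces to $\tfrac{1}{2} \ln(b_1 / b_2) \ln(x_1 / x_2)$, which gives a quick sanity check on `eta_no_third`.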