Decorrelation offers an intuitive recipe for selecting variables under correlation. If the predictor variables are uncorrelated, there is consensus that the t-score for binary traits (Fan and Fan, 2008) and the marginal correlation for quantitative traits (Fan and Lv, 2008) are the optimal criteria for selecting variables. In the previous sections we have presented generalizations of the t-score and the marginal correlation that accommodate correlation among predictors: the CAT and the CAR score. The CAT score in classification is the analog of the CAR score in regression. Hence, both scores share important characteristics and exhibit related behavior, as summarized in Table 4.7.
In particular, the CAT and CAR scores are defined as the Mahalanobis-decorrelated versions of the marginal quantities that are optimal for variable selection in the absence of correlation, namely the t-score $\tau$ and the marginal correlations $P_{XY}$. Moreover, while the CAT score decomposes Hotelling's $T^2$, the CAR score decomposes the squared multiple correlation coefficient, that is, the proportion of variance explained. This suggests that Hotelling's $T^2$ is the counterpart in classification of the squared multiple correlation coefficient. Hotelling (1931) already mentioned the "affinity" of Hotelling's $T^2$ with the multiple correlation coefficient due to similar geometrical interpretations. Here, the connection between CAT and CAR scores provides further evidence that Hotelling's $T^2$ and the multiple correlation coefficient are related quantities for different scales of the outcome.
Table 4.7: Comparison of CAT and CAR scores.

                              CAT                                                          CAR
Response Y                    Binary                                                       Metric
Definition                    $\tau^{adj} = P^{-1/2} \tau$                                 $\omega = P^{-1/2} P_{XY}$
Marginal quantity             $\tau = (1/n_1 + 1/n_2)^{-1/2} V^{-1/2} (\mu_1 - \mu_2)$     $P_{XY}$
Decomposition                 Hotelling's $T^2$:                                           Squared multiple correlation:
                              $T^2 = \sum_{i=1}^{d} (\tau_i^{adj})^2$                      $\Omega^2 = \sum_{j=1}^{d} \omega_j^2$
Global test statistic         $T_s^2 = \sum_{i=1}^{s} (t_i^{adj})^2$                       $R_s^2 = \sum_{j=1}^{s} \hat{\omega}_j^2$
for a set of size s
Null distribution for         $\frac{n-s-1}{(n-2)s} T_s^2 \sim F(s, n-s-1)$                $R_s^2 \sim \mathrm{Beta}(s/2, (n-s-1)/2)$
empirical statistic           with $n = n_1 + n_2$
under normality
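The decomposition of Hotelling's $T^2$ listed in Table 4.7 can be checked numerically. The following is a minimal sketch, assuming NumPy, a pooled covariance estimate, and simulated two-group data; the helper names `inv_sqrt_sym` and `cat_scores` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def inv_sqrt_sym(P):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(P)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def cat_scores(X1, X2):
    """Empirical CAT scores for two groups (rows are samples).

    Assumption of this sketch: the covariance is estimated by the
    classical pooled estimator, which requires n1 + n2 - 2 >= d.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    v = np.diag(S)                        # pooled variances (diagonal of V)
    P = S / np.sqrt(np.outer(v, v))       # pooled correlation matrix
    t = (1.0 / n1 + 1.0 / n2) ** -0.5 * diff / np.sqrt(v)  # t-scores
    return inv_sqrt_sym(P) @ t, S, diff   # decorrelated t-scores

# Simulated example data (illustrative only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
X1 = rng.multivariate_normal([0.5, 0.0, 0.0], cov, size=40)
X2 = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=60)

t_adj, S, diff = cat_scores(X1, X2)
n1, n2 = X1.shape[0], X2.shape[0]

# Sum of squared CAT scores versus the direct two-sample Hotelling's T^2.
t2_from_cat = float(np.sum(t_adj ** 2))
t2_direct = float(diff @ np.linalg.solve(S, diff) / (1.0 / n1 + 1.0 / n2))
print(t2_from_cat, t2_direct)
```

Both printed values should agree up to floating-point error, because the pooled covariance factorizes as $V^{1/2} P V^{1/2}$.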
4.7 Summary
Correlation-adjusted marginal correlations ω, or CAR scores, are our contribution to the discussion on quantifying variable importance. This approach is based on simultaneous orthogonalization of the covariables by Mahalanobis decorrelation, followed by estimation of the remaining correlation between the response and the sphered predictors. The CAR score satisfies the most important properties postulated for measures of variable importance; in particular, it decomposes the proportion of variance explained. Furthermore, in contrast to other quantities, it is applicable to high-dimensional data.
Beyond the notion of variable importance we argue that the CAR score is the central quantity to understand nominal prediction error and the variance decomposition. In particular, the CAR score offers an elegant reformulation of the decomposition of variances
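$$\overbrace{\mathrm{Var}(Y)}^{\text{Total variance}} = \overbrace{\mathrm{Var}(Y^{\star})}^{\text{Explained variance}} + \overbrace{\mathrm{Var}(Y - Y^{\star})}^{\text{Unexplained variance}}$$
$$\sigma_Y^2 = \sigma_Y^2 (\omega^T \omega) + \sigma_Y^2 (1 - \omega^T \omega).$$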
Thus, we argue that the CAR score is the central quantity to assess which variables contribute to the explained variance or equivalently reduce the unexplained variance.
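The variance decomposition can be illustrated with a small numerical check. This is a minimal sketch, assuming NumPy, simulated data, and plain empirical correlations in a low-dimensional setting (not the estimators used for high-dimensional data); the function name `car_scores` is hypothetical. It compares the sum of squared empirical CAR scores with the $R^2$ of an ordinary least squares fit.

```python
import numpy as np

def car_scores(X, y):
    """Empirical CAR scores: omega = P^{-1/2} P_XY (illustrative sketch)."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized predictors
    ys = (y - y.mean()) / y.std(ddof=1)                # standardized response
    P = Xs.T @ Xs / (n - 1)                            # correlation among predictors
    p_xy = Xs.T @ ys / (n - 1)                         # marginal correlations
    eigvals, U = np.linalg.eigh(P)
    P_inv_sqrt = U @ np.diag(eigvals ** -0.5) @ U.T
    return P_inv_sqrt @ p_xy

# Simulated data with correlated predictors (illustrative only).
rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))
X[:, 1] += 0.8 * X[:, 0]
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

omega = car_scores(X, y)
explained = float(omega @ omega)   # proportion of explained variance

# Reference value: R^2 of a least squares fit with intercept.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r_squared = float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

print(explained, r_squared, 1.0 - explained)
```

The squared entries of `omega` give each predictor's share of the explained variance, and `1.0 - explained` is the unexplained proportion.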
In an extensive simulation study, we demonstrate that the CAR score exhibits superior performance not only in ranking but also in prediction. It outperforms competing approaches in terms of true positives in ranking and in terms of model error in prediction. Interestingly, elastic net, lasso, and boosting fail to recover negative regression coefficients in contrast to the linear model, partial correlation, and the CAR score.
Computational issues
The estimation and handling of correlation matrices in high-dimensional data is a complicated task. Here, we first discuss the Mahalanobis transform, which performs decorrelation in a unique way. Furthermore, we present an efficient algorithm for computing matrix powers of high-dimensional matrices, which allows the computation of CAT and CAR scores even in high dimensions. Next, we illustrate a subtle trick using simple analysis that allows an enormous reduction in storage and computation time. Additionally, we provide information on how to determine the model size for CAT and CAR scores.
5.1 Special properties of the Mahalanobis transform
The computation of CAT and CAR scores relies on decorrelation by the Mahalanobis transform, which derives from the Mahalanobis distance. The Mahalanobis distance is an alternative to the Euclidean distance that is used for non-spherical distributions. It quantifies the distance of a point x to the expectation vector µ of a multivariate distribution as
$$D^2(x) = (x - \mu)^T \Sigma^{-1} (x - \mu)$$
where Σ is the covariance matrix of the distribution and Σ^{-1} its inverse. Then, the Mahalanobis transform δ(x) by the correlation matrix P is defined as
$$\delta(x) = P^{-1/2} x = U M^{-1/2} U^T x \qquad (5.1)$$
where M is a diagonal matrix containing the eigenvalues and U is the orthogonal eigenvector system, derived from an eigendecomposition of the correlation matrix P.
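Equation (5.1) can be sketched directly with an eigendecomposition. The example below is a minimal illustration, assuming NumPy, standardized simulated data, and an empirical correlation matrix that is well conditioned (more samples than variables); the function name `mahalanobis_transform` is hypothetical.

```python
import numpy as np

def mahalanobis_transform(X, P):
    """Apply delta(x) = U M^{-1/2} U^T x to every row of X."""
    m, U = np.linalg.eigh(P)                  # eigenvalues m, eigenvectors U
    P_inv_sqrt = U @ np.diag(m ** -0.5) @ U.T
    # P^{-1/2} is symmetric, so right-multiplication transforms each row.
    return X @ P_inv_sqrt, P_inv_sqrt

# Simulated correlated data (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
X[:, 1] += 0.7 * X[:, 0]
X[:, 2] -= 0.5 * X[:, 1]

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
P_hat = np.corrcoef(Xs, rowvar=False)              # empirical correlation matrix
Z, P_inv_sqrt = mahalanobis_transform(Xs, P_hat)

cov_Z = np.cov(Z, rowvar=False)   # expected to be the identity matrix
print(np.round(cov_Z, 6))
```

Because the transform uses the same empirical correlation matrix that was computed from the standardized data, the transformed columns are exactly uncorrelated with unit variance, up to floating-point error.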
Importantly, the Mahalanobis transform has a number of properties not shared by other decorrelation transforms with $\mathrm{Var}(\delta(x)) = \mathrm{diag}(\sigma^2)$. First, it is the unique linear transformation that minimizes
$$E\left[ (\delta(x) - x)^T (\delta(x) - x) \right],$$
see Genizi (1993) and Hyvärinen et al. (2001, Section 6.5). Therefore, the Mahalanobis-decorrelated data $\delta(x)$ are nearest to the original data $x$. Second, as $P^{-1/2}$ is positive definite, $\delta(x)^T x > 0$ for any $x \neq 0$, which implies that $\delta(x)$ and $x$ are informative about each other also on a componentwise level (for example, corresponding components tend to have the same sign). The correlation of the corresponding elements in $x$ and $\delta(x)$ is given by $\mathrm{Cor}(x_i, \delta(x)_i) = (P^{1/2})_{ii}$.
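Both properties can be illustrated at the population level for a fixed correlation matrix. The sketch below assumes NumPy, a standardized random vector x with mean zero and Var(x) = P, and an arbitrary example matrix P. It compares the Mahalanobis transform with PCA whitening $M^{-1/2} U^T$, which serves here only as one example of another decorrelation transform; the helper `expected_sq_distance` is hypothetical.

```python
import numpy as np

# Example correlation matrix (illustrative values only).
P = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])

m, U = np.linalg.eigh(P)
W_maha = U @ np.diag(m ** -0.5) @ U.T   # Mahalanobis transform P^{-1/2}
W_pca = np.diag(m ** -0.5) @ U.T        # PCA whitening, for comparison
P_sqrt = U @ np.diag(m ** 0.5) @ U.T    # P^{1/2}

def expected_sq_distance(W, P):
    """E[(Wx - x)^T (Wx - x)] = tr((W - I) P (W - I)^T) for E[x] = 0, Var(x) = P."""
    D = W - np.eye(P.shape[0])
    return float(np.trace(D @ P @ D.T))

dist_maha = expected_sq_distance(W_maha, P)
dist_pca = expected_sq_distance(W_pca, P)

# Cov(x, P^{-1/2} x) = P^{1/2}; both vectors have unit variances, so the
# diagonal of P^{1/2} holds the componentwise correlations.
componentwise_cor = np.diag(P_sqrt)

print(dist_maha, dist_pca)
print(componentwise_cor)
```

Both transforms decorrelate x, but the Mahalanobis transform should yield the smaller expected squared distance to the original data, and all componentwise correlations should be positive.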