Comparison of CAT and CAR score - A Multivariate Framework for Variable Selection and Identific

Decorrelation offers an intuitive recipe for selecting variables under correlation. If there is no correlation among the predictor variables, there is consensus that the optimal criterion for selecting variables is the t-score for binary traits (Fan and Fan, 2008) and the marginal correlation for quantitative traits (Fan and Lv, 2008). In the previous sections we have presented generalizations of the t-score and the marginal correlation that accommodate correlation among predictors: the CAT and the CAR score. To repeat, the CAT score in classification is the analog of the CAR score in regression. Hence, both scores share important characteristics and exhibit related behavior, as summarized in Table 4.7.

In particular, the CAT and CAR scores are defined as the Mahalanobis-decorrelated versions of the marginal quantities that are optimal for variable selection in the absence of correlation, namely the t-score $\tau$ and the marginal correlations $P_{XY}$, respectively. Moreover, while the CAT score decomposes Hotelling’s $T^2$, the CAR score decomposes the squared multiple correlation coefficient, or proportion of variance explained. This suggests that Hotelling’s $T^2$ in classification is the quantity corresponding to the squared multiple correlation coefficient in regression. Hotelling (1931) already mentioned the “affinity” of Hotelling’s $T^2$ with the multiple correlation coefficient due to similar geometrical interpretations. Here, the connection between CAT and CAR scores provides further evidence that Hotelling’s $T^2$ and the multiple correlation coefficient are related quantities with respect to different scales of the outcome.

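Table 4.7: Comparison of CAT and CAR scores.

| | CAT | CAR |
|---|---|---|
| Response $Y$ | Binary | Metric |
| Definition | $\tau^{adj} = P^{-1/2}\tau$ | $\omega = P^{-1/2}P_{XY}$ |
| Marginal quantity | $\tau = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} V^{-1/2}(\mu_1 - \mu_2)$ | $P_{XY}$ |
| Decomposition | Hotelling’s $T^2$: $T^2 = \sum_{i=1}^{d} (\tau_i^{adj})^2$ | Squared multiple correlation: $\Omega^2 = \sum_{j=1}^{d} \omega_j^2$ |
| Global test statistic for a set of size $s$ | $T_s^2 = \sum_{i=1}^{s} (t_i^{adj})^2$ | $R_s^2 = \sum_{j=1}^{s} \hat{\omega}_j^2$ |
| Null distribution for empirical statistic under normality, with $n = n_1 + n_2$ | $\frac{n-s-1}{(n-2)\,s}\, T_s^2 \sim F(s,\, n-s-1)$ | $R_s^2 \sim \mathrm{Beta}\left(\frac{s}{2},\, \frac{n-s-1}{2}\right)$ |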
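The CAT column of Table 4.7 can be checked numerically. The following is a minimal sketch, not the implementation used in this work: it assumes simulated two-class data, a pooled covariance estimate without shrinkage, and a hypothetical helper `inv_sqrt_sym`. It illustrates that the squared CAT scores sum to the two-sample Hotelling’s $T^2$.

```python
import numpy as np

def inv_sqrt_sym(P):
    """Inverse matrix square root of a symmetric positive definite matrix
    (hypothetical helper, based on an eigendecomposition)."""
    evals, evecs = np.linalg.eigh(P)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

# Simulated two-class data (an assumption made for illustration only).
rng = np.random.default_rng(0)
n1, n2, d = 40, 60, 4
A = np.eye(d) + 0.5 * np.ones((d, d))   # mixing matrix that induces correlation
X1 = rng.normal(size=(n1, d)) @ A + np.array([1.0, 0.0, 0.5, 0.0])
X2 = rng.normal(size=(n2, d)) @ A

mean_diff = X1.mean(axis=0) - X2.mean(axis=0)

# Pooled covariance S, its variances v (diagonal of V), and correlation matrix P.
S = ((n1 - 1) * np.cov(X1, rowvar=False)
     + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
v = np.diag(S)
P = S / np.sqrt(np.outer(v, v))

# t-scores tau and correlation-adjusted t-scores (CAT scores) tau_adj.
scale = (1.0 / n1 + 1.0 / n2) ** -0.5
tau = scale * mean_diff / np.sqrt(v)
tau_adj = inv_sqrt_sym(P) @ tau

# Hotelling's T^2 computed directly and as the sum of squared CAT scores.
T2_direct = scale**2 * mean_diff @ np.linalg.solve(S, mean_diff)
T2_from_cat = np.sum(tau_adj**2)
print(T2_direct, T2_from_cat)
```

The two printed values agree up to floating-point error because $\tau^T P^{-1} \tau$ equals the usual quadratic form in the pooled covariance.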

4.7 Summary

Correlation-adjusted marginal correlations $\omega$, or CAR scores, are our contribution to the discussion on quantifying variable importance. This approach is based on simultaneous orthogonalization of the covariables by Mahalanobis decorrelation, followed by estimation of the remaining correlation between the response and the sphered predictors. The CAR score satisfies most of the important properties postulated for measures of variable importance; in particular, it decomposes the proportion of variance explained. Furthermore, in contrast to other quantities, it is applicable to high-dimensional data.

Beyond the notion of variable importance we argue that the CAR score is the central quantity to understand nominal prediction error and the variance decomposition. In particular, the CAR score offers an elegant reformulation of the decomposition of variances

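$$
\begin{aligned}
\overbrace{\operatorname{Var}(Y)}^{\text{Total variance}} &= \overbrace{\operatorname{Var}(Y^{\star})}^{\text{Explained variance}} + \overbrace{\operatorname{Var}(Y - Y^{\star})}^{\text{Unexplained variance}} \\
\sigma_Y^2 &= \sigma_Y^2\,(\omega^T\omega) + \sigma_Y^2\,(1 - \omega^T\omega).
\end{aligned}
$$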

Thus, we argue that the CAR score is the central quantity to assess which variables contribute to the explained variance or equivalently reduce the unexplained variance.
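The variance decomposition can be illustrated on simulated data. This is a minimal sketch under stated assumptions: the data, the coefficient vector, and all variable names are hypothetical, and empirical correlations without shrinkage are used in place of the estimators discussed in this work. It shows that the squared empirical CAR scores sum to the proportion of variance explained by an ordinary least squares fit.

```python
import numpy as np

# Simulated regression data (an assumption made for illustration only).
rng = np.random.default_rng(1)
n, d = 200, 3
A = np.eye(d) + 0.5 * np.ones((d, d))   # induces correlation among predictors
X = rng.normal(size=(n, d)) @ A
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)

# Empirical correlation among predictors (P) and between predictors and response (P_xy).
R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
P, P_xy = R[:d, :d], R[:d, d]

# CAR scores: omega = P^{-1/2} P_xy, with P^{-1/2} from an eigendecomposition.
evals, evecs = np.linalg.eigh(P)
omega = evecs @ np.diag(evals ** -0.5) @ evecs.T @ P_xy

# Ordinary least squares fit with intercept, used as the linear predictor.
Xc = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
y_hat = Xc @ beta

var_y = y.var()
explained = y_hat.var()
unexplained = (y - y_hat).var()

print("sum of squared CAR scores:", omega @ omega)
print("explained / total variance:", explained / var_y)
```

Each squared entry of `omega` can then be read as the share of the explained variance attributed to the corresponding predictor.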

In an extensive simulation study, we demonstrate that the CAR score exhibits superior performance not only in ranking but also in prediction. It outperforms competing approaches in terms of true positives in ranking and in terms of model error in prediction. Interestingly, elastic net, lasso, and boosting fail to recover negative regression coefficients in contrast to the linear model, partial correlation, and the CAR score.

Computational issues

The estimation and handling of correlation matrices in high-dimensional data is a complicated task. Here, we first discuss the Mahalanobis transform, which performs decorrelation in a unique way. Furthermore, we present an efficient algorithm for computing matrix powers of high-dimensional matrices, which makes the computation of CAT and CAR scores feasible even in high dimensions. Next, we illustrate a subtle trick, based on simple analysis, that allows an enormous reduction in storage and computation time. Additionally, we provide information on how to determine the model size for CAT and CAR scores.

5.1 Special properties of the Mahalanobis transform

The computation of CAT and CAR scores relies on decorrelation by the Mahalanobis transform, which derives from the Mahalanobis distance. The Mahalanobis distance is an alternative to the Euclidean distance that is suited to non-spherical distributions. It quantifies the distance of a point x from the expectation vector µ of a multivariate distribution as

$$
D^2(x) = (x - \mu)^T \Sigma^{-1} (x - \mu)
$$

where $\Sigma$ is the covariance matrix of the distribution and $\Sigma^{-1}$ its inverse. The Mahalanobis transform $\delta(x)$ by the correlation matrix $P$ is then defined as

$$
\delta(x) = P^{-1/2} x = U M^{-1/2} U^T x \tag{5.1}
$$

where M is a diagonal matrix containing the eigenvalues and U is the orthogonal eigenvector system, derived from an eigendecomposition of the correlation matrix P.
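Equation (5.1) translates directly into a few lines of code. The sketch below is illustrative only: the function name `inv_sqrt_correlation`, the example matrix `P`, and the vector `x` are assumptions, and the direct eigendecomposition shown here is only practical for small matrices, not for the high-dimensional setting addressed later in this chapter.

```python
import numpy as np

def inv_sqrt_correlation(P):
    """Return P^{-1/2} = U M^{-1/2} U^T for a symmetric positive definite P
    (hypothetical helper following Equation 5.1)."""
    evals, evecs = np.linalg.eigh(P)    # M (as a vector) and U
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

# Small example correlation matrix (assumed for illustration).
P = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

W = inv_sqrt_correlation(P)

# Mahalanobis transform of a single standardized observation x.
x = np.array([1.0, -2.0, 0.5])
delta_x = W @ x

# W P W^T is the identity, so data with correlation P become uncorrelated.
print(np.round(W @ P @ W.T, 10))
```

Because `W` is symmetric, applying it to the rows of a data matrix `Z` of standardized observations amounts to `Z @ W`.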

Importantly, the Mahalanobis transform has a number of properties not shared by other decorrelation transforms with $\operatorname{Var}(\delta(x)) = \operatorname{diag}(\sigma^2)$. First, it is the unique linear transformation that minimizes $E\left[(\delta(x) - x)^T(\delta(x) - x)\right]$, see Genizi (1993) and Hyvärinen et al. (2001, Section 6.5). Therefore, the Mahalanobis-decorrelated data $\delta(x)$ are nearest to the original data $x$. Second, as $P^{-1/2}$ is positive definite, $\delta(x)^T x > 0$ for any $x \neq 0$, which implies that $\delta(x)$ and $x$ are informative about each other also on a componentwise level (for example, they must have the same sign). The correlation of the corresponding elements in $x$ and $\delta(x)$ is given by $\operatorname{Cor}\left((x)_i, \delta(x)_i\right) = (P^{1/2})_{ii}$.
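These properties can be verified numerically at the population level. The following sketch assumes standardized, centered predictors with $\operatorname{Cov}(x) = P$, a small example matrix, and randomly drawn orthogonal matrices `Q` as competing decorrelation transforms `Q @ W`; all names are hypothetical. It is a spot check, not a proof.

```python
import numpy as np

# Example correlation matrix (assumed for illustration).
P = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
d = P.shape[0]

evals, evecs = np.linalg.eigh(P)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T       # P^{-1/2}
P_half = evecs @ np.diag(evals ** 0.5) @ evecs.T   # P^{1/2}

# Cov(x, delta(x)) = P W = P^{1/2}. Both x and delta(x) have unit variances,
# so the diagonal holds the componentwise correlations.
componentwise_cor = np.diag(P @ W)

# delta(x)^T x = x^T P^{-1/2} x is positive for nonzero x.
rng = np.random.default_rng(2)
xs = rng.normal(size=(1000, d))
inner = np.einsum("ij,jk,ik->i", xs, W, xs)

def expected_sq_dist(T):
    """E[(Tx - x)^T (Tx - x)] = tr((T - I) P (T - I)^T) when Cov(x) = P, E[x] = 0."""
    D = T - np.eye(d)
    return np.trace(D @ P @ D.T)

# Compare the Mahalanobis transform with other whitening transforms Q W.
maha = expected_sq_dist(W)
others = []
for _ in range(200):
    Q, _r = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
    others.append(expected_sq_dist(Q @ W))

print("componentwise correlations:", componentwise_cor)
print("Mahalanobis distance to x:", maha, "best competitor:", min(others))
```

In this check none of the rotated whitening transforms comes closer to the original data than `W` itself, in line with the minimization property stated above.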

In document A Multivariate Framework for Variable Selection and Identification of Biomarkers in High-Dimensional Omics Data (Page 80-84)