
4.3 Numerical Results

4.3.2 Gene Expression Data

In this subsection, we test our sparse CCA algorithm on four gene expression data sets. We compare the classification performance of non-sparse CCA and ULDA to verify the equivalence relationship presented in Section 3.3. We also compare our sparse CCA algorithm with three other algorithms.

The four gene expression data sets we used are Prostate, Lymphoma, SRBCT and Brain. Their structures are given in Table 4.2, and detailed descriptions are summarized in Appendix A. The data sets and their preprocessing are fully described in [44]. In this experiment, we choose $l$ (the number of columns in $W_x$ and $W_y$) equal to $m$ (the rank of the matrix $XY^T$). When $l = m$,

$$\mathrm{Trace}(W_x^T X Y^T W_y) = \sum_{i=1}^{m} \eta_i,$$

which is the sum of all nonzero canonical correlations between $X^T W_x$ and $Y^T W_y$.

According to Theorem 3.8, $l \ge m$ is required to ensure that $G \in \mathbb{R}^{d \times l}$ is a solution of the optimization problem (2.3) (i.e., ULDA). On the other hand, when we compute $G$ for dimensionality reduction, its dimension should be as small as possible. Hence we choose the transformation with the minimal dimension, i.e., $l = m$.
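Table 4.2 shows that $m = K - 1$ for every data set. This subsection does not define $Y$. Assume, as our own assumption, that $Y$ is the $K \times n$ class-indicator matrix with centered rows and that $X$ stores centered samples as columns. Under that assumption, the following minimal NumPy sketch reproduces this count, which then fixes $l = m$. The helper names `centered_indicator` and `rank_xyt` and the random data are hypothetical.

```python
import numpy as np

def centered_indicator(labels, K):
    """Hypothetical helper: K x n class-indicator matrix with each row centered."""
    Y = np.eye(K)[:, labels]              # column j is the unit vector e_{labels[j]}
    return Y - Y.mean(axis=1, keepdims=True)

def rank_xyt(X, Y):
    """m = rank(X Y^T), the number of nonzero canonical correlations."""
    return int(np.linalg.matrix_rank(X @ Y.T))

rng = np.random.default_rng(0)
d1, n = 200, 40                           # d1 > n, as for the gene expression data
for K in (2, 3, 4, 5):
    labels = np.arange(n) % K             # every class has training samples
    X = rng.standard_normal((d1, n))
    X -= X.mean(axis=1, keepdims=True)    # center each feature over the samples
    m = rank_xyt(X, centered_indicator(labels, K))
    l = m                                 # minimal dimension allowed by Theorem 3.8
    print(K, m, l)                        # m = K - 1 for these random data
```

Centering the rows of $Y$ makes them sum to the zero vector, so $\mathrm{rank}(Y) \le K - 1$. For generic $X$ with $d_1 > n$, multiplying by $X$ does not lower this rank further.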

Table 4.2: Data structures: data dimension ($d_1$), training size ($n$), number of classes ($K$) and number of testing samples (# Testing). $m$ is the rank of the matrix $XY^T$, and $l$ is the number of columns in $W_x$ and $W_y$; we choose $l = m$ in our experiments.

Data set d1 n K # Testing m l

Prostate 6033 51 2 51 1 1

Lymphoma 4026 32 3 30 2 2

SRBCT 2308 32 4 31 3 3

Brain 5597 21 5 21 4 4

To distinguish the non-sparse and sparse solutions of CCA, we use $(W_x^{NS}, W_y^{NS})$ to denote the non-sparse CCA solution, i.e.,

$$W_x^{NS} = U_1 \Sigma_1^{-1} P_1(:, 1:l), \qquad W_y^{NS} = V_1 \Sigma_2^{-1} P_2(:, 1:l),$$

and $(W_x, W_y)$ to denote a sparse CCA solution, i.e., the pair computed by SCCA ℓ1, PMD, CCA EN or SCCA PD. Table 4.3 lists the classification accuracy of ULDA and non-sparse CCA using 1NN as the classifier. According to Table 4.3, the classification accuracy achieved by $W_x^{NS}$ of CCA is the same as that of ULDA. This is consistent with the fact, described in Section 3.3, that ULDA is a special case of CCA.
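The factors $U_1$, $\Sigma_1$, $V_1$, $\Sigma_2$, $P_1$ and $P_2$ are defined earlier in the thesis and are not restated here. The following minimal NumPy sketch assumes, as our own assumption, that they come from thin SVDs $X = U_1 \Sigma_1 Q_1^T$ and $Y = V_1 \Sigma_2 Q_2^T$ of the centered data and from an SVD $Q_1^T Q_2 = P_1 \,\mathrm{diag}(\eta)\, P_2^T$. With this construction, $(W_x^{NS})^T X X^T W_x^{NS} = I_l$ and $\mathrm{Trace}((W_x^{NS})^T X Y^T W_y^{NS}) = \sum_{i=1}^{l} \eta_i$. The names `thin_svd` and `cca_nonsparse`, the rank tolerance and the random data are hypothetical.

```python
import numpy as np

def thin_svd(A, tol=1e-10):
    """Thin SVD A = U diag(s) Q^T, keeping only numerically nonzero singular values."""
    U, s, Qt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r], s[:r], Qt[:r].T

def cca_nonsparse(X, Y, l):
    """Sketch of W_x^NS = U1 S1^{-1} P1(:, 1:l) and W_y^NS = V1 S2^{-1} P2(:, 1:l).

    X (d1 x n) and Y (d2 x n) hold centered samples as columns.
    """
    U1, s1, Q1 = thin_svd(X)
    V1, s2, Q2 = thin_svd(Y)
    P1, eta, P2t = np.linalg.svd(Q1.T @ Q2, full_matrices=False)
    Wx = U1 @ np.diag(1.0 / s1) @ P1[:, :l]
    Wy = V1 @ np.diag(1.0 / s2) @ P2t.T[:, :l]
    return Wx, Wy, eta                    # eta: canonical correlations, decreasing

rng = np.random.default_rng(1)
d1, n, K = 100, 30, 4
labels = np.arange(n) % K
X = rng.standard_normal((d1, n))
X -= X.mean(axis=1, keepdims=True)
Y = np.eye(K)[:, labels]
Y -= Y.mean(axis=1, keepdims=True)

m = int(np.linalg.matrix_rank(X @ Y.T))   # choose l = m
Wx, Wy, eta = cca_nonsparse(X, Y, m)
corr = np.trace(Wx.T @ X @ Y.T @ Wy)
print(corr, eta[:m].sum())                # equal up to rounding
```

In this toy example, $d_1 > n$ and the centered random $X$ has rank $n - 1$. As a result, every nonzero canonical correlation on the training data equals 1, which matches c1 $= l$ in Tables 4.4 and 4.5.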

Table 4.3: Comparison of classification accuracy (%) between ULDA and $W_x^{NS}$ of CCA using 1NN as the classifier

Data set Prostate Lymphoma SRBCT Brain

ULDA 92.1569 100 96.7742 76.1905

$W_x^{NS}$ of CCA 92.1569 100 96.7742 76.1905
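Table 4.3 evaluates the projection with a 1NN classifier. The following is a minimal sketch of that protocol. It assumes Euclidean distance in the reduced space and projects the test samples with the same $W_x$; the section states neither the metric nor the test preprocessing. Subtracting a common mean from both training and test data only translates the projected points, so it would not change the 1NN decisions. The function name `one_nn_accuracy` and the toy data are hypothetical.

```python
import numpy as np

def one_nn_accuracy(Wx, X_train, y_train, X_test, y_test):
    """Accuracy (%) of 1NN on the projected data X^T W_x.

    Columns of X_train / X_test are samples; y_train / y_test are label arrays.
    """
    Z_train = X_train.T @ Wx                  # n_train x l
    Z_test = X_test.T @ Wx                    # n_test x l
    # squared Euclidean distance between every test point and every training point
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d2, axis=1)]     # label of the nearest training point
    return 100.0 * np.mean(pred == y_test)

Xtr = np.array([[0.0, 0.0, 5.0, 5.0],
                [0.0, 1.0, 5.0, 6.0]])        # 2 features, 4 training samples
ytr = np.array([0, 0, 1, 1])
Xte = np.array([[0.2, 5.1],
                [0.1, 5.2]])                  # 2 test samples
yte = np.array([0, 1])
print(one_nn_accuracy(np.eye(2), Xtr, ytr, Xte, yte))   # 100.0
```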

Now we study the performance of Algorithm 2 in comparison with the three other algorithms. Experimental results are listed in Tables 4.4 and 4.5, where the following notation is used:

(i) c1 and c2 denote the correlations $\mathrm{Trace}((W_x^{NS})^T X Y^T W_y^{NS})$ and $\mathrm{Trace}(W_x^T X Y^T W_y)$ for training data, respectively;

(ii) c3 and c4 denote the correlations $\mathrm{Trace}((W_x^{NS})^T X Y^T W_y^{NS})$ and $\mathrm{Trace}(W_x^T X Y^T W_y)$ for testing data, respectively;

(iii) c5 and c6 denote the sparsity (%) of $W_x$ and $W_y$, respectively;

(iv) c7 = $\|W_x^T X X^T W_x - I_l\|_F / \sqrt{l}$ and c8 = $\|W_y^T Y Y^T W_y - I_l\|_F / \sqrt{l}$ measure the orthogonality of the canonical variables $X^T W_x$ and $Y^T W_y$, respectively;

(v) c9 denotes the classification accuracy (%) using the sparse $W_x$.
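Once $W_x$, $W_y$ and the data matrices are available, the measures c1 to c8 are straightforward to compute. The following is a minimal NumPy sketch under two assumptions of ours. First, columns of $X$ and $Y$ are samples. Second, an entry counts as zero when its magnitude is at most a threshold `tol`; the section does not say whether exact zeros or a threshold were used. The parameter `tol` and the function names are hypothetical.

```python
import numpy as np

def trace_corr(Wx, Wy, X, Y):
    """c1-c4: Trace(W_x^T X Y^T W_y) for training or testing data."""
    return float(np.trace(Wx.T @ X @ Y.T @ Wy))

def sparsity(W, tol=0.0):
    """c5/c6: percentage of entries of W with |w_ij| <= tol (tol is an assumption)."""
    return 100.0 * float(np.mean(np.abs(W) <= tol))

def orth_error(W, X):
    """c7/c8: ||W^T X X^T W - I_l||_F / sqrt(l)."""
    l = W.shape[1]
    G = W.T @ X @ X.T @ W
    return float(np.linalg.norm(G - np.eye(l), "fro") / np.sqrt(l))

W = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
X = np.eye(4)                      # toy data with X X^T = I
print(sparsity(W))                 # 75.0 (6 of 8 entries are zero)
print(orth_error(W, X))            # about 2.1213, i.e. 3 / sqrt(2)
```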

Table 4.4: Comparison of results obtained by SCCA ℓ1, PMD, CCA EN, and SCCA PD. Columns: (1) SCCA ℓ1 with $\mu_x = \mu$, $\mu_y = 10\mu$, $\mu = 5$; (2) SCCA ℓ1 with $\mu_x = \mu$, $\mu_y = 10\mu$, $\mu = 10$; (3) PMD with candidates for its parameters c1, c2 from 0.5:0.01:1; (4) PMD with candidates for c1, c2 from 0.01:0.01:0.5; (5) CCA EN with candidates for $\lambda_x$, $\lambda_y$ from 0:0.01:0.5; (6) CCA EN with candidates for $\lambda_x$, $\lambda_y$ from 0.5:0.01:1; (7) SCCA PD.

Measure (1) (2) (3) (4) (5) (6) (7)

Prostate

c1 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

c2 1.0000 1.0000 0.5052 0.6521 0.4908 0.5316 0.9409

c3 0.8111 0.8111 0.8111 0.8111 0.8111 0.8111 0.81111

c4 0.8273 0.8273 0.5649 0.7618 0.5538 0.5870 0.8659

c5 99.17 99.17 22.48 99.98 0 56.46 99.65

c6 0 50 50 50 0 0 50

c7 1.7916e-7 1.6042e-7 8.8818e-16 4.4409e-16 0 0 0

c8 8.6222e-6 2.8527e-6 4.4409e-16 4.4409e-16 1.7764e-15 0 3.3307e-16

c9 94.1176 94.1176 80.3922 86.2745 66.6667 66.6667 94.1176

Lymphoma

c1 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000

c2 2.0000 2.0000 1.6754 1.6257 1.8849 1.8842 1.7217

c3 1.8529 1.8529 1.8529 1.8529 1.8529 1.8529 1.8529

c4 1.7946 1.7946 1.5907 1.5017 1.7861 1.7835 1.5427

c5 99.23 99.23 25.67 99.68 19.56 41.24 99.60

c6 50 50 50 66.67 0 0 50

c7 6.8731e-6 8.6251e-6 9.0242e-1 9.4602e-1 4.2377e-2 4.0695e-2 2.8697e-1

c8 3.9435e-6 1.4939e-6 8.9761e-2 5.9459e-1 1.8663e-3 4.1585e-3 4.8046e-1

Table 4.5: Comparison of results obtained by SCCA ℓ1, PMD, CCA EN, and SCCA PD. Columns (1) to (7) are as in Table 4.4.

Measure (1) (2) (3) (4) (5) (6) (7)

SRBCT

c1 3.0000 3.0000 3.0000 3.0000 3.0000 3.0000 3.0000

c2 3.0000 3.0000 2.7473 2.6552 2.8114 1.8898 2.7784

c3 2.7846 2.7846 2.7846 2.7846 2.7846 2.7846 2.7846

c4 2.6265 2.6202 2.4658 2.2657 2.4813 1.7796 2.4004

c5 98.64 98.64 35.67 99.70 30.39 80.95 99.39

c6 41.67 41.67 75 75 0 0 75

c7 4.0122e-6 2.8706e-6 5.3604e-1 7.5417e-1 1.2411e-1 5.8078e-1 1.5976e-1

c8 4.0975e-6 7.5397e-6 4.1404e-1 5.3127e-1 7.3770e-16 1.2208e-2 4.1404e-1

c9 93.5484 93.5484 87.0968 87.0968 93.5484 83.8710 100

Brain

c1 4.0000 4.0000 4.0000 4.0000 4.0000 4.0000 4.0000

c2 4.0000 4.0000 3.7513 3.7851 3.7957 3.8060 3.3233

c3 3.3198 3.3198 3.3198 3.3198 3.3198 3.3198 3.3198

c4 2.6198 2.6179 3.0767 3.0352 3.2804 3.3287 2.2795

c5 99.64 99.64 20.54 84.94 33.54 69.45 99.81

c6 20 20 30 75 0 0 75

c7 4.7467e-6 1.4462e-6 3.8263e-1 4.3716e-1 5.2609e-2 8.8121e-2 2.4601e-1

c8 7.6711e-6 4.5569e-67 1.2352e-1 3.9253e-1 6.7958e-3 1.2938e-2 5.8152e-1

c9 80.9524 80.9524 90.4762 76.1905 90.4762 90.4762 61.9048

Tables 4.4 and 4.5 lead to the following observations:

1. In Algorithm 2, for any fixed $\delta \in (0, 1)$ ($\delta = 0.9$ in our experiments), we only need to choose $\mu_x$ and $\mu_y$ sufficiently large, as stated in Theorem 2.5, to obtain highly sparse solutions $W_x$ and $W_y$. Moreover, in our experiments the sparsity of $W_x$ and $W_y$ is not affected when $(\mu_x, \mu_y)$ is increased from (5, 50) to (10, 100). This demonstrates that $(W_{x,\mu_x^*}, W_{y,\mu_y^*})$ is already a good approximation of $(W_x^*, W_y^*)$ when $(\mu_x, \mu_y) = (5, 50)$. However, there is no mathematical justification for selecting the parameters of the other three algorithms, PMD, CCA EN and SCCA PD.

2. For Algorithm 2, the sparse solution $W_x$ gives classification accuracy comparable to that of the non-sparse solution $W_x^{NS}$. In contrast, the classification performance of PMD and CCA EN is strongly affected by their parameters. When PMD is applied to the Prostate data set with its parameters c1 and c2 selected from 0.5:0.01:1, the accuracy (80.3922%) is lower than when c1 and c2 are selected from 0.01:0.01:0.5 (86.2745%). The opposite is observed when PMD is applied to the Brain data set (90.4762% versus 76.1905%). A similar phenomenon can be observed for CCA EN.

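3. An important advantage of Algorithm 2 over the other three algorithms (PMD, CCA EN and SCCA PD) is that the sparse solutions $W_x$ and $W_y$ it computes satisfy the orthogonality constraints (4.9) very well. The orthogonality constraints (4.9) imply that the columns of each canonical variable, $X^T W_x$ and $Y^T W_y$, are mutually uncorrelated. The experimental results for Algorithm 2 show that $\|W_x^T X X^T W_x - I_l\|_F / \sqrt{l} = O()$ and $\|W_y^T Y Y^T W_y - I_l\|_F / \sqrt{l} = O()$; c7 and c8 are below $10^{-5}$ in all cases. This is consistent with the error bounds (4.7) and (4.8). However, for the other three algorithms (PMD, CCA EN and SCCA PD), we find that $\|W_x^T X X^T W_x - I_l\|_F / \sqrt{l} = O(1)$ and $\|W_y^T Y Y^T W_y - I_l\|_F / \sqrt{l} = O(1)$ when $l > 1$. In addition, these algorithms perform the normalizations $\mathrm{diag}(W_x^T X X^T W_x) = I$ and $\mathrm{diag}(W_y^T Y Y^T W_y) = I$, yet the columns of the canonical variables $X^T W_x$ and/or $Y^T W_y$ are still not mutually uncorrelated. When $l = 1$, this normalization alone enforces the constraints, because the $1 \times 1$ matrices have no off-diagonal entries. This explains the near-zero c7 and c8 values of PMD, CCA EN and SCCA PD on the Prostate data set.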

4. On all four data sets, the training-data correlation $\mathrm{Trace}(W_x^T X Y^T W_y)$ (c2) achieved by Algorithm 2 is greater than that achieved by the other three algorithms. On the Lymphoma and SRBCT data sets, the testing-data correlation $\mathrm{Trace}(W_x^T X Y^T W_y)$ (c4) achieved by Algorithm 2 is also greater than that achieved by the other three algorithms. On the Prostate and Brain data sets, the testing-data correlation (c4) achieved by Algorithm 2 is lower than that achieved by some of the other three algorithms, but Algorithm 2 computes $W_x$ and $W_y$ with higher sparsity in these cases.
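Observation 3 notes that normalizing only the diagonal of $W_x^T X X^T W_x$ does not make the columns of $X^T W_x$ uncorrelated when $l > 1$. The following sketch is our own illustration. It uses a random dense $W$ rather than the output of PMD, CCA EN or SCCA PD, and `diag_normalize` is a hypothetical name. The sketch applies only that normalization step and then measures the remaining deviation from $I_l$ in the c7 sense. In both cases the diagonal becomes exactly 1, but the deviation vanishes only for $l = 1$.

```python
import numpy as np

def diag_normalize(W, X):
    """Scale each column w_j so that (W^T X X^T W)_jj = 1; off-diagonals are not controlled."""
    return W / np.linalg.norm(X.T @ W, axis=0)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 20))
X -= X.mean(axis=1, keepdims=True)

results = {}
for l in (1, 3):
    W = diag_normalize(rng.standard_normal((50, l)), X)   # hypothetical dense W
    G = W.T @ X @ X.T @ W                                 # Gram matrix of the columns of X^T W
    err = np.linalg.norm(G - np.eye(l)) / np.sqrt(l)      # same measure as c7
    results[l] = (bool(np.allclose(np.diag(G), 1.0)), float(err))
    print(l, results[l])
```

Enforcing the full constraint $W_x^T X X^T W_x = I_l$ also requires driving the off-diagonal entries of $G$ to zero, and column scaling cannot do that.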

In document Sparse Dimensionality Reduction Methods: Algorithms and Applications (Page 103-109)