Matrix-Variate Probabilistic Model for Canonical Correlation Analysis

(1)

Volume 2011, Article ID 748430,7pages doi:10.1155/2011/748430

Research Article

Matrix-Variate Probabilistic Model for

Canonical Correlation Analysis

Mehran Safayani and Mohammad Taghi Manzuri Shalmani

Department of Computer Engineering, Sharif University of Technology P.O. Box 11155-8639, Tehran 1458889694, Iran

Correspondence should be addressed to Mehran Safayani,[email protected]

Received 29 October 2010; Revised 29 January 2011; Accepted 31 January 2011

Academic Editor: Enrico Capobianco

Copyright © 2011 M. Safayani and M. T. Manzuri Shalmani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Motivated by the fact that in computer vision data samples are matrices, in this paper, we propose a matrix-variate probabilistic model for canonical correlation analysis (CCA). Unlike probabilistic CCA which converts the image samples into the vectors, our method uses the original image matrices for data representation. We show that the maximum likelihood parameter estimation of the model leads to the dimensional canonical correlation directions. This model helps for better understanding of two-dimensional Canonical Correlation Analysis (2DCCA), and for further extending the method into more complex probabilistic model. In addition, we show that two-dimensional Linear Discriminant Analysis (2DLDA) can be obtained as a special case of 2DCCA.

1. Introduction

Recently, a probabilistic interpretation of statistical dimen-sion reduction algorithms has been proposed by several authors. Tipping and Bishop have derived a latent variable model for principal component analysis (PPCA) and have shown that how the principal subspace of the set of data vectors can be obtained within a maximum likelihood framework [1]. Lawrence has proposed another probabilis-tic model for Principal Component Analysis (PCA); he integrated out the weights and optimized the positions of the latent variables in the q dimensional latent space [2]. Roweis has presented an expectation-maximization (EM) algorithm for PCA. The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data [3]. Unlike PCA, which works with a single random vector and maximizes the variance in the projected space, Canonical Correlation Analysis (CCA) works with a pair of random vectors (or in general with a set of m random vectors) and maximizes correlation between sets of projections. In [4], a latent variable model for CCA has been proposed by Bach and Jordan. Other probabilistic models are also known [5–8]. In general,

the probabilistic models have many advantages including the following:

(i) the potential to extending the scope of the methods into the mixture models [9],

(ii) extending the methods so as to handle the missing data values [1],

(iii) automatic model selection can be applied by combin-ing the likelihood with a prior [10],

(iv) Extending the model into supervised or semi super-vised cases [5].

(2)

x

t1

t2

Figure1: Probabilistic graphical model for CCA.

the spatial information in the image structure and reduce the computational cost to a great extent. General Low Rank Approximation of Matrix (GLRAM) [11], Two-Dimensional Canonical Correlation Analysis (2DCCA) [12,13], and Two-Dimensional Linear Discriminant Analsis (2DLDA) [14] are some well-known matrix-based algorithms constructed based on this idea. Some other researchers have applied multilinear algebra and have extended this concept to higher-order tensor data [15–20].

Because of the success of the matrix-based methods, re-cently some researchers have developed probabilistic model for matrix and tensor extensions of PCA [21–24]. However, they do not show the maximum likelihood relationship between their models and corresponding PCA.

Armed with probabilistic principal component analysis [1] and probabilistic canonical correlation analysis [4], in this paper, we propose a matrix-variate factor analysis model that has the property that its maximum likelihood solution extracts the canonical correlation directions of two random matrices. In addition, we show that 2DCCA can be converted to 2DLDA by considering special kind of random matrices. This means that 2DLDA can be interpreted as 2DCCA between appropriately defined random matrices.

The remaining part of the paper is organized as follows: inSection 2, we review CCA and its probabilistic interpre-tation. Two-Dimensional CCA is described in Sections 3, and 4 introduces our probabilistic model and derivation of canonical directions using maximum likelihood esti-mation. The relationship between 2DCCA and 2DLDA is discussed inSection 5. Finally, conclusions are presented in Section 6.

2. Probabilistic CCA

Canonical Correlation Analysis determines the linear rela-tionship between two multidimensional variablest1 ∈ Ê

m

andt2 ∈Ê

n_{. It finds a pair of linear transforms}_u

1and u2

such that correlations between transformed variablesuT1t1an

uT2t2 are maximized. The objective function of CCA can be

written as

arg max u1,u2

covuT

1t1,uT2t2

varuT

1t1

varuT

2t2

. (1)

The solutions to this problem can be obtained asu1=Σ111/2q1

andu2=Σ221/2q2, whereq1andq2contain left-right singular

vectors of (Σ11)

−1/2

Σ12(Σ22)

−1/2

and Σ_{i j} denotes the sample covariance matrix of ti and tj random vectors. A latent variable model for CCA has been proposed in [4] whose graphical model is depicted inFigure 1. The model is defined as follows:

x∼N(0,Id), min{m,n} ≥d≥1,

t1|x∼N(W1x,Ψ1), W1∈Ê

m×d_, _Ψ

10,

t2|x∼N(W2x,Ψ2), W2∈Ê

n×d_, _Ψ

20,

(2)

where we assume that the data is centered. The negative log likelihood of the data is equal to

L= (m+n)N

2 log(2π) + N

2 log|Σ|+ N

2 tr

Σ−1_Σ_, ₍₃₎

where N is the number of the samples, Σ =

W1W1T+Ψ1 W1W2T W2W1T W2W2T+Ψ2

, and Σ =

Σ11Σ12

Σ21Σ22 denotes the (m+n)×(m+n) sample covariance matrix obtained from datat1j,t2j,j=1,. . .,N.

The maximum likelihood solution is given by

W1=Σ111/2Q1dP1d/2,

W2=Σ122/2Q2dP1d/2,

(4)

where the columns of Q1d and Q2d are the first d left-right singular vectors of the matrix (Σ11)

−1/2

Σ12(Σ22)

−1/2

, and Pd is the diagonal matrix containing the singular values of (Σ11)

−1/2

Σ12(Σ22)

−1/2

.

3. 2-Dimensional CCA

The main diﬀerence between classical CCA and 2DCCA lies in the way the data are represented. Unlike classical CCA which uses the vectorized representation, 2DCCA works with the data in matrix representation. Therefore, 2DCCA preserves some implicit structural information among elements of the original images. It also overcomes the singularity problem of scatter matrices resulting from the high dimensionality of vectors [12,13].

2DCCA considers two random matrices T1 ∈ Ê

m1×n1

and T2 ∈ Ê

m2×n2 _{and seeks left transforms} _L

1 ∈ Ê

m1×m_,

L2∈Ê

m2×m_{and right transforms}_R

1∈Ê

n1×n_,_R

2∈Ê

n2×n

such that the following criteria is maximized:

arg max L1,L2,R1,R2

trcovLT1T1R1,LT2T2R2

trvarLT1T1R1

trvarLT2T2R2

. (5)

(3)

At first left transforms L1,L2 are assumed known and the

following covariance matrices are defined:

covT1l,T2l

=Σl

12=

1

NmΣNn=1T1,ln

T2,ln

T ,

varTl

1

=Σl

11=

1 NmΣ

N n=1T1,ln

Tl

1,n

_T

,

varTl

2

=Σl

22=

1 NmΣ

N n=1T2,ln

Tl

2,n

_T

, (6)

whereT1l =T1TL1andT2l =T2TL2are left projected sample

matrices. Then, the formula (1) becomes

arg max R1,R2

trRT1Σl12R1

trRT1Σl11R1

trRT2Σl22R2

. (7)

The optimal projection can be obtained as follows:

R1=

Σl

11 −1/2

Ql1,

R2=

Σl

22 ₋1/2

Ql

2,

(8)

whereQ1l andQl2containnfirst left-right singular vector of

(Σl11)−1/2Σl12(Σl22)−1/2.

Alternatively, we can rewrite (5) as

arg max L1,L2

trLT

1Σr12L1

trLT

1Σr11L1

trLT

2Σr22L2

, ₍₉₎

where,

covTr

1,T2r

=Σr

12=

1 NnΣ

N n=1T1,rn

Tr

2,n

_T

,

varTr

1

=Σr

11=

1

NnΣNn=1T1,rn

Tr

1,n

_T

,

varTr

2

=_Σr

22=

1

NnΣNn=1T2,rn

Tr

2,n

_T

,

(10)

whereT1r =T1R1andT2r =T2R2are right-projected sample

matrices. The optimal solution can be obtained as

L1=

Σr

11 ₋1/2

Qr

1,

L2=

Σr

22 −1/2

Qr₂,

(11)

whereQr1andQr2containmfirst left-right singular vector of

(Σr11)

−1/2

Σr

12(Σr22)

−1/2

. left projections (L1 andL2) and right

projections (R1andR2) are determined by iteratively solving

(7) and (9) until convergence.

4. Matrix-Variate Probabilistic Model for CCA

In this section, we propose an extension of probabilistic canonical correlation model to deal with 2D data. One limitation of probabilistic canonical correlation model is that, in this method, samples are represented by vectors while in computer vision research, data (images) are often matrices, and structural information can be used for improv-ing the conventional model. We show that the estimatimprov-ing the parameters of the proposed model leads to the two-dimensional canonical correlation analysis directions.

We relate the random matrices T1 ∈Ê

m1×n1 _and _T

2 ∈

Ê

m2×n2_{with the latent matrix}_X∈

Ê

m_×_n

as follows:

T1=U1XV1T+Ξ1,

T2=U2XV2T+Ξ2,

(12)

where U1 ∈ Ê

m1×m_, _V

1 ∈ Ê

n1×n_, _U

2 ∈ Ê

m2×m_{, and}

V2∈Ê

n2×n_{are the factor loading matrices.}Ξ

1andΞ2are the

noise sources, and every entry of them follows fromN(0,ψ1)

andN(0,ψ2), respectively. Letθ= {U1,U2,V1,V2,ψ1,ψ2}be

the parameter of the model. The observationsT1andT2are

conditionally independent given the value of latent matrixX; so, we have

P(_T1,T2|X,θ)=P(T1|X,θ)P(T2|X,θ). (13)

Marginal distribution of observed variables is then given by the integrating out the latent variable as

P(T1,T2|θ)=

P(T1|X,θ)P(T2|X,θ)P(X)dX. (14)

Maximum likelihood is one method for setting the values of these parameters which involves consideration of the log probability of the observed data set given the parameters, that is,

L(D1,D2|θ)=lnp(D1,D2|θ)=

N

n=1

lnPT1;n,T2;n|θ

,

(15)

where Di = {Ti;n}Nn=1 and i ∈ {1, 2} consist of N data

matrix. One diﬃculty here is that all the projection matrices {Ui,Vi}2i=1 should be obtained simultaneously and there is

no closed-form solution for it. Therefore, two probabilistic models are proposed so as to obtain each projection direction separately and from an alternating optimization procedure.

We assume that the value ofUi, i = 1, 2 is known and proceed to project the observations over these matrices. The left probabilistic model is defined as

Til=ViXl+Ξli, i=1, 2, (16)

where Til = TiTUi,Xl = XT, and Ξli is the noise in this model. We define the left probabilistic functionP(T1l,T2l) as

the marginal distribution over the latent variables, that is,

PT1l,T2l|θl

=

PT1l,T2l |Xl,θl

(4)

whereθl _{= {}_V i,Ψli}|

2

i=1. The projected observationsT1l and

T2l are conditionally independent given the value of latent

matrixXl_{; so, we have}

PTl

1,T2l|Xl,θl

=

2

i=1

PTl i |Xl,θli

. (18)

One major problem here is that the probabilistic distribu-tions are defined over vectors but in this case observed data are matrices.

Suppose thattil,j∈Ê

ni_{be the}_j_{th column of the projected}

matricesTil ∈Ê

ni×m_{, then the probabilistic function}_P₍_Tl

i) is defined as

PTl i

=Πm j=1p

tl i,j

, i=1, 2. (19)

x-conditional probability distribution overtl

i,j space is given by

ptl i,j|xlj

∼NVixlj,Ψli

, Ψl

i0, i=1, 2, (20)

wherexlj ∈ Ê

n _{is defined as the} _j_{th column vector of}_Xl

and marginal distribution of xj is N(0,I). Therefore, the marginal distribution for the observed data tl

i,j is readily obtained by integrating out the latent variables, giving

ptil,j

∼N0,Vi(Vi)T+Ψli

, i=1, 2. (21)

Suppose that τl

j = [(t1,lj) T

(tl

2,j) T

]T ∈ Ê

(n1+n2)_, _V =

[(V1)T(V2)T]

T ∈ Ê

(n1+n2)×n_, Ψl =

Ψl

1 0

0 Ψl

2 , and Σ l ₌ V VT₊_Ψl_{. Therefore,}_P₍_τl

j) can be obtained as follows:

Pτlj

=N0,Σl, (22)

whereΣl₌_{V V}T₊_Ψl_{. It can be shown that the negative log} likelihood of the left projected data is equal to

L=ΣN n=1Σm

j=1logP

τnl,j

. (23)

After some manipulations, equation (23) becomes

L= (n1+n2)Nm

2 log(2π) + Nm 2 log Σl +1 2Σ N n=1Σm

j=1tr

Σl−1_τl

n,j

τnl,j

T

= (n1+n2)Nm

2 log(2π) + Nm 2 log Σl +Nm 2 tr

Σl−1_Σl _,

(24)

where Σl ₌ ₍₁_/Nm₎_ΣN n=1Σm

j=1τnl,j(τnl,j) T

is the sample covariance matrix of left projected data, and |A| denotes the determinant of matrixA. For log likelihood not become infinite, we assume Σl _0. _{Figure 2(a)} _{depicts the left} probabilistic graphical model.

In this stage, we should maximizeLwith diﬀerentiating with respect to V, Ψl1, and Ψl2, where the solutions is

straightforward. As shown in [4], the solutions can be obtained as

V1=

Σl

11 1/2

Ql

1

Pl1/2₌_Σl

11R1

Pl1/2_,

V2=

Σl

22 1/2

Ql2

Pl1/2=Σl

22R2

Pl1/2,

(25)

where Ql1 and Ql2 are composed of n first left-right

singular vectors of the matrix, (Σl

11)

−1/2

Σl

12(Σl22)

−1/2

with corresponding singular values on the diagonal of the matrix, Pl _∈

Ê

n_×_n

, and the matricesR1 and R2 are composed of

first n canonical directions. Note that the size of matrix (Σl

11)

−1/2

Σl

12(Σl22)

−1/2

isn1×n2 which is much smaller than

the size of the matrix (Σ11)

−1/2

Σ12(Σ22)

−1/2

∈Ê

m1n1×m2n2_in

classical probabilistic CCA.

After computing V1 and V2, the observations are

pro-jected onto these matrices. The right probabilistic model is defined as

Tir=Xr+Ξri, i=1, 2, (26)

where Tir = TiVi. Xr = X is the latent matrix and

Ξr

i represents the noise source in this model. Similar to left probabilistic model, we define tri,j ∈ Ê

mi_{, and} _xr

j ∈

Ê

m _{as the} _j_{th column vector of} _Tr

i and Xr, respectively, where the marginal distribution ofxr

j is N(0,I). Let τrj = [(tr1,j)T(t2,rj)T]

T

∈ Ê

(m1+m2)_, _U = _[(_U

1)T(U2)T]

T ∈

Ê

(m1+m2)×m_,Ψr=

_Ψr

1 0

0 Ψr

2

, andΣr₌_UUT₊_Ψr_{. Therefore,} P(τrj) = N(0,Σr). The negative log likelihood of the right projected data is equal to

L= (m1+m2)Nn

2 log(2π) + Nn

2 log|Σ r_|

+1 2Σ

N n=1Σn

j=1tr

(Σr₎−1

τr n,j

_T

= (m1+m2)Nn

2 log(2π) + Nn

2 log|Σ r_|

+Nn 2 tr

(Σr₎−1_Σ_r

,

(27)

whereΣr₌₍₁_/Nn₎_ΣN n=1Σn

j=1τnr,j(τnr,j)Tis the sample covari-ance matrix of right projected data samples, and assume

Σr _{0. The solution to this optimization can be obtained} as

U1=

Σr

11 1/2

Qr

1(Pr) 1/2₌_Σ_r

11L1(Pr)1/2,

U2=

Σr

22 1/2

Q2r(Pr)1/2=Σr11L2(Pr)1/2,

(28)

where in this caseL1 and L2 contain the first mcanonical

directions,Qr1andQr2are composed ofmfirst left-right

sin-gular vectors of (Σr11)

−1/2

Σr

12(Σr22)

−1/2

, andPr_∈

Ê

m_×_m

(5)

xl j

tl

1,j

tl

2,j

V1

V2

ψl

1

ψl

2

Nm

(a)

xr j

tr

1,j

tr

2,j U1

U2

ψr

1

ψr

2

Nn

(b)

Figure2: Probabilistic graphical model for 2DCCA, (a) left model and (b) right model.

It can be seen that the left and right canonical directions of 2DCCA can be obtained by maximizing the likeli-hood function. Posterior expectations can be obtained as follows:

Exl j|τlj

=Pl1/2 T_LT

1τlj,

Exrj|τrj

=(_Pr₎1/2T

RT1τrj.

(29)

5. Relationship of 2DCCA and 2DLDA

In this section, we show that 2DCCA and 2DLDA [11] are closely related. 2DLDA uses original sample matrices for constructing between-class and within-class covariance matrices. It adapts an iterative algorithm where, in each iteration one projection direction is assumed known, and other projection is obtained by solving generalized eigen-value problem. Let Xlj ∈ Ê

r×c_, _j ₌ _1,_{. . .}_,_N _the_N_image samples which are projected onto the left projection matrix, and L ∈ Ê

r×r_{. These samples are clustered into} _C _classes

with yj ∈ {1· · ·C} class label where ith class has ni data samples. DefineXli ∈ Ê

r_×_c

as the mean of ith class, π as a vector where its ith element is πi = ni/N, and N=ΣC

j=1nj.

Lemma 1. In 2DLDA, between class scatter matrix is obtained as SBl ₌ _MPMT_{, where} _P ₌ _(diag(_π₎ _⊗ _I

r − (π ⊗

Ir)(π⊗I_r)T),Ml=((Xl₁)

T

,. . ., (XlC) T

)∈Ê

c×(r_C₎

,⊗is the kronecker product,diag(π)is a diagonal matrix withπi’s on its

diagonal, andIis the identity matrix.

The proof of lemma is shown inAppendix A. Consider two sets of multivariate data, {T1j = (Xlj)

T

∈ Rc×r

,j = 1,. . .,N}and{T2j =[Q1,. . .,QC]T ∈RCr×r, j=1,. . .,N} which are realizations of random matrices T1 and T2,

respectively. WhereQi=Irify_j=i, and otherwiseQ_i=0_r.

For example, for image matrixX1lwith class labely1=2,

T1

1andT21are defined as follows:

T1

1=

Xl

1 T

, T1

2=

⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

0r

Ir

0r

.. .

0r ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠

(r_C₎_×_r

, (30)

where 0ris ar×rsquare matrix of all zeros. The following

lemma shows the relationship between two methods.

Lemma 2. 2DCCA finds the optimal correlation directions of

T1 = (Xl)T and T2 = [Q1,. . .,QC]T random matrices by

solving the generalized eigenvalue problemSBlu = (λ/1 − λ)SWl_u_{, where} _SBl _and _SWl _{are between-class and}

within-class covariance matrices, respectively.

The proof is shown inAppendix B. As we know, The right projection vector of 2DLDA is computed using generalized eigenvalue problemSBl_w ₌ _λSWl_w_{, while in}_{Lemma 2}_{, we} proved that the canonical correlation direction of 2DCCA for (T1,T2) is obtained by solving the generalized eigenvalue

problemSBl_u₌₍_λ/₁₋_λ₎_SWl_u_{. These show the relationship} between two methods. Therefore, the proposed probabilistic model can also be used for modeling 2DLDA technique.

6. Conclusion

(6)

showed that matrix-based Linear Discriminant Analysis can be obtained by setting the input random matrices of CCA.

Appendices

A. Proof of

Lemma 1

SBl₌ C

i=1

πi

Xi−X

_T

= C

i=1

πi

Xi

_T

−X C

i=1

πi

Xi

_T

− C

i=1

πi

Xi

XT

+ C

i=1

πi

XXT.

(A.1)

By substitutingX=C

i=1πi(Xi) and

_C

i=1πi=1 in to above equation, we have

SBl= C

i=1

πi

Xi

_T

−XXT. (A.2)

The following equations can be easily obtained:

X=M(π⊗Ir),

C

i=1

πi

Xi

_T

=Mdiag(π)⊗IrMT.

(A.3)

So, we have

SBl₌_M_diag(_π₎_⊗_I

rMT−M(π⊗I_r)(π⊗I_r)TMT

=Mdiag(π)⊗Ir−(π⊗I_r)(π⊗I_r)T

MT

=MPMT_.

(A.4)

B. Proof of

Lemma 2

Joint sample covariance matrix ofT1andT2is computed as

Σ=

⎛

⎝SBl+SWl MP

PMT P

⎞

⎠_. _(B.1)

2DCCA obtains the optimal canonical directions by finding the eigenvectors of Σ−111/2Σ12Σ22−1Σ21Σ−111/2 which by

some computations, we have

Σ−1/2

11 Σ12Σ−221Σ21Σ−111/2=

SBl₊_SWl−1/2_MPMT

×SBl₊_SWl−1/2

=SBl+SWl−1/2SBlSBl+SWl−1/2 (B.2)

which is equivalent to solving the generalized eigenvalue problemSBl_u ₌ _λ₍_SBl₊_SWl₎_u_{which is equal to}_SBl_u ₌ (λ/1−λ)SWl_u_.

Acknowledgment

This paper is partially supported by Iran Telecommunication Research Center (ITRC).

References

[1] M. E. Tipping and C. M. Bishop, “Probabilistic principal component analysis,”Journal of the Royal Statistical Society. Series B: Statistical Methodology, vol. 61, no. 3, pp. 611–622, 1999.

[2] N. Lawrence, “Probabilistic non-linear principal component analysis with Gaussian process latent variable models,”Journal of Machine Learning Research, vol. 6, pp. 1783–1816, 2005. [3] S. Roweis, “Em algorithms for pca and spca,” in Advances

in Neural Information Processing Systems, pp. 626–632, MIT Press, Cambridge, Mass, USA, 1998.

[4] F. Bach and M. Jordan, “A probabilistic interpretation of canonical correlation analysis,” Tech. Rep. 688, University of California, Berkeley, Calif, USA, 2005.

[5] S. Yu, K. Yu, V. Tresp, H. P. Kriegel, and M. Wu, “Supervised probabilistic principal component analysis,” inProceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06), pp. 464–473, ACM Press, August 2006.

[6] Y. Zhang and D.-Y. Yeung, “Heteroscedastic probabilistic linear discriminant analysis withsemi-supervised extension,” inLearning and Knowledge Discovery in Databases, vol. 5782 ofLecture Notes in Computer Science, pp. 602–616, 2009. [7] Z. Zhao, L. Sun, S. Yu, H. Liu, and J. Ye, “Multiclass

probabilistic kernel discriminant analysis,” inProceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI ’09), pp. 1363–1368, 2009.

[8] S. Ioﬀe, “Probabilistic linear discriminant analysis,” in Pro-ceedings of the European Conference on Computer Vision (ECCV ’06), vol. 3954 ofLecture Notes in Computer Science, pp. 531–542, 2006.

[9] M. E. Tipping and C. M. Bishop, “Mixtures of probabilistic principal component analyzers,”Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.

[10] C. Bishop, M. Mozer, M. Jordan, and T. Petche, “Bayesian pca,”

Advances in Neural Information Processing Systems, vol. 9, pp. 382–388, 1996.

[11] J. Ye, “Generalized low rank approximations of matrices,”

Machine Learning, vol. 61, no. 1–3, pp. 167–191, 2005. [12] S. H. Lee and S. Choi, “Two-dimensional canonical correlation

analysis,”IEEE Signal Processing Letters, vol. 14, no. 10, pp. 735–738, 2007.

[13] N. Sun, Z. H. Ji, C. R. Zou, and LI. Zhao, “Two-dimensional canonical correlation analysis and its application in small sam-ple size face recognition,”Neural Computing and Applications, vol. 19, no. 3, pp. 377–382, 2010.

(7)

[15] D. Cai, X. He, and J. Han, “Subspace learning based on tensor analysis,” Tech. Rep. UIUCDCS-R-2005-2572, University of Illinois, Champaign, Ill, USA, 2005.

[16] X. He, D. Cai, and P. Niyogi, “Tensor subspace analysis,” in

Proceedings of the Advances in Neural Information Processing Systems (NIPS ’05), pp. 499–506, 2005.

[17] S. Yan, D. Xu, B. Zhang, and H. J. Zhang, “Graph embedding: a general framework for dimensionality reduction,” in Pro-ceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), pp. 830–837, June 2005.

[18] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H. J. Zhang, “Multilinear discriminant analysis for face recognition,”IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 212–220, 2007.

[19] T. K. Kim and R. Cipolla, “Canonical correlation analysis of video volume tensors for action categorization and detection,”

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 8, pp. 1415–1428, 2009.

[20] M. Safayani and M. M. Shalmani, “Heteroscedastic multilinear discriminant analysis for face recognition,” inProceeding of the International Conference on Pattern Recognition (ICPR ’10), pp. 4287–4290, 2010.

[21] X. Xie, S. Yan, J. T. Kwok, and T. S. Huang, “Matrix-variate factor analysis and its applications,” IEEE Transactions on Neural Networks, vol. 19, no. 10, pp. 1821–1826, 2008. [22] D. Tao, M. Song, X. Li et al., “Bayesian tensor approach for

3-D face modeling,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1397–1410, 2008. [23] D. Tao, J. Sun, X. Wu et al., “Probabilistic tensor analysis with

akaike and bayesian information criteria,” in Proceedings of the International Conference on Neural Information Processing (ICONIP ’07), pp. 791–801, 2007.

[24] D. Tao, J. Sun, J. Shen et al., “Bayesian tensor analysis,” in