The traditional (classical) biplot

Since “there are many patterns and relationships that are easier to discern in graphical displays than by any other data analysis method” (Everitt, 1994), it is always desirable to graphically represent a data set to be investigated and to do so as accurately as possible. Given that humans can only visualise objects which are at most three-dimensional, it is the graphical representation of a data matrix in one, two or three-dimensional space which is usually of interest.

The lower dimensional space in which the data matrix is graphically represented is referred to as the display space. For generality, let $r$ denote the dimension of the display space, $1 \le r \le p$. The ordinary $r$-dimensional MDS configuration associated with a PCA of a data set is the $r$-dimensional configuration obtained by orthogonally projecting the configuration of points representing the samples of the data set in the measurement space onto the best fitting $r$-dimensional subspace to that configuration. Although this $r$-dimensional configuration is optimal in its representation of the samples of the data set, it is lacking in that it does not provide any information on the measured variables of the data set. Gabriel (1971) proposed that a joint representation of the samples and variables of a data set, which he called a biplot, be used to represent the data set. In a biplot each row (i.e. sample) and each column (i.e. variable) of the data matrix is represented by a vector emanating from the origin. These vectors are such that the inner product of a vector representing a row of the data matrix and a vector representing a column of the data matrix is equal to an approximation of the corresponding element of the data matrix. The ‘bi’ in biplot refers to the fact that the biplot is a joint map of two modes, namely observations and variables, and does not refer to the dimension of the display. The space in which a biplot is constructed is referred to as the biplot space and will henceforth be denoted by $L$.

Consider an $n \times p$ data matrix $X$ which is centred such that each column has a zero mean, that is, such that $\mathbf{1}'X = \mathbf{0}'$. Let the rank of $X$ be denoted by $q$, $q \le p \le n$. The construction of the biplot is based on the fact that any $n \times p$ matrix of rank $q$ can be expressed as the product of an $n \times q$ matrix of rank $q$ and a $q \times p$ matrix of rank $q$ (Section 1.6.5). It follows that $X$ can be expressed as

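$$X = AB',$$

where $A$ is an $n \times q$ matrix of rank $q$ and $B'$ is a $q \times p$ matrix of rank $q$. It follows from the expression $X = AB'$ that every element of $X$ can be expressed in the form of the inner product between a row vector of $A$ and a column vector of $B'$:

$$x_{ij} = a_i' b_j \quad \forall (i, j),\ i \in [1:n] \text{ and } j \in [1:p].$$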

Since the row vectors of $A$ and the column vectors of $B'$ are $q$-dimensional, it follows that $X$ can be perfectly represented by $n + p$ vectors in $q$-dimensional space. Seeing as $x_{ij} = a_i' b_j$ implies that

$$a_i = c\,a_m \implies x_i = c\,x_m$$

and

$$b_j = d\,b_k \implies x_{(j)} = d\,x_{(k)},$$

where $x_i$ denotes the $i$th row vector of $X$, $x_{(j)}$ denotes the $j$th column vector of $X$ and $c$ and $d$ are arbitrary constants, the row vectors of $A$ and $B$ can be viewed as representing the rows (samples) and columns (variables) of the matrix $X$ respectively. The $i$th row of $X$ and the $j$th column of $X$ can therefore be represented by the $i$th row vector of $A$ emanating from the origin and the $j$th row vector of $B$ emanating from the origin, respectively, $i \in [1:n]$, $j \in [1:p]$. Henceforth the vectors stretching from the origin to the points with coordinate vectors given by the $i$th row vector of $A$ and the $j$th row vector of $B$ will be referred to as the $i$th row marker and $j$th column marker respectively, $i \in [1:n]$, $j \in [1:p]$. Gabriel suggested that the rows of $X$ be represented by the endpoints of the row markers only, so that the representations of the rows and columns of $X$ can be easily differentiated in the biplot. The biplot proposed by Gabriel is known as the traditional or classical biplot.
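The exact factorisation can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small simulated data matrix and one particular choice of $A$ and $B$ obtained from the svd of $X$; the factorisation is not unique, and the seed, the matrix sizes and the rank tolerance are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, simulated data only
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)                  # centre the columns so that 1'X = 0'

U, d, Vt = np.linalg.svd(X, full_matrices=False)
q = int(np.sum(d > 1e-10))              # numerical rank of X (assumed tolerance)

A = U[:, :q] * d[:q]                    # n x q matrix whose rows are row markers
B = Vt[:q, :].T                         # p x q matrix whose rows are column markers

# Every element x_ij is the inner product of the ith row of A and jth row of B.
i, j = 2, 1
x_ij = float(A[i] @ B[j])
exact = bool(np.allclose(X, A @ B.T))
```

Here the $n + p = 9$ rows of `A` and `B` together reproduce every element of `X` in $q$ dimensions.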

When $q = r$, $X$ can be expressed as the product of an $n \times r$ matrix of rank $r$ and an $r \times p$ matrix of rank $r$, hence $X$ can be perfectly represented by $n + p$ vectors in $r$-dimensional space. When $q > r$, however, it is not possible to perfectly represent $X$ in $r$-dimensional space. What can be done in such a situation is to perfectly represent a rank $r$ approximation, $\hat{X}$, of $X$ in the $r$-dimensional biplot space. Since the rank of $\hat{X}$ is equal to $r$, $\hat{X}$ can be expressed as

$$\hat{X} = AB'$$

where $A$ is an $n \times r$ matrix of rank $r$ and $B'$ is an $r \times p$ matrix of rank $r$, i.e. $\hat{X}$ can be perfectly represented by $n + p$ vectors in $r$-dimensional space. Representing the data by means of a traditional biplot therefore in effect models the data as the sum of an inner product, $a'b$, which is ‘explained’ by the biplot (i.e. that is contained in the biplot space, $L$) and a residual ‘error’ term, $\epsilon$, which is not explained by the biplot (i.e. that is not contained in the biplot space):

$$x_{ij} = a_i' b_j + \epsilon \implies x_{ij} = \hat{x}_{ij} + \epsilon. \tag{2.6.1}$$

Consider the following expression of the inner product between two vectors, $a$ and $b$:

$$a'b = \|a\|\,\|b\|\cos(\theta_{a,b}). \tag{2.6.2}$$

Since an inner product is defined in terms of norms and angles, it is important that the aspect ratio of the biplot plotting region be equal to one. If the aspect ratio of the plotting region is not equal to one, the inner products between the row and column markers will still give the correct approximations of the elements of $X$, but the lengths of the row and column markers and the angles between them will appear different from their true values. For this reason, conclusions drawn from the visual inspection of a biplot with an aspect ratio other than one are likely to be incorrect. The expression of the inner product between two vectors in (2.6.2) highlights the main weakness of the traditional biplot, namely that inner products, through which the traditional biplot approximates the elements of $X$, are difficult to visualise. However, there is information about the approximations that can easily be discerned from the mere inspection of the traditional biplot:

1. If two row markers lie in the same direction, i.e. $a_i = c\,a_m$, it implies that the corresponding two row vectors of $\hat{X}$ are proportional, i.e. $\hat{x}_i = c\,\hat{x}_m$.

2. If two column markers lie in the same direction, i.e. $b_j = d\,b_k$, it implies that the corresponding two column vectors of $\hat{X}$ are proportional, i.e. $\hat{x}_{(j)} = d\,\hat{x}_{(k)}$.

3. If a row marker and a column marker are orthogonal to each other, i.e. $a_i' b_j = 0$, the corresponding element of $\hat{X}$, $\hat{x}_{ij}$, is equal to zero.
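The three properties above can be verified on a small example. The following Python sketch uses hypothetical row and column markers in two dimensions, chosen by hand purely for illustration, and checks each property on the matrix reproduced by their inner products.

```python
import numpy as np

# Hypothetical row markers (rows of A) and column markers (rows of B), r = 2.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],     # a_2 = 2 * a_1, so it lies in the direction of a_1
              [3.0, -1.0]])
B = np.array([[2.0, -1.0],    # b_1 is orthogonal to a_1: 1*2 + 2*(-1) = 0
              [1.0, 3.0],
              [4.0, -2.0]])   # b_3 = 2 * b_1

X_hat = A @ B.T               # matrix reproduced by the inner products a_i' b_j

rows_proportional = bool(np.allclose(X_hat[1], 2 * X_hat[0]))        # property 1
cols_proportional = bool(np.allclose(X_hat[:, 2], 2 * X_hat[:, 0]))  # property 2
orthogonal_gives_zero = bool(np.isclose(X_hat[0, 0], 0.0))           # property 3
```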

Note that the factorization of $\hat{X}$ into two matrices of rank $r$ is not unique. This is evident from the following expression:

$$\hat{X} = AT'(T^{-1})'B'$$

where $T'$ is an $r \times r$ non-singular matrix. Since $T$ is a non-singular matrix, the rank of $AT'$ is the same as the rank of $A$ and the rank of $(T^{-1})'B'$ is the same as the rank of $B'$. It follows that $\hat{X}$ can be perfectly represented in $r$-dimensional space by $n$ row markers given by the $n$ row vectors of $AT'$ and $p$ column markers given by the row vectors of $BT^{-1}$. In order to see the effect of the non-singular transformations performed on $A$ and $B$ on the produced biplot, consider the svd's of the matrices $T'$ and $T^{-1}$:

$$T' = G\Theta H' \quad \text{and} \quad T^{-1} = G\Theta^{-1}H'.$$

Substituting the svd of $T'$ for $T'$ and the svd of $T^{-1}$ for $T^{-1}$ in the expression of $\hat{X}$ gives

$$\hat{X} = AG\Theta H'H\Theta^{-1}G'B'.$$
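The non-uniqueness of the factorisation can be illustrated numerically. The sketch below, written in Python with NumPy, assumes arbitrary hand-picked matrices `A`, `B` and `T` (none of them taken from the text). It checks that the transformed markers reproduce the same matrix, that the svd of $T'$ yields $T^{-1}$ in the stated form, and that the inner products between the row markers are nevertheless altered.

```python
import numpy as np

# Hypothetical markers (r = 2) and an arbitrary non-singular matrix T.
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])               # n x r
B = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 4.0], [1.0, -2.0]])   # p x r
T = np.array([[2.0, 1.0], [0.0, 0.5]])
T_inv = np.linalg.inv(T)

A_new = A @ T.T          # row markers after the transformation A -> AT'
B_new = B @ T_inv        # column markers after the transformation B -> BT^{-1}
same_product = bool(np.allclose(A_new @ B_new.T, A @ B.T))

# svd of T': T' = G diag(theta) H', so that T^{-1} = G diag(1/theta) H'.
G, theta, Ht = np.linalg.svd(T.T)
svd_agrees = bool(np.allclose(G @ np.diag(1.0 / theta) @ Ht, T_inv))

# The inner products between the row markers are changed by this T.
gram_changed = not np.allclose(A_new @ A_new.T, A @ A.T)
```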

Since $AT' = AG\Theta H'$, the transformation from the row markers given by the $n$ row vectors of $A$ to the row markers given by the $n$ row vectors of $AT'$ consists of a rotation and/or reflection due to $G$, a dilation (or contraction) and possible reflection due to $\Theta$ and another rotation and/or reflection due to $H'$. Similarly, the transformation of the column markers given by the $p$ row vectors of $B$ to the column markers given by the $p$ row vectors of $BT^{-1}$ consists of a rotation and/or reflection due to $G$, a dilation (or contraction) and possible reflection due to $\Theta^{-1}$ and another rotation and/or reflection due to $H'$. The only difference between the transformation $A \to AT'$ and the transformation $B \to BT^{-1}$ is the dilation or contraction part of the transformation. In the transformation $A \to AT'$, the dilation or contraction (as well as possible reflection) is performed by the diagonal elements of $\Theta$, while the diagonal elements of $\Theta^{-1}$, which are the reciprocals of the diagonal elements of $\Theta$, perform the dilation or contraction (as well as possible reflection) in the transformation $B \to BT^{-1}$.

Because the elements of a row vector of $A$ are dilated (or contracted) by different values, the $j$th element of a row vector being dilated (or contracted) by the $j$th diagonal element of $\Theta$, the angles between the row vectors of $AT'$ as well as the distances between the points with coordinate vectors given by the row vectors of $AT'$ will differ from the angles and distances corresponding to the row vectors of $A$. Collinear row vectors of $A$ will be transformed to collinear row vectors of $AT'$; only the distances between the points with coordinate vectors given by the row vectors of $AT'$ will differ from the corresponding distances between the points with coordinate vectors given by the row vectors of $A$. The same can be said about the row vectors of $BT^{-1}$ and the row vectors of $B$. It follows that the relationships between the row markers as well as the relationships between the column markers depend entirely on the chosen non-singular transformation.

Since the lengths of the row markers, the angles between the row markers and the distances between the endpoints of the row markers are all functions of the inner products between the row markers, the mentioned lengths, angles and distances will be unaffected by a transformation that does not affect the inner products between the row markers. The inner products between the row vectors of $A$ are unaffected by the linear transformation brought about by multiplication by the matrix $T'$ (called the transformation matrix) when

$$AT'TA' = AA'.$$

It is evident that $T'$ will satisfy the above equation when $T'T = I$, i.e. when the matrix $T'$ is an orthogonal matrix. When the non-singular matrix $T$ is an orthogonal matrix, $T'T = I$ and $TT' = I$ and hence

$$AT'TA' = AIA' \implies AT'TA' = AA'.$$

Similarly, in order for the lengths of the column markers given by the row vectors of $B$ to be unaffected by the transformation $B \to BT^{-1}$, the matrix $T$ must satisfy $BT^{-1}(T^{-1})'B' = BB'$. It is evident that the matrix $T$ will satisfy this equation if and only if $(T'T)^{-1} = I$, i.e. if and only if $T'T = I$, i.e. if and only if $T$, and hence also $T'$, is an orthogonal matrix. It follows that if the transformation matrix $T'$ is an orthogonal matrix, then the inner products between the row vectors of $A$ as well as the inner products between the row vectors of $B$ will be unaffected by the transformations $A \to AT'$ and $B \to BT^{-1}$ respectively, and hence the properties of the row and column markers (namely the lengths of the markers, the angles between the markers and the distances between the endpoints of the markers) will be unaffected by the transformations. Note that when $T$ is an orthogonal matrix, $T^{-1} = T'$ so that the transformation performed on the column markers is given by $B \to BT'$. Hence, when $T$ is an orthogonal matrix, the row markers and column markers are transformed in exactly the same way.

A transformation which does not affect the properties of the row and column markers is desired when the row and column markers are such that certain aspects of interest are approximated in the produced biplot. In that case only transformations for which the relationships between the row markers and the relationships between the column markers are unaffected should be performed, that is, only orthogonal transformations should be performed. It is known that an orthogonal transformation (i.e. multiplication by an orthogonal matrix) results in a rotation and/or a reflection being performed. It follows that performing the same rotation or reflection on the row and column markers of an existing PCA biplot will not change the approximation of the data matrix produced by the biplot or the properties of the row and column markers. In order to see this, let $Q$ denote an orthogonal matrix and let $A^* = AQ'$ and $B^* = BQ'$. Then

$$\hat{X} = AB' \implies \hat{X} = AQ'QB' \implies \hat{X} = A^*B^{*\prime},$$

$$A^*A^{*\prime} = AQ'QA' = AA',$$

$$B^*B^{*\prime} = BQ'QB' = BB'.$$

Since $Q$ is an orthogonal matrix, multiplication by $Q$ performs a reflection and/or a rotation. If $|Q| = 1$, multiplication by $Q$ performs a rotation, while if $|Q| = -1$, multiplication by $Q$ performs either a reflection only or a reflection and a rotation. Note that since $Q$ is a non-singular matrix, the rank of $A^* = AQ'$ is the same as that of $A$ and the rank of $B^{*\prime} = QB'$ is the same as that of $B$.
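The invariance under a common orthogonal transformation can be checked with a short Python sketch. The markers, the rotation angle and the helper name `transform` are hypothetical choices made for illustration; one orthogonal matrix has determinant $1$ (a rotation) and the other has determinant $-1$ (a reflection).

```python
import numpy as np

# Hypothetical row and column markers in r = 2 dimensions.
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
B = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 4.0]])

angle = np.pi / 6
Q_rot = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])   # |Q| = 1, rotation
Q_ref = np.array([[1.0, 0.0],
                  [0.0, -1.0]])                        # |Q| = -1, reflection

def transform(A, B, Q):
    """Apply the same orthogonal matrix to both sets of markers."""
    return A @ Q.T, B @ Q.T

results = {}
for name, Q in {"rotation": Q_rot, "reflection": Q_ref}.items():
    A_star, B_star = transform(A, B, Q)
    results[name] = (
        bool(np.allclose(A_star @ B_star.T, A @ B.T)),   # same approximation
        bool(np.allclose(A_star @ A_star.T, A @ A.T)),   # row marker inner products
        bool(np.allclose(B_star @ B_star.T, B @ B.T)),   # column marker inner products
        round(float(np.linalg.det(Q))),                  # determinant of Q
    )
```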

In order for the biplot to be used to represent the true relationships between the samples in the measurement space or the true relationships between the variables in the measurement space, specific constraints need to be imposed on the row and column markers. If, for example, the relationships between the samples in the full measurement space are to be represented by the same relationships between the corresponding row markers in the biplot, it must be true that

$$\hat{X}\hat{X}' = AA'. \tag{2.6.3}$$

Generally, however, $\hat{X}\hat{X}' = AB'BA'$. In order for equation (2.6.3) to hold, the matrix of column markers, $B$, must therefore satisfy

$$B'B = I,$$

that is, $B$ must be an orthonormal matrix. Similarly, when the relationships between the variables in the measurement space are to be represented by the same relationships between the corresponding column markers, the following must be true:

$$\hat{X}'\hat{X} = BB'. \tag{2.6.4}$$

Given that generally $\hat{X}'\hat{X} = BA'AB'$, it is evident that the appropriate constraint to be imposed in order for equation (2.6.4) to hold is given by

$$A'A = I,$$

which implies that $A$ must be an orthonormal matrix. Note that when $B'B = I$, the inner products between the column vectors of $X$ are represented by the inner products between the column markers in the metric $A'A$:

$$\hat{X}'\hat{X} = BA'AB'.$$

Similarly, when $A'A = I$, the inner products between the row vectors of $X$ are represented by the inner products between the row markers in the metric $B'B$:

$$\hat{X}\hat{X}' = AB'BA'.$$
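The effect of the constraint $B'B = I$ can be seen in a small numerical sketch. The Python code below assumes randomly generated markers and uses a QR decomposition, chosen here only as a convenient way to obtain a matrix with orthonormal columns; the helper name `row_gram_matches` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)                      # simulated markers only
A = rng.normal(size=(5, 2))                         # arbitrary row markers
B_orth, _ = np.linalg.qr(rng.normal(size=(4, 2)))   # p x r with B'B = I
B_free = rng.normal(size=(4, 2))                    # no constraint imposed

def row_gram_matches(A, B):
    """Check whether X_hat X_hat' equals AA' for X_hat = AB'."""
    X_hat = A @ B.T
    return bool(np.allclose(X_hat @ X_hat.T, A @ A.T))

with_constraint = row_gram_matches(A, B_orth)       # constraint satisfied
without_constraint = row_gram_matches(A, B_free)    # constraint not satisfied
```

The symmetric case, $A'A = I$ implying $\hat{X}'\hat{X} = BB'$, can be checked in the same way by exchanging the roles of the two matrices.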

Since the aim is to represent $X$ as well as possible in the $r$-dimensional display space, the rank $r$ approximation, $\hat{X}$, to $X$ chosen to represent $X$ in the $r$-dimensional display space must be such that some function of the deviations between the elements of $X$ and the elements of $\hat{X}$ is minimised. An example of such a function, which is not only mathematically tractable but also a logical choice, is the function used by Gabriel (1971) in the construction of the traditional biplot, namely the sum of squared residuals,

$$\sum_{i=1}^{n}\sum_{j=1}^{p}(x_{ij} - \hat{x}_{ij})^2 = \mathrm{tr}\{(X - \hat{X})(X - \hat{X})'\} = \|X - \hat{X}\|^2.$$
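The three expressions for the sum of squared residuals agree for any candidate approximation, optimal or not. A minimal Python sketch, assuming two arbitrary simulated matrices of the same size and taking the norm to be the Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(3)          # simulated matrices only
X = rng.normal(size=(6, 4))
X_hat = rng.normal(size=(6, 4))         # any candidate approximation of X
R = X - X_hat                           # matrix of residuals

ss_elementwise = float(np.sum(R ** 2))            # double sum over i and j
ss_trace = float(np.trace(R @ R.T))               # tr{(X - X_hat)(X - X_hat)'}
ss_norm = float(np.linalg.norm(R, "fro") ** 2)    # squared Frobenius norm
```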

If the sum of the squared residuals is taken as the measure of lack of fit, then the matrix which will most accurately represent $X$ in $r$-dimensional space is the best least squares rank $r$ approximation of $X$. It follows that the $r$-dimensional traditional biplot of a centred data matrix $X$ is an $r$-dimensional joint representation of the rows (samples) and columns (variables) of $X$ which perfectly reproduces the best least squares rank $r$ approximation, $\hat{X} = AB'$, of $X$ via inner products between the row and column markers given by the row vectors of the matrices $A$ and $B$ respectively. Consider the svd of $X$:

$$X = UDV'.$$

The best least squares rank $r$ approximation of $X$ is given by

$$\hat{X} = U_rD_rV_r'.$$

Recall from Chapter 1 that the best rank $r$ approximation to $X$ can be expressed as the orthogonal projection of $X$ onto $V(V_r)$, the column space of $V_r$:

$$\hat{X} = U_rD_rV_r' = XV_rV_r'.$$

Since the row vectors of $\hat{X}$ are the orthogonal projections of the corresponding row vectors of $X$ onto the column space of $V_r$, the row vectors of the matrix of residuals, $X - \hat{X}$, are contained in the orthogonal complement of the column space of $V_r$. This means that the $r$-dimensional traditional biplot space is identical to the column space of the matrix $V_r$ and the $\epsilon$-term in equation (2.6.1) is contained in the orthogonal complement of the biplot space.
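These statements can be checked numerically. The Python sketch below assumes a simulated centred data matrix and $r = 2$. It compares the truncated svd with the projection onto the column space of $V_r$, verifies that the residual rows are orthogonal to that space, and compares the lack of fit with that of a projection onto an arbitrary $r$-dimensional subspace; the basis `W` for that subspace is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(4)                   # simulated data only
X = rng.normal(size=(10, 4))
X = X - X.mean(axis=0)                           # centred data matrix
r = 2

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V_r = Vt[:r].T                                   # p x r, orthonormal columns
X_hat = (U[:, :r] * d[:r]) @ Vt[:r]              # U_r D_r V_r'
X_proj = X @ V_r @ V_r.T                         # X V_r V_r'

same_matrix = bool(np.allclose(X_hat, X_proj))
residual_orthogonal = bool(np.allclose((X - X_hat) @ V_r, 0.0))

# Lack of fit of X_hat versus the projection onto another r-dimensional
# subspace, spanned by the orthonormal columns of a hypothetical basis W.
W, _ = np.linalg.qr(rng.normal(size=(4, r)))
ss_best = float(np.sum((X - X_hat) ** 2))
ss_other = float(np.sum((X - X @ W @ W.T) ** 2))
ss_discarded = float(np.sum(d[r:] ** 2))         # squared discarded singular values
```

In this sketch the lack of fit of `X_hat` equals the sum of the squared discarded singular values, which is the standard result for a truncated svd.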

The matrix $\hat{X}$, which is of rank $r$, can be expressed as the product of two rank $r$ matrices, $A$ and $B'$, in the following way:

$$\hat{X} = U_rD_rV_r' \implies \hat{X} = (U_rD_r^{\alpha})(D_r^{1-\alpha}V_r') \implies \hat{X} = AB'$$

where $A = U_rD_r^{\alpha}$, $B' = D_r^{1-\alpha}V_r'$ and $\alpha$ is a scale parameter, $0 \le \alpha \le 1$ (see Section 1.6.5). It will be explained that as the value of $\alpha$ moves from zero to one, the focus in the biplot is shifted from the representation of the columns of $X$ to the representation of the rows of $X$.
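The role of the scale parameter can be illustrated with a short Python sketch, assuming a simulated centred data matrix and $r = 2$; the function name `biplot_markers` is hypothetical. Every value of $\alpha$ reproduces the same $\hat{X}$, while $\alpha = 0$ gives $A'A = I$, so that equation (2.6.4) holds, and $\alpha = 1$ gives $B'B = I$, so that equation (2.6.3) holds.

```python
import numpy as np

def biplot_markers(X, r, alpha):
    """Return (A, B) with A = U_r D_r^alpha and B = V_r D_r^(1 - alpha)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :r] * d[:r] ** alpha            # scale the columns of U_r
    B = Vt[:r].T * d[:r] ** (1.0 - alpha)    # scale the columns of V_r
    return A, B

rng = np.random.default_rng(5)               # simulated data only
X = rng.normal(size=(9, 4))
X = X - X.mean(axis=0)
r = 2

A0, B0 = biplot_markers(X, r, alpha=0.0)
Ah, Bh = biplot_markers(X, r, alpha=0.5)
A1, B1 = biplot_markers(X, r, alpha=1.0)

X_hat = A1 @ B1.T
same_approximation = bool(np.allclose(A0 @ B0.T, X_hat)
                          and np.allclose(Ah @ Bh.T, X_hat))
# alpha = 0: A'A = I, hence X_hat' X_hat = BB' (column relationships).
cols_exact_at_0 = bool(np.allclose(A0.T @ A0, np.eye(r))
                       and np.allclose(X_hat.T @ X_hat, B0 @ B0.T))
# alpha = 1: B'B = I, hence X_hat X_hat' = AA' (row relationships).
rows_exact_at_1 = bool(np.allclose(B1.T @ B1, np.eye(r))
                       and np.allclose(X_hat @ X_hat.T, A1 @ A1.T))
```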

In document PCA and CVA biplots : a study of their underlying theory and quality measures (Page 80-95)