It has become standard practice to use the ratio of the fitted sum of squares to the total sum of squares,
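\[
\frac{\operatorname{tr}\{\hat{X}\hat{X}'\}}{\operatorname{tr}\{XX'\}}
= \frac{\operatorname{tr}\{\hat{X}'\hat{X}\}}{\operatorname{tr}\{X'X\}}
\tag{3.2.1}
\]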
to measure the overall quality of the PCA biplot (see for example Gabriel (1971), Cox and Cox (2001), Gower and Hand (1996), Gardner-Lubbe et al. (2008) and Gower et al. (2011)). Both Type A and Type B orthogonality underlie the validity of the ratio in equation (3.2.1) as a quality measure:
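\begin{align*}
\text{Type A:}\quad & XX' = \hat{X}\hat{X}' + (X-\hat{X})(X-\hat{X})' \\
& \Longrightarrow\ \operatorname{tr}\{XX'\} = \operatorname{tr}\{\hat{X}\hat{X}'\} + \operatorname{tr}\{(X-\hat{X})(X-\hat{X})'\} \\
\text{and Type B:}\quad & X'X = \hat{X}'\hat{X} + (X-\hat{X})'(X-\hat{X})
\end{align*}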
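Note that since the decomposition of $X$ into $\hat{X}$ and $X-\hat{X}$ exhibits Type A orthogonality,
\[
\frac{\operatorname{tr}\{\hat{X}\hat{X}'\}}{\operatorname{tr}\{XX'\}}
= 1 - \frac{\operatorname{tr}\{(X-\hat{X})(X-\hat{X})'\}}{\operatorname{tr}\{XX'\}}.
\tag{3.2.2}
\]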
Hence, using the ratio in (3.2.1) as a measure of the overall quality of the PCA biplot is equivalent to using the ratio
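\[
\frac{\operatorname{tr}\{(X-\hat{X})(X-\hat{X})'\}}{\operatorname{tr}\{XX'\}}
= 1 - \frac{\operatorname{tr}\{\hat{X}\hat{X}'\}}{\operatorname{tr}\{XX'\}}
\]
as a measure of the overall loss of information resulting from the dimension reduction.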
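The two orthogonality relations and the identity in (3.2.2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using an arbitrary randomly generated, column-centred matrix in place of real data; the names `X_hat`, `R`, `quality` and `loss` are illustrative only and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, illustration only
X = rng.normal(size=(8, 4))               # hypothetical data matrix (n = 8, p = 4)
X = X - X.mean(axis=0)                    # column-centre the data

r = 2                                     # dimension of the approximation
U, d, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :r] * d[:r]) @ Vt[:r, :]    # rank-r least-squares approximation of X
R = X - X_hat                             # residual matrix

# Type A: XX' = X_hat X_hat' + (X - X_hat)(X - X_hat)'
type_a = np.allclose(X @ X.T, X_hat @ X_hat.T + R @ R.T)
# Type B: X'X = X_hat' X_hat + (X - X_hat)'(X - X_hat)
type_b = np.allclose(X.T @ X, X_hat.T @ X_hat + R.T @ R)

quality = np.trace(X_hat @ X_hat.T) / np.trace(X @ X.T)   # ratio in (3.2.1)
loss = np.trace(R @ R.T) / np.trace(X @ X.T)              # overall loss

print(type_a, type_b, np.isclose(quality + loss, 1.0))
```

Under these assumptions both checks hold and the quality and the loss sum to one, as (3.2.2) states.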
Let the overall quality of the PCA biplot be denoted by Ω. Letting the svd of X be given by
\[
X = UDV' = U_q D_q V_q'
\]
the overall quality of the r-dimensional PCA biplot can be expressed as:
\begin{align}
\Omega &= \frac{\operatorname{tr}(\hat{X}'\hat{X})}{\operatorname{tr}(X'X)} \tag{3.2.3} \\
&= \frac{\operatorname{tr}\{V_r D_r^2 V_r'\}}{\operatorname{tr}\{V_q D_q^2 V_q'\}} \tag{3.2.4} \\
&= \frac{\operatorname{tr}\{D_r^2 V_r' V_r\}}{\operatorname{tr}\{D_q^2 V_q' V_q\}}
 = \frac{\operatorname{tr}\{D_r^2\}}{\operatorname{tr}\{D_q^2\}} \notag \\
\Longrightarrow\ \Omega &= \frac{\sum_{k=1}^{r} d_k^2}{\sum_{k=1}^{q} d_k^2} \tag{3.2.5}
\end{align}
where $d_k = [D]_{kk}$ is the kth largest non-zero singular value of $X$ or, equivalently, the square root of the kth largest non-zero eigenvalue of $X'X$ and $XX'$. Since $d_k^2 > 0$ for $k \in [1:q]$, the overall quality of the PCA biplot can only take on positive values. Since the numerator of Ω is a non-decreasing function of the dimension of the PCA biplot, r, while the denominator is fixed, Ω is a non-decreasing function of r. The overall quality of the PCA biplot will therefore necessarily equal its maximum value when r = p. It is evident from equation (3.2.5) that the maximum value that Ω can attain is one. The condition r = p is sufficient for Ω to attain its maximum value irrespective of the rank of X, but only necessary when X is of full column rank. It is evident from equation (3.2.5) that the condition which is necessary and sufficient for Ω to equal one is r = q. This is also evident from the expression of the overall quality in equation (3.2.2). Since Ω is a decreasing function of the sum of squared residuals,
\[
\operatorname{tr}\{(X-\hat{X})(X-\hat{X})'\}
= \sum_{i=1}^{n} (x_i-\hat{x}_i)'(x_i-\hat{x}_i)
= \sum_{i=1}^{n}\sum_{j=1}^{p} \left([X]_{ij}-[\hat{X}]_{ij}\right)^2,
\]
it will attain its maximum value of one if and only if the sum of squared residuals attains its minimum value. It is evident that the sum of squared residuals has a minimum value of zero which it will attain if and only if
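\begin{align*}
& [X]_{ij} = [\hat{X}]_{ij} \quad \forall\, i \in [1:n],\ j \in [1:p] \\
\Longleftrightarrow\ & \hat{X} = X \\
\Longleftrightarrow\ & x_i \in \mathcal{L} \quad \forall\, i \in [1:n] \\
\Longleftrightarrow\ & r = q.
\end{align*}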
It follows that Ω will attain its maximum value of one if and only if the dimension of the PCA biplot,r, is equal to the rank of X, q.
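The behaviour of Ω as a function of r can be illustrated with a small numerical sketch. It assumes NumPy and a hypothetical rank-deficient matrix (p = 4 columns, rank q = 3) built only for this purpose; `overall_quality` is an illustrative helper name, not a function from any package discussed in the text.

```python
import numpy as np

def overall_quality(X, r):
    """Overall quality of the r-dimensional approximation, as in (3.2.5)."""
    d = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
    # Singular values beyond the rank q are numerically zero, so summing
    # over all of them matches the sum over k = 1, ..., q in (3.2.5).
    return np.sum(d[:r] ** 2) / np.sum(d ** 2)

rng = np.random.default_rng(1)                # arbitrary seed, illustration only
A = rng.normal(size=(10, 3))
# The fourth column is the sum of the first two, so X has rank q = 3 < p = 4.
X = np.column_stack([A, A[:, 0] + A[:, 1]])
X = X - X.mean(axis=0)

q = np.linalg.matrix_rank(X)
omegas = [overall_quality(X, r) for r in range(1, X.shape[1] + 1)]
print(q, np.round(omegas, 3))
```

In this sketch the sequence of qualities is non-decreasing in r, is strictly below one for r < q, and equals one (up to rounding error) from r = q onwards, so r = p is sufficient but not necessary for this rank-deficient matrix.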
Being a function of the squared singular values of X, which are scale dependent quantities, the overall quality of the PCA biplot is itself a scale dependent quantity. When the measured variables have widely differing standard deviations, the overall quality of the PCA biplot constructed from the unstandardised measurements will usually be overly optimistic; this will be explained in Section 3.4.1.3. An example illustrating the scale dependence of the overall quality measure of the PCA biplot will be provided in Section 3.4.1.6.
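As a rough illustration of this scale dependence (separate from the example referred to above), the sketch below compares the overall quality of a one-dimensional approximation before and after standardisation. It assumes NumPy and simulated data in which four independent variables are generated, the first with a standard deviation of 100; the data and all names are hypothetical.

```python
import numpy as np

def overall_quality(X, r):
    """Overall quality of the r-dimensional approximation, as in (3.2.5)."""
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d[:r] ** 2) / np.sum(d ** 2)

rng = np.random.default_rng(2)                # arbitrary seed, illustration only
# Hypothetical data: independent variables with standard deviations 100, 1, 1, 1.
raw = rng.normal(size=(200, 4)) * np.array([100.0, 1.0, 1.0, 1.0])

centred = raw - raw.mean(axis=0)
standardised = centred / centred.std(axis=0, ddof=1)

omega_raw = overall_quality(centred, 1)       # dominated by the first variable
omega_std = overall_quality(standardised, 1)  # no variable dominates

print(round(omega_raw, 4), round(omega_std, 4))
```

With unstandardised measurements the first dimension appears to account for almost all of the variation simply because one variable has a much larger variance; after standardisation the same dimension accounts for far less.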
As a result of the fact that the sample variance associated with $\tilde{x}_j$ is proportional to $x_{(j)}'x_{(j)}$, the total sample variance associated with $\tilde{x}$, as measured by the one-dimensional measure $\sum_{i=1}^{p}\widehat{\operatorname{var}}(\tilde{x}_i)$, is proportional to $\operatorname{tr}(X'X)$. Hence, the overall quality of the PCA biplot,
\[
\Omega = \frac{\operatorname{tr}\{\hat{X}'\hat{X}\}}{\operatorname{tr}\{X'X\}}
= \frac{\sum_{j=1}^{p} \hat{x}_{(j)}'\hat{x}_{(j)}}{\sum_{j=1}^{p} x_{(j)}'x_{(j)}},
\]
can be interpreted as the proportion of the total sample variance associated with $\tilde{x}$ which is accounted for in the PCA biplot.
If a desired proportion of the total sample variance associated with $\tilde{x}$ to be accounted for in the PCA biplot has been specified prior to the investigation of the data, any PCA biplot corresponding to an overall quality that is equal to or greater than the desired proportion can be used to investigate the data. Since only one-, two- and three-dimensional PCA biplots can be visualised, a three-dimensional PCA biplot should be used to investigate the data in the event that the desired proportion of variance to be accounted for in the biplot is greater than the overall quality associated with the three-dimensional PCA biplot. If, however, the three-dimensional PCA biplot has a very poor overall quality, the investigator should be very careful about drawing conclusions based on the visual inspection of the biplot alone. Later on in this chapter, it will be explained that it is possible for certain individual samples or variables to be accurately represented in a PCA biplot with poor overall quality. In the event that the PCA biplot upon which the investigation of the data set is to be based has poor overall quality, the investigator should only draw conclusions based on those samples or variables that are accurately represented.
Table 3.1: The overall quality of the PCA biplot constructed from the standardised measurements of the University data set corresponding to each possible dimensionality of the PCA biplot.

Dim 1   Dim 2   Dim 3   Dim 4   Dim 5   Dim 6
0.769   0.900   0.948   0.975   0.996   1.000
As an example, consider the overall quality of the r-dimensional PCA biplot constructed from the standardised measurements of the University data set for every $r \in [1:6]$ as provided in Table 3.1. It is evident from Table 3.1 that if the desired proportion of the total sample variance associated with $\tilde{x}$ (where $\tilde{x}$ is the vector of standardised measured variables) to be accounted for in the PCA biplot is 0.9, the data should be represented in a two- or three-dimensional PCA biplot. If, on the other hand, the desired proportion of total sample variance to be accounted for in the PCA biplot is only 0.75, a one-dimensional PCA biplot will suffice. If the desired proportion of variance to be accounted for in the PCA biplot is greater than 0.948, the data should be represented in a three-dimensional PCA biplot, though the desired proportion will not be obtained.
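The selection rule described above can be written as a short sketch. The function `choose_dimension` and its `max_dim` parameter are hypothetical names introduced here for illustration; the qualities are the values listed in Table 3.1, and the target of 0.96 is an arbitrary value greater than 0.948.

```python
def choose_dimension(qualities, target, max_dim=3):
    """Smallest visualisable dimension whose overall quality reaches target.

    Returns (dimension, attained). If no dimension up to max_dim reaches the
    target, the dimension is max_dim and attained is False.
    """
    for r, omega in enumerate(qualities[:max_dim], start=1):
        if omega >= target:
            return r, True
    return max_dim, False

# Overall qualities from Table 3.1 (standardised University data set).
table_3_1 = [0.769, 0.900, 0.948, 0.975, 0.996, 1.000]

results = {t: choose_dimension(table_3_1, t) for t in (0.75, 0.90, 0.96)}
print(results)
```

The sketch returns the smallest adequate dimension, so for a target of 0.9 it reports two dimensions; a three-dimensional biplot also meets that target, in line with the discussion above.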
The scree plot associated with X (Cattell, 1966) can also be used to determine an appropriate dimension for the PCA biplot. The scree plot corresponding to the University data set is given in Figure 3.2. It seems as if the tip of the ‘elbow’ in the scree plot is at the point corresponding to the third eigenvalue, which means that the first two principal components account for a sufficiently large proportion of the sample variance associated with $\tilde{x}$ and hence that the two-dimensional PCA biplot will represent the University data set sufficiently accurately.
[Figure 3.2 image: Eigenvalue (vertical axis) plotted against Rank of Eigenvalue (horizontal axis, 1 to 6).]
Figure 3.2: The scree plot corresponding to the (standardised) University data set.
Note that since the eigenvalues of $X'X$ are proportional to the eigenvalues of $\hat{\Sigma}$, the shape of the scatter plot of
\[
\frac{d_k^2}{\sum_{i=1}^{q} d_i^2}
\tag{3.2.6}
\]
against $\{k\}_{k=1}^{p}$, in which the consecutive points are connected by straight lines, will look exactly like the shape of the scree plot associated with X. Note that the ratio in (3.2.6) is equal to the relative contribution of the kth dimension of the PCA biplot to the overall quality of the r-dimensional PCA biplot of X, or equivalently the relative contribution of the kth principal component to the overall quality, for $k \le r$; for $k > r$ the contribution is zero. The scatter plot of $\left\{ d_k^2 / \sum_{i=1}^{q} d_i^2 \right\}_{k=1}^{p}$ against $\{k\}_{k=1}^{p}$ can therefore be used to determine what the dimension of the PCA biplot should be such that the biplot represents the observed data set sufficiently accurately, in the same way that the scree plot is traditionally used to determine the number of principal components to use in an approximation of X. Remember, however, that the PCA biplot can be at most three-dimensional. Hence, if the ‘tip of the elbow’ in the plot of $\left\{ d_k^2 / \sum_{i=1}^{q} d_i^2 \right\}_{k=1}^{p}$ against $\{k\}_{k=1}^{p}$ indicates that more than three principal components should be used to produce a sufficiently accurate approximation of X, then a three-dimensional PCA biplot should be used to graphically represent the data set. As mentioned before, if the three-dimensional PCA biplot has a very poor overall quality, the investigator should be very careful when drawing conclusions based on the visual inspection of the biplot alone.
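By (3.2.5), the overall quality of the r-dimensional biplot is the sum of the first r ratios in (3.2.6), so the ratios can be recovered approximately by differencing the cumulative values in Table 3.1. The sketch below does this in plain Python; because the table is rounded to three decimals, the recovered contributions are only accurate to roughly that precision, and the variable names are illustrative.

```python
# Overall qualities from Table 3.1 (standardised University data set).
table_3_1 = [0.769, 0.900, 0.948, 0.975, 0.996, 1.000]

# Relative contribution of dimension k = quality(k) - quality(k - 1),
# with quality(0) taken as zero.
previous = [0.0] + table_3_1[:-1]
contributions = [round(curr - prev, 3) for prev, curr in zip(previous, table_3_1)]

for k, c in enumerate(contributions, start=1):
    print(f"Dim {k}: {c:.3f}")
```

The differences fall sharply after the first dimension and are small from the third dimension onwards, which is the same pattern that the scree plot displays on the eigenvalue scale.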
The plot of the overall quality of the PCA biplot against the dimensionality of the biplot space can also be used to determine an appropriate dimension for the PCA biplot to be constructed. Such a scatter plot can be viewed as a cumulative version of the scree plot. The dimension of the PCA biplot which would represent the data sufficiently accurately can therefore be found by identifying the ‘elbow’ that points upwards (instead of the ‘elbow’ that points downwards as in the case of the scree plot). The dimension of the PCA biplot which would represent the data set sufficiently accurately is the dimension corresponding to the point at which the slope of the lines connecting the points in the plot changes from steep to not steep. Again, if this dimension is greater than three, a three-dimensional PCA biplot should be used. Since it is a plot of cumulative sums, the ‘elbow’ will usually not be as sharp as in the case of the scree plot and therefore may be better described simply as a decrease in the slope. It is therefore usually easier to determine the appropriate dimension of the PCA biplot from the scree plot or the plot of the relative contributions of the principal components to the overall quality as described earlier.
[Figure 3.3 image: Overall Quality (vertical axis, 0.80 to 1.00) plotted against Dimension of PCA biplot (horizontal axis, 1 to 6).]
Figure 3.3: The overall quality of the PCA biplot of the University data set, constructed from the standardised data, corresponding to each possible dimensionality of the PCA biplot.
Figure 3.3 provides the plot of the overall qualities of the PCA biplots constructed from the standardised measurements of the University data set against the corresponding dimensionalities of the biplots. In Figure 3.3 the gradients of the lines connecting the dots only really flatten after the point corresponding to the three-dimensional PCA biplot: the slope of the line joining the points corresponding to the overall qualities of the two- and three-dimensional PCA biplots is still relatively steep. This plot therefore indicates that the three-dimensional PCA biplot should be used to represent the University data set.
The problem with the overall quality measure is that it only considers the biplot as a whole and therefore does not necessarily provide accurate information about the quality of the various individual aspects of the biplot. To see this, consider the expression of the overall loss in quality associated with the PCA biplot of a matrix X:
\[
\frac{\operatorname{tr}\{(X-\hat{X})'(X-\hat{X})\}}{\operatorname{tr}\{X'X\}}
= \frac{\sum_{i=1}^{n}\sum_{j=1}^{p} (x_{ij}-\hat{x}_{ij})^2}{\sum_{i=1}^{n}\sum_{j=1}^{p} x_{ij}^2}.
\tag{3.2.7}
\]
Note that it is not necessary for each of the terms of the summation in the numerator of equation (3.2.7) to be very small in order for the overall loss in quality to be very small, or equivalently, in order for the overall quality to be very high. Therefore, a very high overall quality does not necessarily imply that every element of the matrix X is accurately approximated in the biplot. Similarly, when the overall quality of the PCA biplot is very low, it does not imply that all elements of the matrix X are poorly approximated in the biplot. It follows that the overall quality of a PCA biplot does not provide information on the quality of the representation of every individual sample or variable. When a particular sample or variable is poorly represented in the PCA biplot, conclusions drawn about that sample or variable based on the visual inspection of the PCA biplot alone are likely to be erroneous. This emphasises the need for measures of the quality of the representation of the individual samples and variables in the PCA biplot.
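A small constructed example makes this point concrete. The sketch below assumes NumPy and a hypothetical centred matrix in which four samples vary strongly along the first variable and two samples vary weakly along the second. The row-wise ratio `row_fit` is an ad hoc illustration of a per-sample quality and is not claimed to be the definition used later in the chapter.

```python
import numpy as np

# Hypothetical column-centred data (n = 6 samples, p = 2 variables).
X = np.array([[10.0, 0.0], [-10.0, 0.0], [5.0, 0.0], [-5.0, 0.0],
              [0.0, 0.5], [0.0, -0.5]])

r = 1                                       # one-dimensional approximation
U, d, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :r] * d[:r]) @ Vt[:r, :]      # rank-1 least-squares approximation

# Overall quality: one minus the overall loss in (3.2.7).
overall = np.sum(X_hat ** 2) / np.sum(X ** 2)
# Ad hoc per-sample ratio: fitted sum of squares over total sum of squares, by row.
row_fit = np.sum(X_hat ** 2, axis=1) / np.sum(X ** 2, axis=1)

print(round(overall, 4))
print(np.round(row_fit, 4))
```

Here the overall quality is 250/250.5, about 0.998, yet the last two samples are approximated by the origin and so are not represented at all, while the first four are represented exactly.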
Two types of quality measures that focus on individual aspects of the biplot, namely adequacies and predictivities, will be studied in the remainder of this chapter. The term ‘adequacy’ was coined by Gardner (2001) although the measure was suggested earlier by Gower and Hand (1996) as a measure of the quality of the representation of the individual measured variables. Gardner-Lubbe et al. (2008) proposed two new quality measures, namely axis predictivities and sample predictivities, to measure the quality of the representation of the individual variables and samples respectively. It will be explained in Section 3.3 that the adequacy of the representation of a variable as defined by Gardner (2001) is not a trustworthy measure of the predictive ability of that biplot axis; the axis predictivity of the biplot axis, on the other hand, is. It can however be shown that the adequacy of the representation of a variable is a lower bound for the axis predictivity of the corresponding biplot axis. Hence, the adequacy of the representation of a variable can in some circumstances provide useful information about the predictive ability of the corresponding biplot axis. For this reason, as well as to improve the reader’s understanding of what exactly is measured by the adequacy measure, adequacies will be studied in Section 3.3.