3.3 Adequacies
3.3.2 Visual representation
Given that the $k$th adequacy, $\gamma_k$, is an increasing function of $\|\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\|$ with a minimum value of zero and a maximum value of one, $\gamma_k$ is a decreasing function of the Pythagorean distance between the endpoint of the vector $\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'$ emanating from the origin and the circumference of an $r$-dimensional unit sphere centred at the origin. It follows that the relative magnitudes of the distances between the endpoints of the vectors $\{\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\}$ and the circumference of the $r$-dimensional unit sphere centred at the origin are representative of the relative magnitudes of the $p$ adequacies of the $r$-dimensional PCA biplot: the shorter the distance, the larger the adequacy. This is illustrated in Figure 3.4 using the University data set.
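The monotone relationship can be made explicit with a short derivation. It assumes the usual definition of the adequacy as a squared length, $\gamma_k = \mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\mathbf{e}_k$ (the argument above only needs $\gamma_k$ to be increasing in the length), together with $\mathbf{V}_r'\mathbf{V}_r = \mathbf{I}_r$. Writing $d_k$ (a symbol introduced here) for the distance from the endpoint of $\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'$ to the circumference of the unit sphere:

```latex
\begin{align*}
\gamma_k &= \mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\mathbf{e}_k
          = \mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\mathbf{V}_r\mathbf{V}_r'\mathbf{e}_k
          = \|\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\|^2 ,
          \qquad 0 \le \gamma_k \le 1, \\
d_k &= 1 - \|\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\| = 1 - \sqrt{\gamma_k}.
\end{align*}
```

Under this definition $d_k$ is strictly decreasing in $\gamma_k$, with $d_k = 1$ when $\gamma_k = 0$ and $d_k = 0$ when $\gamma_k = 1$.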
[Figure 3.4: unit circle centred at the origin (0) with vectors labelled SAT, Top10, Accept, SFRatio, Expenses and Grad]
Figure 3.4: A unit circle centred at the origin in the two-dimensional PCA biplot space of the standardised University data set, together with the projections of the six-dimensional unit vectors $\{\mathbf{e}_k\}$ onto the biplot space.
Figure 3.4 shows a unit circle, centred at the origin, in the two-dimensional PCA biplot space of the standardised University data set, with a vector emanating from the origin for each of the (standardised) measured variables. The vector representing the $k$th variable in Figure 3.4 is the projection onto the two-dimensional PCA biplot space of the six-dimensional unit vector $\mathbf{e}_k$ emanating from the origin, $k \in [1:6]$. Comparing the lengths of the vectors in Figure 3.4 with the adequacies in Table 3.2, it is evident that the relative magnitudes of the lengths of the vectors represent the relative magnitudes of the corresponding adequacies. For example, the vector representing Expenses, the variable whose biplot axis has the largest adequacy, is the longest of the six vectors in Figure 3.4, while the vectors representing SAT and Top10, whose biplot axes have the two smallest adequacies, are the two shortest. That the difference between the lengths of the vectors representing SAT and Top10 is almost impossible to see agrees with the fact that the adequacies of the two corresponding biplot axes are almost equal.
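The comparison between vector lengths and adequacies can be checked numerically. The sketch below is a minimal illustration, not the construction used for Figure 3.4: it uses synthetic data instead of the University data set, the helper names `pca_axes`, `adequacies` and `projected_unit_vectors` are hypothetical, and it assumes the adequacy is defined as the squared length $\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\mathbf{e}_k$.

```python
import numpy as np


def pca_axes(X, r=2):
    """Return V_r (p x r): the first r principal directions of the standardised data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale each column
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are principal directions
    return Vt[:r].T                                   # p x r with orthonormal columns


def adequacies(Vr):
    """Adequacy of each axis, assumed to be e_k' V_r V_r' e_k (a squared length)."""
    return np.sum(Vr ** 2, axis=1)


def projected_unit_vectors(Vr):
    """Row k holds e_k' V_r: the projection of e_k in biplot-space coordinates."""
    p = Vr.shape[0]
    return np.eye(p) @ Vr


# Synthetic stand-in for a 25 x 6 data set (NOT the University data).
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 6)) @ rng.normal(size=(6, 6))

Vr = pca_axes(X, r=2)
coords = projected_unit_vectors(Vr)        # the vectors drawn inside the unit circle
lengths = np.linalg.norm(coords, axis=1)
gamma = adequacies(Vr)

# The same lengths measured for e_k' V_r V_r' in the full p-dimensional space.
lengths_p = np.linalg.norm(np.eye(6) @ Vr @ Vr.T, axis=1)

print(np.argsort(lengths))  # variables ordered from shortest vector ...
print(np.argsort(gamma))    # ... and from smallest adequacy: the same order
print(np.allclose(lengths, lengths_p), np.allclose(lengths ** 2, gamma))  # True True
```

Because each length is the square root of the corresponding adequacy under this definition, the two orderings always agree, although the lengths are not proportional to the adequacies.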
This graphical representation of the adequacies in a unit circle is reminiscent of the correlation monoplot described in Gower et al. (2011). In the same way that the axes of the correlation monoplot are calibrated relative to a length of unity, so that one can read off the accuracy with which each variable approximates the unit correlation of an exact representation, the vectors emanating from the origin in Figure 3.4 can be extended and calibrated relative to a length of unity so that the adequacies can be read off directly.
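One possible way to calibrate such an extended vector is sketched below. This is a hypothetical scheme, not one taken from Gower et al. (2011): assuming the adequacy equals the squared length of the vector, a marker labelled $c$ is placed at distance $\sqrt{c}$ from the origin along the vector's direction, so the marker labelled 1 falls on the unit circle and the vector's own endpoint reads off its adequacy. The function name `adequacy_ticks` and the example coordinates are illustrative only.

```python
import numpy as np


def adequacy_ticks(coord_k, values=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Hypothetical calibration of the k-th vector for reading off adequacies.

    Assumes adequacy = squared length, so a marker labelled c sits at
    distance sqrt(c) from the origin along the direction of coord_k (e_k' V_r).
    """
    coord_k = np.asarray(coord_k, dtype=float)
    direction = coord_k / np.linalg.norm(coord_k)  # unit vector along the k-th variable
    return {c: np.sqrt(c) * direction for c in values}


# Illustrative 2-D projection e_k' V_r (made-up numbers).
coord = np.array([0.6, 0.5])
gamma_k = float(coord @ coord)  # about 0.61 under the squared-length assumption
ticks = adequacy_ticks(coord, values=(0.25, gamma_k, 1.0))

print(np.allclose(ticks[gamma_k], coord))  # True: the endpoint is the marker for gamma_k
print(np.linalg.norm(ticks[1.0]))          # the marker labelled 1 lies on the unit circle
```

The square-root spacing means the markers crowd together towards the unit circle, which is the price of reading a squared quantity off a linear length.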
Instead of representing the relative magnitudes of the adequacies of the $p$ biplot axes of the $r$-dimensional PCA biplot by means of the unit circle (or sphere) approach explained above, the relative magnitudes of the adequacies can be represented in the PCA biplot itself. This can be done, in either the interpolative or the predictive $r$-dimensional PCA biplot, by imposing onto the $k$th biplot axis a thick straight line that stretches from the origin up to the point $\mathbf{e}_k'\mathbf{V}_r$, for $k \in [1:p]$; the relative magnitudes of the lengths of the thickened parts of the biplot axes then represent the relative magnitudes of the adequacies. Since the columns of $\mathbf{V}_r$ are orthonormal, $\|\mathbf{e}_k'\mathbf{V}_r\| = \|\mathbf{e}_k'\mathbf{V}_r\mathbf{V}_r'\|$, so this yields exactly the same visual representation of the relative magnitudes of the adequacies as the unit circle (or sphere) approach. Given the way in which the interpolative biplot axes are calibrated, this is slightly easier to do in the interpolative PCA biplot. When the PCA biplot is constructed from the (centred and) standardised measurements, the point $\mathbf{e}_k'\mathbf{V}_r$ in the $r$-dimensional interpolative PCA biplot is calibrated with the value one if the biplot axes are calibrated in the same scale as the elements of the matrix $\mathbf{X}$, or with the value equal to the sum of the $k$th measured variable's observed mean and sample standard deviation if the biplot axes are calibrated in the scales of the original observed measurements.
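The calibration argument can be illustrated with a small sketch. It assumes the interpolative biplot places each sample at the vector sum of its axis markers, with the sample points being the rows of $\mathbf{X}\mathbf{V}_r$ for the standardised matrix $\mathbf{X}$; the function `interpolative_marker` is a hypothetical helper and the data are synthetic, not the University data set. It checks that the value "mean plus standard deviation" on axis $k$ lands exactly at $\mathbf{e}_k'\mathbf{V}_r$, the end of the thick line.

```python
import numpy as np


def interpolative_marker(x, k, mean, sd, Vr):
    """Position of the value x (original units) on the k-th interpolative axis.

    Assumes the biplot points are the rows of X @ Vr, where X = (raw - mean) / sd
    is the standardised data matrix, so a value x of variable k maps to
    ((x - mean_k) / sd_k) * e_k' V_r.
    """
    z = (x - mean[k]) / sd[k]  # the value on the scale of the elements of X
    return z * Vr[k]           # Vr[k] is the row e_k' V_r


# Synthetic stand-in for a 25 x 6 data set (NOT the University data).
rng = np.random.default_rng(1)
raw = rng.normal(loc=50.0, scale=10.0, size=(25, 6))
mean, sd = raw.mean(axis=0), raw.std(axis=0, ddof=1)
X = (raw - mean) / sd
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vr = Vt[:2].T

# The thick line on axis k ends where the value mean_k + sd_k is calibrated.
ends = np.array([interpolative_marker(mean[k] + sd[k], k, mean, sd, Vr) for k in range(6)])
print(np.allclose(ends, Vr))  # True: the end points are the rows e_k' V_r
thick_lengths = np.linalg.norm(ends, axis=1)  # squared, these are the adequacies

# Interpolation check: the vector sum of a sample's six markers reproduces its point.
point0 = sum(interpolative_marker(raw[0, k], k, mean, sd, Vr) for k in range(6))
print(np.allclose(point0, X[0] @ Vr))  # True
```

On an axis calibrated in the scale of $\mathbf{X}$ the same point would simply carry the label 1, since the standardised value of $\text{mean}_k + \text{sd}_k$ is one.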
[Figure 3.5: (a) two-dimensional interpolative PCA biplot with calibrated axes SAT, Top10, Accept, SFRatio, Expenses and Grad, and sample points labelled 1 to 25; (b) enlarged section of (a)]
Figure 3.5: (a) The two-dimensional interpolative PCA biplot of the University data set with thick lines whose relative lengths represent the relative magnitudes of the adequacies of the measured variables; (b) A small section of the interpolative PCA biplot in (a).
Figure 3.5(a) contains the two-dimensional interpolative PCA biplot constructed from the standardised measurements of the University data set, with the biplot axes thickened in the way just explained. The biplot axes in Figure 3.5(a) are calibrated in the scales of the original observed measurements, and hence each biplot axis was thickened from the origin up to the point calibrated with the sum of the corresponding variable's sample mean and standard deviation. A small section of the PCA biplot has been enlarged and reproduced in Figure 3.5(b) to ease the comparison of the lengths of the thickened parts of the biplot axes. It is evident that the relative magnitudes of the lengths of the thickened parts of the biplot axes in Figures 3.5(a) and 3.5(b) exhibit the same pattern as the relative magnitudes of the adequacies in Table 3.2.
In conclusion, the adequacy of a biplot axis is not an appropriate measure for assessing the predictive ability of that biplot axis. It will be explained in Section 3.4.1, however, that in some circumstances the adequacy of a biplot axis can provide useful information about the axis's ability to reproduce the true measurements of the corresponding variable.