2 The two names associated with the development of the technique are K. Pearson (1901), whose work on correlation analysis was the basis for the method, and H. Hotelling (1933), who developed it into its present form.
If the two variables are perfectly correlated, the observations will all be situated on a straight line (going through the first and the third quadrants if the correlation is positive, and the second and fourth quadrants if it is negative). Hence, this line perfectly describes the original structure, which is thus reduced from two dimensions to one. (See Figure 7.2.)
Finally, if the two variables are positively or negatively correlated to some extent, but not perfectly, the swarm of observations will be scattered in an elliptical region as in Figure 7.3, and the direction of the major axis of the ellipse, PC I, will be governed by the sign of the correlation.
If the observations are measured on the major axis of this ellipse (PC I), this new measurement will capture the largest portion of the joint variability of the original distribution of X_1 and X_2 and thus offers a first approximation to it. Hence, the original frame of reference, i.e. the X_1 and X_2 axes, can be replaced by the PC I axis alone, and each observation can then be approximately described with respect to PC I alone; a simplification of the original structure will thus have been achieved. The remaining part of the joint variability is incorporated in the smaller (minor) axis of the ellipse, PC II, where PC II is orthogonal to PC I. Principal component analysis is thus a method of deriving new measures along two new axes, the two axes of the ellipse, in such a way that the share of the total variation measured along the longer axis is maximised. The mathematical approach to PCA may now be briefly described.
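The two-variable picture above can be sketched numerically. The following is a minimal illustration with NumPy, using made-up data (the sample sizes and coefficients are assumptions for illustration, not figures from the text): for two correlated variables, the major axis of the elliptical swarm is the leading eigenvector of the sample covariance matrix, and its slope has the sign of the correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two positively correlated variables (illustrative numbers only):
# the swarm of points forms an elongated ellipse.
n = 500
x1 = rng.normal(0.0, 2.0, n)
x2 = 0.8 * x1 + rng.normal(0.0, 1.0, n)
X = np.column_stack([x1, x2])

# Centre the data and form the sample covariance matrix S.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# The major axis of the ellipse (PC I) is the eigenvector of S with the
# largest eigenvalue; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
pc1 = eigvecs[:, -1]                  # direction of PC I

# The slope of the PC I line has the sign of the correlation.
print("PC I direction:", pc1)
print("slope sign matches correlation sign:",
      np.sign(pc1[1] / pc1[0]) == np.sign(np.corrcoef(x1, x2)[0, 1]))
```

With a negative coefficient in place of 0.8, the same check shows PC I running through the second and fourth quadrants instead.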
(2) Mathematical results
Assume that a number of random variables X_1, ..., X_m have a multivariate distribution with means μ_1, ..., μ_m and covariance matrix Σ. If a sample of n independent observation vectors is drawn from this population, they form a data matrix X (n x m), and the estimate of Σ is the sample covariance matrix S.
The first principal component, PC I, is that linear combination

PC I = a_11 X_1 + ... + a_m1 X_m

of the variables whose sample variance is maximised subject to the constraint of normalisation of the coefficient vector a_11, ..., a_m1. It can be proven that a_11, ..., a_m1 are the elements of the characteristic vector corresponding to the largest characteristic root λ_1 of the covariance matrix; as the a_i1 are normalised, the characteristic root λ_1 is the sample variance of PC I.
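This result is easy to verify numerically. The sketch below (with an arbitrary simulated data matrix, not data from the text) checks that the variance of the scores along the leading characteristic vector equals the largest characteristic root λ_1, and that no other normalised coefficient vector does better:

```python
import numpy as np

rng = np.random.default_rng(1)

# An illustrative data matrix X (n x m); the numbers are assumptions.
n, m = 400, 3
X = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# Characteristic roots and vectors of S (eigh: ascending eigenvalues).
lam, A = np.linalg.eigh(S)
a1, lam1 = A[:, -1], lam[-1]

# The sample variance of PC I equals the largest characteristic root.
pc1_scores = Xc @ a1
assert np.isclose(pc1_scores.var(ddof=1), lam1)

# No other unit-length coefficient vector yields a larger variance.
for _ in range(1000):
    a = rng.normal(size=m)
    a /= np.linalg.norm(a)
    assert (Xc @ a).var(ddof=1) <= lam1 + 1e-9
print("largest root = sample variance of PC I:", lam1)
```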
The second principal component, PC II, is that linear combination

PC II = a_12 X_1 + ... + a_m2 X_m

of the variables chosen in such a manner that the remaining variance is maximised subject to the normalisation constraint on the a_i2, and also subject to the further constraint that the a_i1 and the a_i2 are orthogonal. As a result, the sample variances of the successive components are additive and sum to the total variance of the original observations.
The a_12, ..., a_m2 are thus the elements of the characteristic vector corresponding to the second characteristic root λ_2 of the covariance matrix. The sample variance of PC II is λ_2, and the total variance of the system described by our PCs is thus

λ_1 + λ_2 + ... + λ_m = tr S

where tr denotes the trace of the matrix.
Hence the importance of the respective components can be appraised as the ratio of each λ_i to tr S, and is often presented in the form of such a percentage.
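The additivity of the roots and the percentage presentation can be checked in a few lines. This sketch uses an arbitrary simulated covariance matrix (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data; the point is the identity, not these numbers.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
S = np.cov(X, rowvar=False)

lam = np.linalg.eigvalsh(S)[::-1]     # roots in descending order

# The characteristic roots sum to the total variance, tr S.
assert np.isclose(lam.sum(), np.trace(S))

# The importance of each component as a percentage of tr S.
pct = 100.0 * lam / np.trace(S)
print("percent of variance:", np.round(pct, 1),
      "sum:", round(pct.sum(), 1))
```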
(3) Geometrical interpretation
Going back to the geometrical interpretation, let us assume a case with three variables, X_1, X_2 and X_3. The cluster of observations has an ellipsoidal shape,1 with a long axis PC I and two smaller axes PC II and PC III. The orientation of PC I in relation to the three original axes can be defined by the direction cosines of the three angles α_1, α_2 and α_3 between PC I and, successively, X_1, X_2 and X_3. We thus have

(cos α_1)^2 + (cos α_2)^2 + (cos α_3)^2 = 1.
If the longest axis, PC I, passes through the direction of maximum variance in the points, it can be proven that the cosines defining the direction of PC I are the elements of the characteristic vector corresponding to the largest characteristic root of the observations' covariance matrix. The other two axes correspond to the contributions of the second and the third PC, in descending order of the corresponding characteristic roots. Hence, the principal components (PCs) are the new axes of a rigid rotation of the coordinate system of the original observations to a new frame of reference corresponding to the directions of maximal variance. The original variables are thus described in terms of this new frame of reference.
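The rotation interpretation can also be verified directly. In the sketch below (again with simulated data, an assumption for illustration), the matrix of characteristic vectors is orthogonal, so passing to PC coordinates is a rigid rotation (up to a reflection): distances are preserved, the original variables are recovered exactly, and each column supplies direction cosines whose squares sum to one.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative three-variable data (numbers are assumptions).
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
lam, A = np.linalg.eigh(S)

# A is orthogonal, so the change to PC coordinates is a rigid
# rotation of the frame of reference (possibly with a reflection).
assert np.allclose(A.T @ A, np.eye(3))

Z = Xc @ A            # observations in the new frame of reference

# Distances from the origin are unchanged by the rotation ...
assert np.allclose(np.linalg.norm(Z, axis=1),
                   np.linalg.norm(Xc, axis=1))
# ... and the original variables are recovered exactly from the PCs.
assert np.allclose(Z @ A.T, Xc)

# Each column of A gives the direction cosines of a PC axis, and the
# squared cosines sum to one, as in the identity above.
assert np.allclose((A ** 2).sum(axis=0), 1.0)
print("rotation checks passed")
```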
Finally, it can be proven that the line representing the first PC is such that the sum of the squared distances between the observations and their projections on the line is minimised, because these distances refer to the contribution of the other components. In this sense, the first PC can be interpreted as a line of closest fit for the observations.1
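The "line of closest fit" property can be tested numerically as well. This sketch (with illustrative simulated data) compares the sum of squared perpendicular distances to the PC I line against the same sum for many other candidate lines through the mean of the points:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two correlated variables (illustrative numbers only), centred.
x1 = rng.normal(0, 2, 300)
x2 = 0.6 * x1 + rng.normal(0, 1, 300)
Xc = np.column_stack([x1, x2])
Xc -= Xc.mean(axis=0)

S = np.cov(Xc, rowvar=False)
_, A = np.linalg.eigh(S)
pc1 = A[:, -1]                        # direction of PC I

def perp_ss(direction):
    """Sum of squared perpendicular distances from the points to the
    line through the origin with the given unit direction vector."""
    proj = np.outer(Xc @ direction, direction)
    return ((Xc - proj) ** 2).sum()

# The PC I line beats every other line through the mean of the points.
best = perp_ss(pc1)
for theta in np.linspace(0, np.pi, 180, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])
    assert best <= perp_ss(d) + 1e-9
print("minimum sum of squared perpendicular distances:",
      round(best, 2))
```

Unlike the two regressions discussed in the footnote, which minimise vertical or horizontal distances, the quantity minimised here is measured perpendicular to the fitted line.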
1 In the case of two variables the difference between the two regression lines and the unique PC line can be readily shown on three graphs.

[Graphs omitted: the regression X_2 = f(X_1), the regression X_1 = f(X_2), and the PC line.]

In each case something different is minimised. In the first and second cases we know the direction in which the distances must be minimised, but this is not so in the last case. In the first case, we minimise

Σ (X̂_2 - X_2)^2 ,

in the second case

Σ (X̂_1 - X_1)^2 ,

and in the third case