
If the two variables are perfectly correlated, the observations will all be situated on a straight line (going through the first and the third quadrants if it is a positive correlation, and the second and fourth quadrants if it is a negative correlation). Hence, this line describes perfectly the original structure, which is thus reduced from two dimensions to one. (See Figure 7.2.)

Figure 7.2

Finally, if the two variables are positively or negatively correlated to some extent, but not perfectly, the swarm of observations will be scattered in an elliptical region, as in Figure 7.3, and the direction of the major axis of the ellipse, PC I, will be governed by the sign of the correlation.

If the observations are measured on the major axis of this ellipse (PC I), this new measurement will capture the largest portion of the joint variability of the original distribution of X_1 and X_2 and thus offers a first approximation to it. Hence, the original frame of reference, i.e. the X_1 and X_2 axes, can be replaced by the PC I axis only, and each observation can then be approximately described with respect to PC I alone; thus a simplification of the original structure will have been achieved. The remaining part of the joint variability is incorporated in the smaller (minor) axis of the ellipse, PC II, where PC II is orthogonal to PC I. Principal component analysis is thus a method of deriving new measures along two new axes, the two axes of the ellipse, in such a way that the share of the total variation measured along the longer axis is maximised. The mathematical approach to PCA may now be briefly described.
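The elliptical picture above can be sketched numerically. The following is a minimal illustration (assuming NumPy; all variable names are ours, not from the text): for positively correlated two-dimensional data, the major axis of the swarm, recovered as the leading eigenvector of the sample covariance matrix, runs through the first and third quadrants.

```python
import numpy as np

# Illustrative sketch: correlated 2-D data form an elliptical swarm whose
# major axis (PC I) is the leading eigenvector of the sample covariance
# matrix.  The data-generating process here is purely hypothetical.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=500)   # positively correlated with x1

S = np.cov(np.column_stack([x1, x2]), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                          # major-axis direction (PC I)

# For a positive correlation the major axis passes through the first and
# third quadrants, i.e. its two components share the same sign.
same_sign = pc1[0] * pc1[1] > 0
print(same_sign)
```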

(2) **Mathematical results**

Assume that a number of random variables X_1, ..., X_m have a multivariate distribution with means μ_1, ..., μ_m and covariance matrix Σ. If a sample of n independent observation vectors is drawn from this population, they form a data matrix X (n x m),

X = [x_ij], i = 1, ..., n; j = 1, ..., m,

and the estimate of Σ is the sample covariance matrix S.
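This setup can be made concrete with a short sketch (assuming NumPy; the population mean and covariance below are invented for illustration): n observation vectors on m variables are stacked into an n x m data matrix X, and S is its sample covariance matrix.

```python
import numpy as np

# Hypothetical illustration of the sampling setup: n independent
# observation vectors drawn from an m-variate distribution, stacked into
# a data matrix X (n x m); S estimates the population covariance Sigma.
rng = np.random.default_rng(1)
n, m = 200, 3
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[1.0, 0.5, 0.2],
                                 [0.5, 1.0, 0.3],
                                 [0.2, 0.3, 1.0]],
                            size=n)           # X is (n x m)

S = np.cov(X, rowvar=False)                   # m x m sample covariance matrix
print(X.shape, S.shape)
```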

The first principal component, PC I, is that linear combination

PC I = a_11 X_1 + ... + a_m1 X_m

of the variables whose sample variance is maximised subject to the constraint of normalisation for the coefficient vector a_11, ..., a_m1. It can be proven that a_11, ..., a_m1 are the elements of the characteristic vector corresponding to the largest characteristic root λ_1 of the covariance matrix; as the a_i1 are normalised, the characteristic root λ_1 is the sample variance of PC I.
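This result is easy to verify numerically. A minimal sketch (assuming NumPy; the simulated data are ours): the coefficient vector of PC I is the eigenvector of S belonging to its largest eigenvalue λ_1, and the sample variance of the resulting scores equals λ_1.

```python
import numpy as np

# Sketch: the normalised eigenvector a1 of S for the largest eigenvalue
# lambda_1 gives PC I, and the sample variance of the scores X a1 is
# exactly lambda_1.  The data here are simulated for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ np.diag([3.0, 1.0, 0.5, 0.2])
Xc = X - X.mean(axis=0)                 # centre the observations
S = np.cov(Xc, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues
lam1 = eigvals[-1]                      # largest characteristic root
a1 = eigvecs[:, -1]                     # normalised coefficient vector

pc1_scores = Xc @ a1                    # PC I = a11 X1 + ... + am1 Xm
var_pc1 = pc1_scores.var(ddof=1)        # sample variance (divisor n - 1)
print(np.isclose(var_pc1, lam1))
```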

The second principal component, PC II, is that linear combination

PC II = a_12 X_1 + ... + a_m2 X_m

of the variables chosen in such a manner that the remaining variance is maximised subject to the normalisation constraint on the a_i2 and also subject to the further constraint that the a_i1 and the a_i2 are orthogonal. As a result, the sample variances of the successive components are additive and sum up to the total variance of the original observations. The a_12, ..., a_m2 are thus the elements of the characteristic vector corresponding to the second characteristic root λ_2 of the covariance matrix. The sample variance of PC II is λ_2, and the total variance of the system described by our PCs is thus

λ_1 + λ_2 + ... + λ_m = tr S

where tr denotes the trace of the matrix.

Hence the importance of the respective components can be appraised as the ratio λ_i / (λ_1 + ... + λ_m), or λ_i / tr S, and is often presented in the form of such a percentage.
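Both the additivity result and the percentage appraisal can be checked in a few lines (an illustrative sketch assuming NumPy; the data are simulated): the component variances λ_i sum to tr S, so each ratio λ_i / tr S is that component's share of the total variance.

```python
import numpy as np

# Illustrative check: the characteristic roots of S sum to tr S, so
# lambda_i / tr S gives the share of total variance explained by the
# i-th principal component.  Simulated data, names are ours.
rng = np.random.default_rng(3)
X = rng.normal(size=(250, 3)) * np.array([2.0, 1.0, 0.5])
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)[::-1]          # lambda_1 >= ... >= lambda_m
roots_sum = eigvals.sum()                      # lambda_1 + ... + lambda_m
shares = 100 * eigvals / np.trace(S)           # percentages, descending

print(np.isclose(roots_sum, np.trace(S)))
print(np.isclose(shares.sum(), 100.0))
```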

(3) **Geometrical interpretation**

Going back to the geometrical interpretation, let us assume a case with three variables, X_1, X_2 and X_3. The cluster of observations will have an ellipsoidal shape, with a long axis PC I and two smaller axes, PC II and PC III. The orientation of PC I in relation to the three original axes can be defined by the three direction cosines of the angles α_1, α_2 and α_3 between PC I and, successively, X_1, X_2 and X_3. We thus have

(cos α_1)^2 + (cos α_2)^2 + (cos α_3)^2 = 1.
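The identity above follows directly from the normalisation of the coefficient vector, which can be illustrated numerically (a sketch assuming NumPy; the data are simulated): the components of the unit-length leading eigenvector are the direction cosines of PC I, so their squares sum to one.

```python
import numpy as np

# Sketch: the direction cosines of PC I against the original axes are the
# components of the unit-length leading eigenvector of S, so their
# squares sum to one.  Data and names are illustrative.
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 3))
S = np.cov(X, rowvar=False)

a1 = np.linalg.eigh(S)[1][:, -1]        # unit eigenvector = direction cosines
cos_sq_sum = np.sum(a1 ** 2)            # (cos a1)^2 + (cos a2)^2 + (cos a3)^2
print(np.isclose(cos_sq_sum, 1.0))
```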

If the longest axis, PC I, passes through the direction of maximum variance in the points, it can be proven that the cosines defining the direction of PC I are the elements of the characteristic vector corresponding to the largest characteristic root of the observations' covariance matrix. The other two axes correspond to the contributions of the second and of the third PC, in descending order of the corresponding characteristic roots. Hence, the principal components (PC) are the new axes of a rigid rotation of the coordinate system of the original observations to a new frame of reference corresponding to the directions of maximal variances. The original variables are thus described in terms of this new frame of reference.

Finally, it can be proven that the line representing the first PC is such that the sum of the squared distances between the observations and their projections on the line is minimised, because these distances refer to the contribution of the other components. In this sense, the first PC can be interpreted as a line of closest fit for the observations.1
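The closest-fit property can be demonstrated by brute force in two dimensions (an illustrative sketch assuming NumPy; the grid search and all names are ours): among all lines through the centroid, the direction minimising the sum of squared orthogonal distances coincides with the PC I direction from the eigendecomposition.

```python
import numpy as np

# Sketch of the closest-fit property: among all lines through the
# centroid of a 2-D swarm, PC I minimises the sum of squared orthogonal
# distances from the observations to the line.  We compare a brute-force
# grid search over line angles with the eigenvector solution.
rng = np.random.default_rng(5)
x1 = rng.normal(size=300)
x2 = 0.7 * x1 + 0.4 * rng.normal(size=300)
Xc = np.column_stack([x1, x2])
Xc = Xc - Xc.mean(axis=0)                          # centre the swarm

def ssq_orthogonal(theta):
    d = np.array([np.cos(theta), np.sin(theta)])   # unit direction of the line
    proj = Xc @ d                                  # coordinates along the line
    resid = Xc - np.outer(proj, d)                 # orthogonal residuals
    return np.sum(resid ** 2)                      # sum of squared distances

thetas = np.linspace(0.0, np.pi, 1801)
best = thetas[np.argmin([ssq_orthogonal(t) for t in thetas])]

pc1 = np.linalg.eigh(np.cov(Xc, rowvar=False))[1][:, -1]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi     # PC I direction as an angle

print(abs(best - pc1_angle) < 0.01)
```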

1 In the case of two variables the difference between the two regression lines and the unique PC line can be readily shown on three graphs.

[Three graphs: the regression line x_2 = f(x_1), the regression line x_1 = f(x_2), and the PC line.]

In each case something different is minimised. In the first and second case we know the direction of the lines along which the distances must be minimised, but it is not so in the last case. In the first case, we minimise

Σ (X̂_2 - X_2)^2

in the second case

Σ (X̂_1 - X_1)^2

and in the third case