
**II. Discussion**

The description of PCA in the previous section does not imply

any specific assumptions about the form of the distribution of the

population. Hence, the assumption of a multivariate normal distribution

is not essential in the treatment of PCA. However, in order to describe

the large sample properties of the coefficients, the assumption of an

ellipsoidal contour of equal probability density should be made. Indeed,

if tests of hypotheses and construction of confidence intervals for the

population are to be performed, the assumption of normality for the

population becomes crucial.

The second point, whether we should extract the principal components with the help of the covariance matrix or with the use of the correlation matrix, is more complex. This is a very important consideration, as these two methods will yield very different results for the PC, results that are not linked by any simple relation (except when all the variances

are equal). As Kendall (1957) points out "lines of closest fit found by

minimising sums of squares of perpendiculars are not invariant under

changes of scale".

If the original variables are all in the same unit, the covariance matrix is a satisfactory approach. However, if the original

variables are expressed in different units, the variables expressed in

the smallest unit of measurement (i.e. the variables presenting the

largest figures) will have the largest variances and will dominate the

first PC. A change in the unit of measurement of any variable will alter

the results.
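The scale-dependence described above is easy to demonstrate numerically. The sketch below uses invented data (the figures and variable names are illustrative only, not drawn from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two invented indicators measured, at first, in comparable units.
x = rng.normal(50, 10, size=200)
y = x + rng.normal(0, 5, size=200)

def first_pc(data):
    """First characteristic vector of the covariance matrix (columns = variables)."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argmax(eigvals)]

# Same units: both variables receive weights of similar magnitude.
v_same = first_pc(np.column_stack([x, y]))

# Express x in a unit 100 times smaller: its variance grows by 100**2
# and it dominates the first PC, as the text warns.
v_rescaled = first_pc(np.column_stack([100 * x, y]))

print(np.abs(v_same))      # balanced weights
print(np.abs(v_rescaled))  # weight of the rescaled variable near 1
```

A mere change of unit thus reshuffles the whole first component, which is exactly the non-invariance Kendall points out.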

This is indeed quite troublesome. In an attempt to prevent the variances from influencing the results by their mere size, the usual procedure is to standardise the data and to extract the principal components from the correlation matrix. The variances, all reduced to unity, will then not affect the results but, on the other hand, we transform the original structure to obtain arbitrarily similar units of measurement. Consequently, the meaning of the PC extracted from standardised data becomes more far-fetched. The variance of the first PC extracted

from a covariance matrix corresponds to the proportion of total variance

of the original data explained by the first PC. The variance of the first

PC extracted from a correlation matrix corresponds to a proportion of

total variance of the standardised data, the latter being equal to the

number of variables as each variance is 1. If our object is to explain

the original data the first approach is indeed more meaningful than the

second. Furthermore, in the case of PC extracted from a correlation matrix, the sampling theory supporting tests of hypotheses and construction of confidence intervals is much more complex (cf. Anderson (1958)).

In conclusion, in the case of dissimilar units for the variables,

if we know a transformation that can render the variables commensurable1

in such a way that the transformed variables still retain an economic

meaning in their new form, then we can extract PC from the covariance

matrix of the transformed variables. One transformation which is widely

used in economics and which could satisfy the above conditions is the

logarithmic transformation.2 In our study, we will thus experiment with

both correlation matrices and covariance matrices derived from the original

data and from logarithmically transformed data. However, we must first

offer a general interpretation for PCA and contrast it to factor analysis.

1 We wish to obtain variances of similar magnitude.

2 In many cases, economic variables are more likely to have log-log relations than linear relations.
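The effect described in footnotes 1 and 2 can be illustrated with invented figures (the scales below merely mimic raw counts versus per-head quantities):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented raw indicators on very different scales, e.g. telephone
# counts versus energy consumption per head (illustrative only).
telephones = rng.lognormal(mean=6.0, sigma=1.0, size=100)
energy = rng.lognormal(mean=0.5, sigma=1.0, size=100)

# Raw variances are orders of magnitude apart ...
print(np.var(telephones), np.var(energy))

# ... but after a log transformation both variances are close to
# sigma**2, i.e. of similar magnitude, so neither dominates the PC.
print(np.var(np.log(telephones)), np.var(np.log(energy)))
```

The log-transformed variables are commensurable in variance while still carrying an economic meaning (roughly, proportional changes).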

III. PCA and FA - Various interpretations

PCA and FA will be considered in the specific context of our

study, i.e. when dealing with social and economic variables reckoned in

various units of measurement.

The first PC has been defined as the linear combination of the

original variables that accounts for a maximum of their total variance;

the second PC is another linear combination of the original variables

that accounts for a maximum fraction of the residual variance left un

explained by the first component and that is uncorrelated with the first

PC. Each successive PC makes a smaller and smaller contribution. Hence

the PC are naturally ranked by order of importance as measured by the

variance accounted for by each of them.
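The extraction and ranking just described amount to an eigendecomposition of the correlation matrix; a minimal sketch with invented data (not the study's own programme):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))          # 50 observations of 4 invented variables
X[:, 1] += X[:, 0]                    # induce some correlation

R = np.corrcoef(X, rowvar=False)      # correlation matrix of the variables
eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]     # rank PC by variance accounted for
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PC scores: uncorrelated linear combinations of the standardised variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ eigvecs

# Each root is the variance explained by its PC, so the fractions
# below are non-increasing and sum to one.
explained = eigvals / eigvals.sum()
print(explained)
```

The components come out naturally ordered: each successive root, hence each successive PC, accounts for a smaller share of the total variance.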

Furthermore, the square of the coefficients of each variable1

in the following expression

PC1 = a11 X1 + a21 X2 + ... + am1 Xm

measures the importance of the corresponding variable relative to the

other variables in that principal component. It is thus

possible to interpret the various principal components by analysing which

variables are more or less important in the linear combination. However,

as the first PC often explains the bulk of the total variance because of

the variance maximisation involved, in practice the first PC is a *general*

factor allocating weights of similar magnitude to all the variables. The

rest of the components are usually referred to as *bipolar* factors, i.e.

contrasting the variables with + and - signs. Furthermore, only very few

PC pass the significance tests and one cannot decide beforehand upon the

1 i.e. the square of the elements of the characteristic vector corresponding to the principal component. We also know that a11² + a21² + ... + am1² = 1.

number of significant PC to expect.

Indeed, PCA is a descriptive technique which does not permit decisions to be based on a prior hypothesis. Furthermore, the

principal components are usually difficult to identify. Hence, other

methods were developed in order to extract "a single structure" from the

principal components. These are the various factor analysis approaches,

pioneered by Thurstone (1931) and described by Harman (1960,1967).

These FA methods post-multiply the PC matrix by orthogonal matrices (to retain the independence properties) in order

to obtain weights which are in the vicinity of 1 or in the vicinity of

0 and to avoid weights in the range between these two values. Thus, it

is much easier to interpret the factors. A hypothesis is made beforehand

as to the number of factors desired. As a result, the rotated factors

lose the property of being ranked by order of importance in explaining

the total variability. Unfortunately, FA lacks the rigour of PCA. The

PC factorisation is unique because of the maximisations involved at each

step; on the other hand, the methods developed by social scientists to

rotate the PC axes into factors differ considerably, and pure statisticians

have often looked down on these methods.
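The best-known rotation of this family is Kaiser's varimax criterion. The sketch below uses a standard iterative SVD formulation applied to an invented loading matrix; it is one common way of doing the rotation, not necessarily the procedure the author had in mind:

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Rotate a loading matrix by an orthogonal matrix that maximises
    the varimax criterion, pushing each weight towards 0 or +/-1."""
    p, k = loadings.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        )
        R = u @ vt               # product of orthogonal factors: orthogonal
        new_crit = s.sum()
        if new_crit <= crit * (1 + tol):
            break
        crit = new_crit
    return loadings @ R, R

# An invented loading matrix whose weights are neither near 0 nor near 1.
A = np.array([[0.7, 0.5],
              [0.8, 0.4],
              [0.4, -0.7],
              [0.5, -0.8]])
rotated, R = varimax(A)
print(np.round(rotated, 2))      # each variable now loads mainly on one factor
```

Because R is orthogonal the rotated factors stay uncorrelated, but, as the text notes, they are no longer ranked by explained variance.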

In conclusion, the choice between PCA and FA should be made

with regard to our ultimate aims, which are twofold.

a) If we want to avoid the problem of multicollinearity by

transforming our original data matrix into independent

components (as in Part II), the interpretation of these

components is not needed. The important consideration is

how large a proportion of the original variance we wish to

capture. Principal components analysis is the relevant

approach in such circumstances, and we can use more than one

component.

b) If we want to build a single index out of a large number

of variables representing some general concept (as in Part

III), PCA is again the best method because the first PC is

general, as it usually embodies the bulk of the original

variance. Hence, we will be content to use PCA alone through-

out our analysis.

B. AGGREGATION APPROACH WITH PCA

We have already stated that the main reason for using PCA in

this part is to avoid multicollinearity. It is also of interest to compare the results of a method that allots purely statistical weights to

the variables with the results of the method developed in Chapter VI.

This new approach and the new results obtained will now be presented.

I. The new procedure

The first sets of experiments performed use a data matrix

without any missing data and the PCA is carried out using the correlation

matrix. Then measured GDE is regressed against the first few PC. We

include in the regression as many PC as we want without taking into

account whether the PC's are statistically significant or not.1 We thus

obtain predicted values for GDE which are comparable to those obtained

with single regressions in Chapter VI. However, this new method offers

a number of advantages.

1 If a root is less than 1, the PC programme automatically

and arbitrarily rejects it as not significant. Mathematically speaking, the PC corresponding to this root still carries some residual information that could improve the fitting of the multiple regression. Hence, we believe that this arbitrary level of significance is not relevant in our specific approach.
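In modern terms this procedure is a principal components regression; a sketch with invented data standing in for GDE and the indicators (the sample sizes echo the text, but the figures are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 37, 17                      # 37 countries, 17 indicators, as in the text

# Invented indicator matrix and a GDE-like response driven by a few columns.
X = rng.normal(size=(n, m))
gde = X[:, 0] * 2.0 + X[:, 1] + rng.normal(scale=0.1, size=n)

# Standardise, extract PC from the correlation matrix, rank by root size.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
scores = Z @ eigvecs[:, order]

# Regress GDE on the first few PC, significant or not, as the text suggests.
k = 5
D = np.column_stack([np.ones(n), scores[:, :k]])
beta, *_ = np.linalg.lstsq(D, gde, rcond=None)
predicted = D @ beta

# The regressors are orthogonal, so no multicollinearity can arise.
print(np.round(beta, 3))
```

The fitted values play the role of the predicted GDE discussed in the text; because the scores are orthogonal, each coefficient is estimated independently of the others.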

As shown above, the multiple regression between GDE and the 17

original intercorrelated variables is plagued with multicollinearity.

But PCA can concentrate most of the information contained in the 17

original variables into a few uncorrelated PC. A multiple regression

between GDE and these few PC will thus possess certain qualities: the

number of independent variables is reduced, these variables are orthogonal,

and each of the original variables is represented to some extent in the

new variables.

Moreover, one could argue that the first PC, offering a

synthesis of the 17 original indicators, is itself a good proxy for GDE.

However, although the first PC contains the bulk of the information embodied in the 17 PC, by adding the remaining information included in the second and in the third PC, for instance, we can only improve this proxy measure of GDE. Such a step is possible only because the multiple regression allows us to aggregate the various PC into a single predicted

value for GDE. The results of the first sets of experiments will now be

presented.

II. Results with the balanced sample 2

The first experiments with PCA are carried out with the balanced

sample 2 (37 countries) and with the indicator sample 2 (17 demographic,

social and economic variables). The PC are first extracted from the 17

variables, then from the 14 social and economic variables, and finally

from the 6 economic variables. GDE and National Income are then regressed

successively against the first few components.

In the very first experiment, the PC are extracted from the covariance matrix and it is obvious that the magnitude of the weights allocated depends mainly on the size of the unit of measurement. This is illustrated by the elements of the characteristic vector corresponding to the first PC extracted from the six economic variables.

**Table 7.1**

**Elements of the First Vector Yielded by**
**PCA on the Covariance Matrix of the Raw Data**

| Variable | a_i1 | Range of the data |
|---|---|---|
| Telephone | .996 | 20 - 5683 |
| Steel | .113 | 6 - 662 |
| Electric Energy | .013 | .27 - 116 |
| Passenger Cars | .007 | .10 - 42 |
| Energy | .001 | .10 - 10 |
| Commercial Vehicles | .001 | .09 - 8 |

These results are obviously unsatisfactory. Hence we have to standardise the data and extract the PC from the correlation matrix. These results are presented in Table 7.2.

The first PC extracted from the 17 demographic, social and economic variables cannot be identified in a very specific manner. It is general and it allocates rather balanced weights to all the variables (these weights range between .276 and .195). The signs of these weights point to an overall positive correlation with GDE; all the indicators positively correlated with GDE have positive weights and those negatively correlated with GDE have negative weights.1 The first PC is, of course, significant and explains 69% of the total variance (λ1/m = .69) of the standardised data. The second PC is also significant and explains 10% of the total variance (λ2/m = .098). It is a bipolar component contrasting very accurately the social and demographic indicators with the
1 We know these results from previous single regressions run between GDE and these indicators.