The description of PCA in the previous section does not imply
any specific assumptions about the form of the distribution of the
population. Hence, the assumption of a multivariate normal distribution
is not essential in the treatment of PCA. However, in order to describe
the large sample properties of the coefficients, the assumption of an
ellipsoidal contour of equal probability density should be made. Indeed,
if tests of hypotheses and construction of confidence intervals for the
population are to be performed, the assumption of normality for the
population becomes crucial.
The second point, whether we should extract the principal components with the help of the covariance matrix or with the use of the correlation matrix, is more complex. This is a very important consideration as these two methods will yield very different results for the PC, results that are not linked by any simple relation (except when all the variances are equal). As Kendall (1957) points out, "lines of closest fit found by minimising sums of squares of perpendiculars are not invariant under changes of scale".
If the original variables are all in the same unit, the covariance matrix is a satisfactory approach. However, if the original variables are expressed in different units, the variables expressed in the smallest unit of measurement (i.e. the variables presenting the largest figures) will have the largest variances and will dominate the first PC. A change in the unit of measurement of any variable will alter the components.
This is indeed quite troublesome. In an attempt to prevent the variances from influencing the results by their mere size, the usual procedure is to standardise the data and to extract the principal components from the correlation matrix. Thus their size will not affect the results but, on the other hand, we transform the original structure to obtain arbitrarily similar units of measurement. Consequently, the meaning of the PC extracted from standardised data becomes more far-fetched. The variance of the first PC extracted
from a covariance matrix corresponds to the proportion of total variance
of the original data explained by the first PC. The variance of the first
PC extracted from a correlation matrix corresponds to a proportion of
total variance of the standardised data, the latter being equal to the
number of variables as each variance is 1. If our object is to explain
the original data, the first approach is indeed more meaningful than the
second. Furthermore, in the case of PC extracted from a correlation
matrix, the sampling theory supporting tests of hypotheses and construction
of confidence intervals is much more complex (cf. Anderson (1958)).
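To make the contrast concrete, here is a minimal numerical sketch (synthetic data and numpy, not the study's indicators) of the two factorisations and of the variance shares just described:

```python
import numpy as np

# Hedged sketch with simulated data: the same observations analysed
# through the covariance matrix and through the correlation matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([100.0, 1.0, 0.1])  # very unequal units

S = np.cov(X, rowvar=False)        # covariance matrix of the raw data
R = np.corrcoef(X, rowvar=False)   # correlation matrix (standardised data)

for name, M in (("covariance", S), ("correlation", R)):
    roots = np.linalg.eigvalsh(M)[::-1]   # characteristic roots, descending
    print(name, "first-PC share of total variance: %.3f" % (roots[0] / roots.sum()))

# For the correlation matrix, roots.sum() equals the number of variables m,
# since each standardised variance is 1; for the covariance matrix, the
# variable measured in the smallest unit dominates the first PC almost
# completely.
```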
In conclusion, in the case of dissimilar units for the variables,
if we know a transformation that can render the variables commensurable1
in such a way that the transformed variables still retain an economic
meaning in their new form, then we can extract PC from the covariance
matrix of the transformed variables. One transformation which is widely
used in economics and which could satisfy the above conditions is the
logarithmic transformation.2 In our study, we will thus experiment with
both correlation matrices and covariance matrices derived from the original
data and from logarithmically transformed data. However, we must first
offer a general interpretation for PCA and contrast it to factor analysis.
1 We wish to obtain variances of similar magnitude.
2 In many cases, economic variables are more likely to have log-log relations than linear relations.
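The footnoted point can be sketched as follows (simulated lognormal "indicators", purely illustrative, not the study's data): the logarithmic transformation brings the variances to a similar magnitude while the transformed variables keep an economic reading (log levels).

```python
import numpy as np

# Three lognormal variables whose raw variances differ by orders of
# magnitude; after taking logs, all variances are near 0.25 (= 0.5**2).
rng = np.random.default_rng(1)
raw = np.exp(rng.normal(loc=[2.0, 5.0, 8.0], scale=0.5, size=(200, 3)))

print("raw variances:", raw.var(axis=0).round(1))        # wildly different
print("log variances:", np.log(raw).var(axis=0).round(3))  # commensurable
# PCA on the covariance matrix of np.log(raw) is then meaningful.
```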
III. PCA and FA - Various interpretations
PCA and FA will be considered in the specific context of our
study, i.e. when dealing with social and economic variables reckoned in
various units of measurement.
The first PC has been defined as the linear combination of the
original variables that accounts for a maximum of their total variance;
the second PC is another linear combination of the original variables
that accounts for a maximum fraction of the residual variance left unexplained by the first component and that is uncorrelated with the first
PC. Each successive PC makes a smaller and smaller contribution. Hence
the PC are naturally ranked by order of importance as measured by the
variance accounted for by each of them.
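Formally (our restatement, not the original text's; Σ denotes the covariance matrix of the variables X₁, …, Xₘ and λ₁ ≥ … ≥ λₘ its characteristic roots):

```latex
% First PC: the normalised linear combination of maximal variance.
a_1 = \arg\max_{a'a = 1} \operatorname{Var}(a'X) = \arg\max_{a'a = 1} a'\Sigma a ,
\qquad \operatorname{Var}(a_1'X) = \lambda_1 .
% Each further PC solves the same problem subject to zero correlation
% with all of its predecessors:
a_k = \arg\max_{a'a = 1,\; a'\Sigma a_j = 0\,(j<k)} a'\Sigma a ,
\qquad \operatorname{Var}(a_k'X) = \lambda_k .
```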
Furthermore, the square of the coefficients of each variable1 in the following expression

a₁₁X₁ + a₂₁X₂ + … + aₘ₁Xₘ
measures the importance of the corresponding variable relative to the
other variables in the corresponding principal component. It is thus
possible to interpret the various principal components by analysing which
variables are more or less important in the linear combination. However,
as the first PC often explains the bulk of the total variance because of
the variance maximisation involved, in practice the first PC is a general
factor allocating weights of similar magnitude to all the variables. The
rest of the components are usually referred to as bipolar factors, i.e.
contrasting the variables with + and - signs. Furthermore, only very few
PC pass the significance tests and one cannot decide beforehand upon the number of significant PC to expect.
1 i.e. the square of the elements of the characteristic vector corresponding to the principal component. We also know that these squared elements sum to one, since each characteristic vector is normalised.
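A two-line check of the footnote's point (any correlation matrix will do; this one is invented): because each characteristic vector is normalised, its squared elements sum to one and can be read as relative importances.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
eigvals, eigvecs = np.linalg.eigh(R)
a1 = eigvecs[:, -1]   # vector of the largest root, i.e. the first PC
print("squared weights:", (a1**2).round(3), " sum =", (a1**2).sum())
```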
Indeed, PCA is a descriptive technique which does not permit the making of a decision based on a prior hypothesis. Furthermore, the
principal components are usually difficult to identify. Hence, other
methods were developed in order to extract "a simple structure" from the principal components. These are the various factor analysis approaches, pioneered by Thurstone (1931) and described by Harman (1960, 1967).
These FA methods post-multiply the PC matrix by orthogonal matrices (to retain the independence properties) in order to obtain weights which are in the vicinity of 1 or in the vicinity of 0, and to avoid weights in the range between these two values. Thus, it is much easier to interpret the factors. A hypothesis is made beforehand as to the number of factors desired. As a result, the rotated factors lose the property of being ranked by order of importance in explaining the total variability. Unfortunately, FA lacks the rigour of PCA. The PC factorisation is unique because of the maximisations involved at each step; on the other hand, the methods developed by social scientists to rotate the PC axes into factors differ considerably, and pure statisticians have often looked down on these methods.
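For concreteness, a sketch of one such orthogonal rotation, the varimax criterion (the text does not name a specific method; varimax is the classic choice described by Harman):

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Post-multiply the loading matrix L (m variables x k components) by an
    orthogonal matrix chosen to push loadings towards 0 or +/-1 (varimax)."""
    m, k = L.shape
    R = np.eye(k)            # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        Lam = L @ R
        # SVD of the varimax criterion's gradient yields the next rotation
        u, s, vt = np.linalg.svd(
            L.T @ (Lam**3 - Lam @ np.diag((Lam**2).sum(axis=0)) / m))
        R = u @ vt           # a product of orthogonal matrices stays orthogonal
        if s.sum() < d * (1 + tol):
            break            # criterion no longer improving
        d = s.sum()
    return L @ R             # rotated factors: easier to interpret, unranked
```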
In conclusion, the choice between PCA and FA should be made
with regard to our ultimate aims, which are twofold.
a) If we want to avoid the problem of multicollinearity by transforming our original data matrix into independent components (as in Part II), the interpretation of these components is not needed. The important consideration is how large a proportion of the original variance we wish to capture. Principal components analysis is the relevant approach in such circumstances, and we can use more than one component.
b) If we want to build a single index out of a large number of variables representing some general concept (as in Part III), PCA is again the best method because the first PC is
general as it usually embodies the bulk of the original
variance. Hence, we will be content to use only PCA throughout our analysis.
AGGREGATION APPROACH WITH PCA
We have already stated that the main reason for using PCA in
this part is to avoid multicollinearity. It is also of interest to compare the results of a method that allots purely statistical weights to
the variables with the results of the method developed in Chapter VI.
This new approach and the new results obtained will now be presented.
I . The new procedure
The first sets of experiments performed use a data matrix
without any missing data and the PCA is carried out using the correlation
matrix. Then measured GDE is regressed against the first few PC. We
include in the regression as many PC as we want without taking into
account whether the PC are statistically significant or not.1 We thus
obtain predicted values for GDE which are comparable to those obtained
with single regressions in Chapter VI. However, this new method offers
a number of advantages.
1 If a characteristic root is less than 1, the PC programme automatically and arbitrarily rejects it as not significant. Mathematically speaking, the PC corresponding to this root still carries some residual information that could improve the fitting of the multiple regression. Hence, we believe that this arbitrary level of significance is not relevant in our specific approach.
As shown above, the multiple regression between GDE and the 17
original intercorrelated variables is plagued with multicollinearity.
But PCA can concentrate most of the information contained in the 17
original variables into a few uncorrelated PC. A multiple regression
between GDE and these few PC will thus possess certain qualities: the
number of independent variables is reduced, these variables are orthogonal,
and each of the original variables is represented to some extent in the retained components.
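A sketch of this principal-components regression (hypothetical shapes matching the chapter: 37 countries, 17 standardised indicators, y standing for measured GDE; the data here are simulated):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(37, 17))                 # stand-in for the indicators
y = X @ rng.normal(size=17) + rng.normal(size=37)  # stand-in for measured GDE

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise -> correlation PCA
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]             # roots in descending order
scores = Z @ eigvecs[:, order[:3]]            # first three PC, uncorrelated

A = np.column_stack([np.ones(len(y)), scores])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS of GDE on the few PC
y_hat = A @ beta                              # predicted GDE, no collinearity
```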
Moreover, one could argue that the first PC, offering a
synthesis of the 17 original indicators, is itself a good proxy for GDE.
However, although the first PC contains the bulk of the information embodied in the 17 PC, by adding the remaining information included in the second and in the third PC, for instance, we can only improve this proxy measure of GDE. Such a step is possible only because the multiple regression allows us to aggregate the various PC into a single predicted value for GDE. The results of the first sets of experiments will now be presented.
II. Results with the balanced sample 2
The first experiments with PCA are carried out with the balanced
sample 2 (37 countries) and with the indicator sample 2 (17 demographic,
social and economic variables). The PC are first extracted from the 17
variables, then from the 14 social and economic variables, and finally
from the 6 economic variables. GDE and National Income are then regressed
successively against the first few components.
In the very first experiment, the PC are extracted from the covariance matrix and it is obvious that the magnitude of the weights allocated depends mainly on the size of the unit of measurement. This is illustrated below by the elements of the characteristic vector corresponding to the first PC extracted from the six economic variables.
Elements of the First Vector Yielded by PCA on the Covariance Matrix of the Raw Data

Variable               aᵢ₁     Range of the variable
Telephone              .996       20  - 5683
Steel                  .113        6  -  662
Electric Energy        .013       .27 -  116
Passenger Cars         .007       .10 -   42
Energy                 .001       .10 -   10
Commercial Vehicles    .001       .09 -    8
These results are obviously unsatisfactory. Hence we have to standardise the data and extract the PC from the correlation matrix. These results are presented in Table 7.2.
The first PC extracted from the 17 demographic, social and economic variables cannot be identified in a very specific manner. It is general and it allocates rather balanced weights to all the variables (these weights range between .276 and .195). The signs of these weights point to an overall positive correlation with GDE; all the indicators positively correlated with GDE have positive weights and those negatively correlated with GDE have negative weights.1 The first PC is, of course, significant and explains 69% of the total variance (λ₁/m = .69) of the standardised data. The second PC is also significant and explains 10% of the total variance (λ₂/m = .098). It is a bipolar component contrasting very accurately the social and demographic indicators with the economic indicators.
1 We know these results from previous single regressions run between GDE and these indicators.