
3. Developing a forest management inventory for the non-timber functions

3.5 Deriving interrelationships among ecosystem factors

3.5.2 Factor analysis and Principal components analysis

Factor analysis is an umbrella term for exploratory methods used to describe a set of interrelated variables and to derive models that explain the interrelations among them (more than two variables). In factor analysis all variables are treated equally; that is, there is no distinction between dependent and independent variables.

Interrelations between variables are expressed statistically by correlation. The more strongly two variables are correlated, the more reliably the behavior of one can be predicted from the behavior of the other. However, two highly correlated variables may share much of their information because the behavior of both is affected by a third variable. Tree diameter and tree height are an example: they are highly correlated with each other, but also, within limits, with tree age. The part of the correlation between the two variables (diameter and height) that can be attributed to the third variable (age) can be determined by calculating the partial correlation coefficient, which measures the correlation between the two variables when the third variable is held constant.
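For a single controlling variable, the partial correlation coefficient has the closed form r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The NumPy sketch below computes it that way and, as a cross-check, as the correlation between the residuals of x and y after each has been regressed on z. The diameter, height and age values are simulated for illustration only (the coefficients and noise levels are assumptions, not data from this study), and the function names are hypothetical.

```python
import numpy as np

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation of x and y with z held constant (one control variable)."""
    return float((r_xy - r_xz * r_yz) / np.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2)))

def partial_corr_residuals(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Same quantity from raw data: correlate what is left of x and y
    after removing their linear dependence on z by least squares."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated trees: diameter and height both depend on age, plus independent noise
rng = np.random.default_rng(0)
age = rng.uniform(20, 120, size=200)
diameter = 0.35 * age + rng.normal(0, 4, size=200)
height = 0.25 * age + rng.normal(0, 3, size=200)

r = np.corrcoef([diameter, height, age])   # 3 x 3 correlation matrix
r_dh = r[0, 1]                             # simple correlation diameter-height
r_dh_age = partial_corr(r[0, 1], r[0, 2], r[1, 2])
print(f"r(d, h) = {r_dh:.2f}, r(d, h | age) = {r_dh_age:.2f}")
```

In this simulated case diameter and height are strongly correlated, but with age held constant the partial correlation drops close to zero, because age was built in as the only source of their common variation.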

In multivariate situations, where more than two variables are considered, it can likewise be assumed that some variables are affected by another variable X that causes their high correlation, and the part of the correlation attributable to X can be calculated. Where this part is sufficiently high, it can be concluded that variable X carries most of the information and the other variables contain mainly redundant information.

Factor analysis methods work in the same way. They examine the patterns of variation among a set of variables and construct a new variable (factor) that explains a large part of their common variation. The more strongly the variables of the set are correlated, the greater the part of the overall variation the factor can explain. A factor is thus a new synthetic (theoretical) variable that is as highly correlated as possible with all variables in the set. Calculating the partial correlations among the variables while holding the factor constant leaves the variation that cannot be explained by the factor. To explain this remaining variation, a second factor is extracted, and the procedure is repeated until all the variation among the original variables has been explained by the new variables (the factors).
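The successive extraction just described can be sketched with NumPy on a correlation matrix. In this sketch each step takes the eigenvector with the largest remaining eigenvalue as the new factor; subtracting the outer product of its loadings, R − a·a′, leaves the covariances that remain once that factor is held constant (dividing them by the residual standard deviations would give the partial correlations themselves). The correlation matrix is hypothetical and `extract_successively` is an illustrative name, not a routine from the original study.

```python
import numpy as np

def extract_successively(R: np.ndarray, n_factors: int):
    """Extract factors one at a time from a correlation matrix R.

    Each step picks the single direction that explains most of the remaining
    variation, then subtracts that factor's contribution, leaving the
    covariances that remain once the factor is held constant.
    """
    residual = np.array(R, dtype=float)
    eigenvalues, loadings = [], []
    for _ in range(n_factors):
        vals, vecs = np.linalg.eigh(residual)      # eigenvalues in ascending order
        lam, v = vals[-1], vecs[:, -1]             # largest remaining eigenpair
        a = np.sqrt(max(lam, 0.0)) * v             # loadings of the new factor
        residual = residual - np.outer(a, a)       # variation the factor cannot explain
        eigenvalues.append(lam)
        loadings.append(a)
    return np.array(eigenvalues), np.column_stack(loadings), residual

# Hypothetical correlation matrix of three variables
R = np.array([[1.0, 0.7, 0.5],
              [0.7, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
eigs, A, left_over = extract_successively(R, n_factors=3)
print(np.round(eigs, 3))                 # variation explained by each successive factor
print(np.abs(left_over).max() < 1e-10)   # True: nothing left after p factors
```

After p factors nothing remains to be explained, and the extracted eigenvalues coincide with those of a single eigen-decomposition of R, so in practice all components can be obtained at once.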

Generally, the first few extracted factors account for a large part of the overall variation and can replace the original variables without major loss of information. This is important when the number of variables involved is relatively large and analyzing their relationships with bivariate statistical methods becomes impractical. This reduction in the number of variables is the first characteristic of factor analysis.

The high correlation of factors with certain variables provides insight into the structure of relationships among the original variables. That is, factor analysis performs an ordination of the original variables according to the constellation of their intercorrelations. The ordination can be achieved in many ways, all of which yield mathematically acceptable solutions. Not all solutions, however, are meaningful or can be explained in terms of real-world experience. Therefore, the analyst must search for solutions that comply with the theoretical context of the variables analyzed and conform with real-world experience. This is a second characteristic of factor analysis methods: they are heuristic, that is, they always provide a mathematical solution, but the solution must be meaningfully interpreted by the analyst.

Of the many methods developed for extracting factors, principal components analysis (PCA) is the most widely used and the most widely described in multivariate statistics books. The goal of PCA is to capture the relational structure of the original variables in a small number of factors. Each factor, called a principal component, is a linear combination of the original variables. Principal components are uncorrelated with each other (they are orthogonal), and they successively explain maximum parts of the variation among the original variables. That is, the first principal component explains the greatest portion of the variation and the last principal component the least. In applying principal components analysis, it is expected that the first few principal components will explain a sufficiently high part of the variation among the original variables, thus reducing the number of variables to deal with and providing new uncorrelated variables for further analyses.

Principal components analysis is a linear transformation of the original variables. The basic model in PCA, written in matrix form, is F = X ⋅V, where

X is the matrix of the original variables (n observations on p variables), F is the matrix of the factor values to be derived, and

V is the matrix of transformation coefficients.

The first requirement in PCA, that the extracted factors be uncorrelated, imposes the condition that matrix V be orthogonal. In mathematical terms, matrix V must fulfill the following two conditions:

$$V \cdot V' = I$$

the product of V with its transpose V' must give the identity* matrix I, and

$$|V| = 1$$

the determinant of V must equal 1.
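Both conditions can be checked numerically. In the minimal NumPy sketch below, the eigenvectors of a symmetric matrix form an orthogonal matrix, so `V @ V.T` returns the identity. An orthogonal matrix can, however, have determinant −1 as well as +1, and numerical eigen-solvers do not fix that sign; reversing the sign of one column is enough to satisfy |V| = 1 without changing anything else. The matrix `D` is a hypothetical example.

```python
import numpy as np

# Hypothetical symmetric matrix; its eigenvector matrix is orthogonal
D = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 1.0]])
_, V = np.linalg.eigh(D)

# Condition 1: V times its transpose gives the identity matrix
print(np.allclose(V @ V.T, np.eye(3)))   # True

# Condition 2: |V| = 1. If the solver returned |V| = -1,
# flipping the sign of one eigenvector turns it into +1.
if np.linalg.det(V) < 0:
    V[:, 0] = -V[:, 0]
print(np.linalg.det(V))
```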

The second requirement in PCA, that the extracted factors successively explain maximum variation of the original variables, can be handled by considering how the variation of the factors relates to that of the original variables. It can be shown that the following relation holds between the variation of matrix X (the original variables) and that of matrix F (the linear transformations):

$$D(F) = V' \cdot D(X) \cdot V$$

where

D(F), D(X) are the SSCP** matrices of the linear transformations and the original variables, respectively, and

V, V' are the matrix of transformation coefficients and its transpose.

*An identity matrix has all elements of its main diagonal equal to 1 and all other elements equal to zero.

**SSCP: Sums of Squares and Cross-Products. A matrix whose main diagonal elements are the sums of squared deviations of the individual values from their mean (corresponding to sample variances) and whose off-diagonal elements are the sums of cross-products of the deviations (corresponding to sample covariances).
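The relation D(F) = V'·D(X)·V can be verified numerically. In the NumPy sketch below, `sscp` is a hypothetical helper that builds the SSCP matrix from deviations about the column means, the data are simulated with an arbitrary mixing matrix, and V is taken to be the eigenvector matrix of D(X), which is the choice PCA ultimately makes. The relation itself holds for any V; with this particular V, D(F) is also diagonal, i.e. the factors are uncorrelated.

```python
import numpy as np

def sscp(M: np.ndarray) -> np.ndarray:
    """Sums of squares and cross-products of deviations from the column means."""
    dev = M - M.mean(axis=0)
    return dev.T @ dev

# Simulated, correlated observations (n = 50, p = 3); the mixing matrix is arbitrary
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) @ np.array([[1.0, 0.8, 0.3],
                                         [0.0, 0.6, 0.4],
                                         [0.0, 0.0, 0.9]])

D_X = sscp(X)
_, V = np.linalg.eigh(D_X)   # orthogonal V: eigenvectors of D(X)
F = X @ V                    # basic model F = X·V
D_F = sscp(F)

print(np.allclose(D_F, V.T @ D_X @ V))              # True: D(F) = V'·D(X)·V
print(np.allclose(D_F - np.diag(np.diag(D_F)), 0))  # True: off-diagonals vanish
```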

The left-hand side of the relation $D(F) = V' \cdot D(X) \cdot V$ can be made as large as desired simply by increasing the values of the elements of V. To overcome this problem, which would make the maximization meaningless, each column of V must be normalized to length 1. This requirement is already fulfilled by the condition that V pre-multiplied by its transpose, $V' \cdot V$, equals the identity matrix (for a square matrix this is equivalent to $V \cdot V' = I$).
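Written out for a single column v of V, the constrained maximization takes the standard Lagrangian form below. Because D(X) is symmetric, the gradient of v'·D(X)·v is 2·D(X)·v. The last line shows why the multiplier λ is at the same time the maximized variation of the component.

```latex
\begin{aligned}
&\max_{v}\; v' \cdot D(X) \cdot v \quad \text{subject to} \quad v' \cdot v = 1 \\
&L(v,\lambda) = v' \cdot D(X) \cdot v - \lambda \,(v' \cdot v - 1) \\
&\frac{\partial L}{\partial v} = 2\, D(X) \cdot v - 2 \lambda\, v = 0
  \;\Longrightarrow\; (D(X) - \lambda \cdot I) \cdot v = 0 \\
&\text{pre-multiplying by } v': \quad v' \cdot D(X) \cdot v = \lambda\, v' \cdot v = \lambda
\end{aligned}
```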

The maximization problem, subject to the two previous conditions, can be solved with the help of Lagrange multipliers. The method leads to the equation $(D(X) - \lambda \cdot I) \cdot v = 0$, from which a system of homogeneous linear equations can be derived. Both the values of $\lambda$ (the Lagrange multipliers) and the vector $v$ are unknown in this system. For the system to have a non-trivial solution, the matrix in parentheses must be singular, that is, its determinant must be zero. Thus, the solution to the maximization problem can be obtained by solving the following equation:

$$\left| D(X) - \lambda \cdot I \right| = 0$$

This is the so-called "characteristic equation" of the matrix $D(X)$. Computing the determinant leads to a polynomial of order p, whose roots are the values of $\lambda$. The $\lambda$-values are the "characteristic roots" or "eigenvalues" of the square matrix $D(X)$. It can be shown that the eigenvalues equal the variation (sums of squares) of the extracted principal components: the largest eigenvalue equals that of the first principal component, the next largest that of the second principal component, and so on. The sum of all eigenvalues equals the sum of the diagonal elements of $D(X)$, which are the sums of squares of all original variables. Therefore, the extracted components comprise exactly the same variation as the totality of the original variables.
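For a small case the characteristic equation can be expanded and solved directly. The NumPy sketch below assumes p = 3, where |λ·I − D| = λ³ − c1·λ² + c2·λ − c3 with c1 the trace, c2 the sum of the 2 × 2 principal minors and c3 the determinant (this polynomial has the same roots as |D − λ·I|, the two differing only by the factor (−1)^p). The matrix is hypothetical. For larger p, root-finding on the polynomial is numerically fragile, so dedicated eigenvalue routines such as `np.linalg.eigvalsh` are used instead; the sketch checks that both routes give the same λ-values and that these sum to the trace.

```python
import numpy as np
from itertools import combinations

# Hypothetical 3 x 3 correlation matrix standing in for D(X)
D = np.array([[1.0, 0.7, 0.5],
              [0.7, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
p = D.shape[0]

# Coefficients of |λ·I - D| for p = 3: λ³ - c1·λ² + c2·λ - c3
c1 = np.trace(D)
c2 = sum(np.linalg.det(D[np.ix_(idx, idx)]) for idx in combinations(range(p), 2))
c3 = np.linalg.det(D)
roots = np.sort(np.roots([1.0, -c1, c2, -c3]).real)[::-1]   # eigenvalues, largest first
print(np.round(roots, 3))

print(np.allclose(roots, np.linalg.eigvalsh(D)[::-1]))  # True: same as an eigen-solver
print(np.isclose(roots.sum(), np.trace(D)))            # True: sum of eigenvalues = trace
```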

A transformation vector $v_j$ can be derived for each eigenvalue $\lambda_j$. Since each vector corresponds to an eigenvalue, the vectors are called eigenvectors. Having computed the eigenvalues, the vectors $v_j$ can be calculated from the equation

$$(D(X) - \lambda_j \cdot I) \cdot v_j = 0$$

as follows. The eigenvector $v_j$ can be determined by calculating the adjoint (adjugate) of the matrix $(D(X) - \lambda_j \cdot I)$. Normalizing a non-zero column of the adjoint to length 1 gives the desired vector, which meets the condition $v_j' \cdot v_j = 1$.
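The adjoint route can be sketched as follows, assuming the eigenvalue is simple (not repeated). Then D(X) − λ_j·I has rank p − 1, its adjoint has rank 1, and every non-zero column of the adjoint is proportional to v_j, because (D(X) − λ_j·I) multiplied by its adjoint equals |D(X) − λ_j·I|·I = 0. `adjugate` is a hypothetical helper built from cofactors with NumPy, and the matrix is a made-up example.

```python
import numpy as np

def adjugate(A: np.ndarray) -> np.ndarray:
    """Classical adjoint: the transpose of the matrix of cofactors."""
    n = A.shape[0]
    cof = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

# Hypothetical 3 x 3 matrix standing in for D(X)
D = np.array([[1.0, 0.7, 0.5],
              [0.7, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
lam = np.linalg.eigvalsh(D)[-1]                        # largest eigenvalue λ1

adj = adjugate(D - lam * np.eye(3))
col = adj[:, np.argmax(np.linalg.norm(adj, axis=0))]   # any non-zero column will do
v = col / np.linalg.norm(col)                          # normalize: v'·v = 1

print(np.allclose(D @ v, lam * v))                     # True: (D - λ1·I)·v = 0
```

Cofactor expansion becomes expensive quickly as p grows, so this only mirrors the textbook steps; library eigen-solvers are preferable in practice.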

The factor values, which are the transformed observations, can then be calculated from the basic model of PCA by multiplying the original values by the matrix of eigenvectors. Factor values are the values of each observation with respect to the new variables, the principal components.

In most PCA applications, the original variables are z-standardized, and the variance-covariance matrix is used instead of the SSCP matrix to derive the eigenvalues and eigenvectors. Since the variance of a z-standardized variable equals 1 and the covariance of two z-standardized variables equals their correlation, this variance-covariance matrix is identical to the correlation matrix of the original variables. The coordinates of all observations on the new axes Y can then be calculated from the matrix Z of z-standardized values and the matrix V of eigenvectors of the correlation matrix, using the formula $Y = Z \cdot V$. Each axis $Y_j$ represents a principal component, which explains the variation $\lambda_j$. To obtain the factor values in this case, the coordinate values Y should be z-standardized.
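For z-standardized variables the steps above can be sketched in NumPy with simulated data (the mixing matrix and offsets are arbitrary assumptions). One practical detail not covered in the text: `np.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to put the first principal component first. Dividing each column Y_j by √λ_j z-standardizes it, since Y_j already has mean 0 and variance λ_j.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data: 100 observations on 4 variables in different units
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4)) + [10.0, 200.0, 0.5, 3.0]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-standardized variables
R = np.corrcoef(X, rowvar=False)                    # = variance-covariance matrix of Z
lam, V = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]                       # largest eigenvalue first
lam, V = lam[order], V[:, order]

Y = Z @ V                          # coordinates on the new axes, Y = Z·V
factor_values = Y / np.sqrt(lam)   # z-standardized coordinates (Y already has mean 0)

print(np.allclose(np.cov(Z, rowvar=False), R))              # True
print(np.allclose(Y.var(axis=0, ddof=1), lam))              # True: axis j explains λj
print(np.allclose(factor_values.std(axis=0, ddof=1), 1.0))  # True
```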

The correlation between the factor values and the observed values of a variable is called the factor loading, and its square, being a squared correlation coefficient, measures the variation that the variable and the factor have in common. Factor loadings are therefore useful in deriving relationships among variables. Variables that load considerably on the same factor are also interrelated with each other, and their joint behavior is considered to be well represented by that factor.
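A loading can be computed literally as the correlation between the factor values and a variable or, for z-standardized PCA, in closed form as v_ij·√λ_j (the eigenvector element times the square root of the eigenvalue). The NumPy sketch below checks that both routes agree on simulated data in which variables 1 and 2 are constructed to share a common source; that construction is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated data: 150 observations; variables 1 and 2 share a common source
common = rng.normal(size=150)
X = np.column_stack([common + 0.5 * rng.normal(size=150),
                     common + 0.5 * rng.normal(size=150),
                     rng.normal(size=150),
                     rng.normal(size=150)])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
lam, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
lam, V = lam[::-1], V[:, ::-1]                  # largest eigenvalue first
factor_values = Z @ V / np.sqrt(lam)            # z-standardized factor values

# Loading = correlation between factor values and observed values of a variable
p = X.shape[1]
direct = np.array([[np.corrcoef(Z[:, i], factor_values[:, j])[0, 1] for j in range(p)]
                   for i in range(p)])
loadings = V * np.sqrt(lam)                     # closed form: v_ij * sqrt(λ_j)
print(np.allclose(direct, loadings))            # True
print(np.round(loadings[:, 0], 2))              # variables 1 and 2 load highly on factor 1
```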

The sum of the squared loadings of a variable over all factors gives the proportion of the variation in that variable that can be explained by the factors; it is called the communality of the variable. When the correlation matrix (z-standardized variables) is used for the calculations, the variance of each variable equals 1. The communality, as a proportion of this total variance, therefore cannot be greater than 1.

The sum of the squared loadings of all variables on a factor gives the variance that the factor explains, which equals $\lambda$, the eigenvalue of the factor. As already mentioned, the sum of all eigenvalues equals the total variance of all variables. The eigenvalues of the factors are therefore useful for selecting the most important factors on the basis of their contribution to the overall explained variance.
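Communalities and explained variances are simply row and column sums of the squared loading matrix. In the NumPy sketch below, with a hypothetical correlation matrix of four variables, keeping all factors gives every variable a communality of exactly 1, keeping only the first two gives communalities of at most 1, and each column sum reproduces the factor's eigenvalue, with all eigenvalues together summing to p.

```python
import numpy as np

# Hypothetical correlation matrix of p = 4 z-standardized variables
R = np.array([[1.0, 0.8, 0.3, 0.2],
              [0.8, 1.0, 0.25, 0.1],
              [0.3, 0.25, 1.0, 0.5],
              [0.2, 0.1, 0.5, 1.0]])
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]              # largest eigenvalue first
A = V * np.sqrt(lam)                         # loadings: rows = variables, columns = factors

communality_all = (A**2).sum(axis=1)         # over all factors: each equals 1
communality_2 = (A[:, :2]**2).sum(axis=1)    # first two factors only: at most 1
explained = (A**2).sum(axis=0)               # per factor: equals its eigenvalue

print(np.round(communality_2, 2))
print(np.allclose(communality_all, 1.0), np.allclose(explained, lam))  # True True
print(np.isclose(lam.sum(), R.shape[0]))                               # True: total = p
```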

When the variables are z-standardized, as is usually the case in PCA, the total variance of all variables equals p, the number of variables. A factor with an eigenvalue less than 1, that is, less than the variance of a single variable, is considered to contribute little to the explained variance and should generally be excluded from the group of important factors. This eigenvalue criterion for selecting the appropriate number of factors should be regarded as a rule of thumb: with a large number of variables, it leads to the extraction of too many factors, which are difficult to interpret (BORTZ, 1999). The cumulative sum of the variance successively explained by the factors, expressed as a percentage of the total variance, is also a helpful measure in selecting the number of important factors. Another useful method for selecting the appropriate number of factors is the scree plot, a diagram in which the eigenvalues are plotted against their rank order of magnitude. In the example of figure 8, the first (largest) eigenvalue is 3.5, the second 2.3, and so on. Reading from the last factor towards the first, the eigenvalues of factors 13 to 4 increase by almost the same small amount at each step, while the increase from factor 4 to factor 3 is substantially greater. According to the scree test (CATTELL, 1966), such jumps in the increment of the eigenvalues are decisive in selecting the number of factors. In the example of figure 8, the first three factors are therefore considered important and the remaining ones are ignored.

[Figure 8 image not reproduced: eigenvalues λ (vertical axis, 0 to 4) plotted against the rank number of the factor (horizontal axis, 13 to 1).]

Figure 8: Diagram of the eigenvalues in order of magnitude (scree plot).
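The three selection aids can be computed together. The sketch below assumes z-standardized variables, so that the total variance equals p; `retention_summary` is a hypothetical helper, and the six eigenvalues are invented for the example (they are not the values of figure 8). The scree test itself remains a visual judgment, so the function only returns the successive drops for inspection rather than choosing a cut-off.

```python
import numpy as np

def retention_summary(eigenvalues):
    """Aids for choosing the number of factors (z-standardized variables assumed)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = lam.size                                  # total variance = number of variables
    kaiser = int(np.sum(lam > 1.0))               # eigenvalue-greater-than-1 rule of thumb
    cumulative_pct = 100.0 * np.cumsum(lam) / p   # cumulative % of total variance
    steps = -np.diff(lam)                         # drop from each eigenvalue to the next
    return kaiser, cumulative_pct, steps

# Invented eigenvalues for p = 6 standardized variables (they sum to 6)
kaiser, cum_pct, steps = retention_summary([2.9, 1.4, 0.6, 0.5, 0.35, 0.25])
print(kaiser)                  # 2
print(np.round(cum_pct, 1))    # cumulative percentage of explained variance
print(np.round(steps, 2))      # successive drops to inspect for a scree "elbow"
```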

For a general interpretation of the factors to be feasible, the number of observations (sample size) should meet the following prerequisites, according to GUADAGNOLI and VELICER (1988, in BORTZ 1999):

− if care has been taken in the planning phase that ten or more variables belong to each expected factor, a sample size of about 150 observations is sufficient.

− if at least four variables load above 0.60 on each meaningful factor, the factors can be interpreted generally, irrespective of sample size.

− the factors can also be interpreted irrespective of sample size if ten to twelve variables have loadings greater than 0.40.

− if only a small number of variables load weakly on the factors, only a sample size of more than 300 observations should assure an interpretation of the factors.

From a geometric point of view, PCA performs a rotation of the coordinate system of the original variables. The axes of the new coordinate system correspond to the extracted factors (principal components) and are oriented in the p-dimensional space so that they successively capture the maximum variance of the observations, while each axis is perpendicular to all the others.

Figure 9 shows the rotation of the coordinate system for the case of two variables with four observations and two extracted factors. The positions of the four observations in the coordinate system of the factors (PC1 and PC2) correspond to the (z-standardized) factor values of each observation. The endpoint of each variable vector is determined by the factor loadings (a21 and a22 for variable 2). The axes of the variables (Z1 and Z2), which originally were perpendicular to each other, now form an angle whose cosine equals the correlation between the variables. Projecting the observation points onto the axes of the variables gives the z-standardized values of the original variables (Z21 and Z22 for the observation at the far right of the graph).

[Figure 9 image not reproduced: observations and the variable vectors Z1 and Z2 plotted on the axes PC1 and PC2 (both from −1.8 to 1.8), with the loadings a21, a22 and the projections Z21, Z22 marked.]

Figure 9: Geometric interpretation of Principal components analysis (adapted from Bortz 1999).
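The geometry of figure 9 can be reproduced numerically for two variables and four observations. In the NumPy sketch below (data invented for illustration), the rows of the loading matrix are the endpoints of the variable vectors, the cosine of the angle between them equals the correlation, and projecting each standardized observation point onto a variable's axis returns that observation's z-value. Both results rely on all factors being retained, so that each variable vector has length 1 (communality 1).

```python
import numpy as np

# Invented data: four observations on two variables
X = np.array([[2.0, 3.0], [4.0, 5.5], [5.0, 4.0], [7.0, 8.5]])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]          # PC1 first
A = V * np.sqrt(lam)                     # row i = endpoint of variable vector Zi (loadings)
F = Z @ V / np.sqrt(lam)                 # z-standardized factor values (observation points)

# Cosine of the angle between the two variable vectors equals their correlation
cos_angle = A[0] @ A[1] / (np.linalg.norm(A[0]) * np.linalg.norm(A[1]))
print(np.isclose(cos_angle, R[0, 1]))    # True

# Projecting each observation point onto a variable axis returns its z-value
proj = F @ (A.T / np.linalg.norm(A, axis=1))
print(np.allclose(proj, Z))              # True
```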

Source: Development of an inventory system for non-timber functions of forests in the frame of management inventories: the case of Greece (pages 51-57).