FACTOR ANALYSIS - Data Mining Methods And Models Larose DT (2006) pdf

Factor analysis is related to principal components, but the two methods have different goals. Principal components seeks to identify orthogonal linear combinations of the variables, to be used either for descriptive purposes or to substitute a smaller number of uncorrelated components for the original variables. In contrast, factor analysis represents a model for the data, and as such is more elaborate.

The factor analysis model hypothesizes that the response vector $X_1, X_2, \ldots, X_m$ can be modeled as linear combinations of a smaller set of $k$ unobserved, “latent” random variables $F_1, F_2, \ldots, F_k$, called common factors, along with an error term $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m)$. Specifically, the factor analysis model is

$$\underset{m \times 1}{X - \mu} \;=\; \underset{m \times k}{L}\;\underset{k \times 1}{F} \;+\; \underset{m \times 1}{\varepsilon}$$

where $X - \mu$ ($m \times 1$) is the response vector, centered by the mean vector; $L$ ($m \times k$) is the matrix of factor loadings, with $l_{ij}$ representing the factor loading of the $i$th variable on the $j$th factor; $F$ ($k \times 1$) represents the vector of unobservable common factors; and $\varepsilon$ ($m \times 1$) is the error vector. The factor analysis model differs from other models, such as the linear regression model, in that the predictor variables $F_1, F_2, \ldots, F_k$ are unobservable. Because so many terms are unobserved, further assumptions must be made before we may uncover the factors from the observed responses alone. These assumptions are that $E(F) = 0$, $\mathrm{Cov}(F) = I$, $E(\varepsilon) = 0$, and $\mathrm{Cov}(\varepsilon)$ is a diagonal matrix. See Johnson and Wichern [4] for further elucidation of the factor analysis model.
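To make the model and its assumptions concrete, the following is a minimal simulation sketch, not part of the original analysis. The loading matrix, error variances, and mean vector are hypothetical values chosen for illustration, and the sketch additionally assumes that F and ε are independent. Under those assumptions the covariance of X implied by the model is LL' + Cov(ε), which the simulated sample covariance should approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

m, k, n = 5, 2, 200_000   # variables, factors, simulated records (all hypothetical)

# Hypothetical loading matrix L (m x k) and diagonal error covariance psi (m x m).
L = np.array([[0.8, 0.0],
              [0.7, 0.2],
              [0.1, 0.6],
              [0.0, 0.5],
              [0.3, 0.3]])
psi = np.diag([0.36, 0.47, 0.63, 0.75, 0.82])
mu = np.array([10.0, 0.0, -3.0, 5.0, 1.0])

# Draw factors and errors that satisfy the stated assumptions:
# E(F) = 0, Cov(F) = I, E(eps) = 0, Cov(eps) diagonal.
F = rng.standard_normal((n, k))
eps = rng.standard_normal((n, m)) * np.sqrt(np.diag(psi))

# Row-wise form of X - mu = L F + eps (one simulated record per row).
X = mu + F @ L.T + eps

implied_cov = L @ L.T + psi            # covariance implied by the model
sample_cov = np.cov(X, rowvar=False)   # covariance observed in the simulation
max_gap = np.abs(sample_cov - implied_cov).max()
print(f"largest |sample - implied| covariance entry: {max_gap:.4f}")
```

In practice only X is observed, which is why the assumptions on F and ε are needed before L can be estimated at all.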

Unfortunately, the factor solutions provided by factor analysis are not invariant to transformations. Two models, $X - \mu = LF + \varepsilon$ and $X - \mu = (LT)(T^{\top}F) + \varepsilon$, where $T$ represents an orthogonal transformation matrix (so that $TT^{\top} = I$), will both provide the same results. Hence, the factors uncovered by the model are in essence nonunique, without further constraints. This indistinctness provides the motivation for factor rotation, which we will examine shortly.
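The nonuniqueness can be checked numerically. The sketch below uses a hypothetical loading matrix and a 30-degree rotation as the orthogonal matrix T; it confirms that the rotated loadings LT, paired with the rotated factors T'F, reproduce the same fitted values and the same implied common covariance LL', even though the loadings themselves differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loadings (m = 4 variables, k = 2 factors) and one draw of F.
L = np.array([[0.8, 0.1],
              [0.6, 0.3],
              [0.2, 0.7],
              [0.1, 0.5]])
F = rng.standard_normal(2)

# An orthogonal transformation: rotation of the factor axes by 30 degrees.
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_rot = L @ T      # rotated loadings
F_rot = T.T @ F    # correspondingly rotated factors

same_fit = np.allclose(L @ F, L_rot @ F_rot)        # (LT)(T'F) equals LF
same_cov = np.allclose(L @ L.T, L_rot @ L_rot.T)    # LL' is unchanged
loadings_differ = not np.allclose(L, L_rot)         # yet the loadings changed
print(same_fit, same_cov, loadings_differ)
```

Because the data cannot distinguish L from LT, a rotation criterion has to be imposed from outside the model to pick one solution.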

Applying Factor Analysis to the Adult Data Set

Recall the Adult data set [6] we worked with in Discovering Knowledge in Data: An Introduction to Data Mining [1]. The data set was extracted from data provided by the U.S. Census Bureau. The intended task is to find the set of demographic characteristics that can best predict whether or not a person has an income of over $50,000 per year. For this example, we use only the following variables for the purpose of our factor analysis: age, demogweight (a measure of the socioeconomic status of the person’s district), education-num, hours-per-week, and capnet (= capital gain − capital loss). The training data set contains 25,000 records, and the test data set contains 7561 records. The variables were standardized and the Z-vectors found, Z_i = (X_i − µ_i)/σ_ii. The correlation matrix is shown in Table 1.7. Note that the correlations, although statistically significant in several cases, are overall much weaker than the correlations from the houses data set above. A weaker correlation structure should pose more of a challenge for the dimension reduction method.


TABLE 1.7 Correlation Matrix for the Factor Analysis Example

            age-z      dem-z      educ-z     capnet-z   hours-z
age-z        1.000     −0.076**    0.033**    0.070**    0.069**
dem-z       −0.076**    1.000     −0.044**    0.005     −0.015*
educ-z       0.033**   −0.044**    1.000      0.116**    0.146**
capnet-z     0.070**    0.005      0.116**    1.000      0.077**
hours-z      0.069**   −0.015*     0.146**    0.077**    1.000

** Correlation is significant at the 0.01 level (two-tailed).
* Correlation is significant at the 0.05 level (two-tailed).
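The standardization step that precedes the correlation matrix can be sketched as follows. The data here are made-up random columns standing in for five numeric variables; they are not the Adult data, and the means and spreads used to generate them are arbitrary. The sketch shows that each standardized column has mean 0 and standard deviation 1, and that the correlation matrix equals Z'Z/(n − 1).

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up stand-in for five numeric columns; NOT the Adult data.
n = 1_000
X = np.column_stack([
    rng.normal(38, 13, n),
    rng.normal(190_000, 105_000, n),
    rng.normal(10, 2.5, n),
    rng.normal(40, 12, n),
    rng.normal(1_000, 7_000, n),
])

# Z_i = (X_i - mean_i) / sd_i, column by column (sample standard deviation).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix of the original variables; identical to Z'Z / (n - 1).
R = np.corrcoef(X, rowvar=False)
R_from_Z = Z.T @ Z / (n - 1)
print(np.round(R, 3))
```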

To function appropriately, factor analysis requires a certain level of correlation. Tests have been developed to ascertain whether there exists sufficiently high correlation to perform factor analysis.

• The proportion of variability within the standardized predictor variables which is shared in common, and therefore might be caused by underlying factors, is measured by the Kaiser–Meyer–Olkin measure of sampling adequacy. Values of the KMO statistic less than 0.50 indicate that factor analysis may not be appropriate.

• Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are really uncorrelated. The statistic reported is the p-value, so that very small values would indicate evidence against the null hypothesis (i.e., the variables really are correlated). For p-values much larger than 0.10, there is insufficient evidence that the variables are correlated, so factor analysis may not be suitable.
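Both diagnostics can be computed directly from a correlation matrix. The sketch below uses the rounded correlations of Table 1.7 and assumes that the analysis was run on the n = 25,000 training records; that sample size is an assumption here, since the text does not say which records were used. KMO is computed in its usual form, the sum of squared off-diagonal correlations divided by that sum plus the sum of squared partial correlations, and Bartlett's statistic as −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom. Because the inputs are rounded, the results should only approximate any values reported by statistical software.

```python
import numpy as np

# Correlation matrix from Table 1.7 (order: age, dem, educ, capnet, hours).
R = np.array([
    [ 1.000, -0.076,  0.033,  0.070,  0.069],
    [-0.076,  1.000, -0.044,  0.005, -0.015],
    [ 0.033, -0.044,  1.000,  0.116,  0.146],
    [ 0.070,  0.005,  0.116,  1.000,  0.077],
    [ 0.069, -0.015,  0.146,  0.077,  1.000],
])
n = 25_000          # ASSUMPTION: number of records behind the correlations
p = R.shape[0]

# --- Kaiser-Meyer-Olkin measure of sampling adequacy ---
R_inv = np.linalg.inv(R)
d = np.sqrt(np.diag(R_inv))
partial = -R_inv / np.outer(d, d)      # partial correlations (off-diagonal)
off = ~np.eye(p, dtype=bool)           # mask selecting off-diagonal entries
r2 = np.sum(R[off] ** 2)
q2 = np.sum(partial[off] ** 2)
kmo = r2 / (r2 + q2)

# --- Bartlett's test of sphericity ---
sign, logdet = np.linalg.slogdet(R)    # log-determinant of R
chi_square = -(n - 1 - (2 * p + 5) / 6) * logdet
df = p * (p - 1) // 2

print(f"KMO = {kmo:.3f}")
print(f"Bartlett chi-square = {chi_square:.1f} on {df} df")
```

For reference, the 0.01 critical value of a chi-square distribution with 10 degrees of freedom is 23.209, so a statistic in the hundreds or thousands corresponds to a p-value that rounds to zero.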

Table 1.8 provides the results of these statistical tests. The KMO statistic has a value of 0.549, which is not less than 0.5, meaning that this test does not find the level of correlation to be too low for factor analysis. The p-value for Bartlett’s test of sphericity rounds to zero, so that the null hypothesis that no correlation exists among the variables is rejected. We therefore proceed with the factor analysis.

TABLE 1.8 Is There Sufficiently High Correlation to Run Factor Analysis?

Kaiser–Meyer–Olkin measure of sampling adequacy    0.549
Bartlett’s test of sphericity
    Approx. chi-square                             1397.824
    Degrees of freedom (df)                        10

To allow us to view the results using a scatter plot, we decide a priori to extract only two factors. The following factor analysis is performed using the principal axis factoring option. In principal axis factoring, an iterative procedure is used to estimate the communalities and the factor solution. This particular analysis required 152 such iterations before reaching convergence. The eigenvalues and the proportions of the variance explained by each factor are shown in Table 1.9. Note that the first two factors extract less than half of the total variability in the variables, as contrasted with the houses data set, where the first two components extracted over 72% of the variability. This is due to the weaker correlation structure inherent in the original data.

TABLE 1.9 Eigenvalues and Proportions of Variance Explained: Factor Analysis (a)

Initial Eigenvalues

Factor   Total   % of Variance   Cumulative %
1        1.277   25.533           25.533
2        1.036   20.715           46.248
3        0.951   19.028           65.276
4        0.912   18.241           83.517
5        0.824   16.483          100.000

(a) Extraction method: principal axis factoring.
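The percentage columns of Table 1.9 follow from the eigenvalues alone: with five standardized variables the eigenvalues sum to 5, so each factor's share is its eigenvalue divided by that total. The short sketch below recomputes the shares from the rounded eigenvalues in the table; small differences in the second decimal place relative to the table are to be expected from that rounding.

```python
import numpy as np

# Initial eigenvalues as printed in Table 1.9 (rounded to three decimals).
eigenvalues = np.array([1.277, 1.036, 0.951, 0.912, 0.824])

pct = 100 * eigenvalues / eigenvalues.sum()   # % of variance per factor
cum = np.cumsum(pct)                          # cumulative %

for i, (ev, share, running) in enumerate(zip(eigenvalues, pct, cum), start=1):
    print(f"factor {i}: eigenvalue {ev:.3f}  {share:6.2f}%  cumulative {running:6.2f}%")
```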

The factor loadings $L$ ($m \times k$) are shown in Table 1.10. Factor loadings are analogous to the component weights in principal components analysis and represent the correlation between the $i$th variable and the $j$th factor. Notice that the factor loadings are much weaker than in the previous houses example, again due to the weaker correlations among the standardized variables. The communalities are also much weaker than in the houses example, as shown in Table 1.11. The low communality values reflect the fact that there is not much shared correlation among the variables. Note that the factor extraction increases the shared correlation.
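For readers who want to see how loadings and communalities arise from the iterative procedure, here is a minimal sketch of one common form of principal axis factoring, applied to the rounded correlations of Table 1.7. It is an illustration under stated assumptions, not a reproduction of the software run reported in the text: the function name `principal_axis_factoring`, the starting communalities (squared multiple correlations), the tolerance, and the iteration cap are all choices made here, so the iteration count and the exact values need not match the 152 iterations or the tables cited above.

```python
import numpy as np

# Correlation matrix from Table 1.7 (order: age, dem, educ, capnet, hours).
R = np.array([
    [ 1.000, -0.076,  0.033,  0.070,  0.069],
    [-0.076,  1.000, -0.044,  0.005, -0.015],
    [ 0.033, -0.044,  1.000,  0.116,  0.146],
    [ 0.070,  0.005,  0.116,  1.000,  0.077],
    [ 0.069, -0.015,  0.146,  0.077,  1.000],
])

def principal_axis_factoring(R, k, tol=1e-6, max_iter=5000):
    """Iterated principal axis factoring on a correlation matrix (sketch)."""
    # Starting communalities: squared multiple correlations (an assumption).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for iteration in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)            # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
        lam = np.clip(eigvals[top], 0.0, None)   # guard against negative values
        loadings = eigvecs[:, top] * np.sqrt(lam)
        new_h2 = np.sum(loadings ** 2, axis=1)   # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2, iteration

loadings, communalities, n_iter = principal_axis_factoring(R, k=2)
print("iterations:", n_iter)
print("loadings:\n", np.round(loadings, 3))
print("communalities:", np.round(communalities, 3))
```

Each communality is the row sum of squared loadings, which is why weak loadings imply low communalities. The sign of each loading column is arbitrary, since an eigenvector and its negative are equally valid.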
