4.2 Principal Component Trees

Principal Component Trees (PC Trees) are a special case of SEM Trees using a Principal Component Analysis (PCA) as a template SEM. In the following section, it will become clear that, indeed, only a few modifications are needed to build tree structures with models that describe a PCA. The resulting hierarchical structure describes a partition of the data set with maximal differences with respect to the principal subspaces of the subsets. PCA is briefly reviewed before the extension from SEM Trees to PC Trees is shown.

4.2.1 Principal Component Analysis

PCA is a method that transforms a set of possibly correlated observed variables into a set of uncorrelated variables. Each new variable is a linear combination of the original variables, and the applied mapping is an orthogonal transformation. The resulting transformed variables are called the principal components and are, by convention, sorted so that the first component accounts for the largest share of the variance in the original observations and the last component for the smallest. Therefore, PCA can be used for dimension reduction by removing those principal components that explain only a small amount of variance. The projection onto the principal components is found by an eigenvalue decomposition of the covariance matrix of the data set or by a singular-value decomposition of the (centered) data matrix (cf. Bishop, 2006). A PCA can also be formulated as a SEM. PCA is closely related to the latent factor model because both methods discover sources of common variance across the observed variables. However, in a PCA without reduction of components, the observed variables are assumed to be measured without error, and thus there are no residual error terms in the model. By definition, there is no covariance between the latent components in PCA, whereas in latent factor analysis, the covariance between latent factors is often of interest.
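The two computational routes mentioned above can be compared in a short sketch. It assumes NumPy and simulated data (the mixing matrix and seed are arbitrary illustrations), and the function names `pca_eig` and `pca_svd` are hypothetical. Both routes should yield the same eigenvalues, and the resulting component scores should be uncorrelated.

```python
import numpy as np

def pca_eig(data):
    """PCA via eigenvalue decomposition of the sample covariance matrix."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvals[order], eigvecs[:, order]

def pca_svd(data):
    """PCA via singular-value decomposition of the centered data matrix."""
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)  # s is descending
    # squared singular values / (n - 1) equal the covariance eigenvalues
    return s**2 / (data.shape[0] - 1), vt.T

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated variables

vals_eig, vecs_eig = pca_eig(data)
vals_svd, vecs_svd = pca_svd(data)

# Eigenvalues agree; eigenvectors agree up to the sign of each column.
print(np.allclose(vals_eig, vals_svd), np.allclose(np.abs(vecs_eig), np.abs(vecs_svd)))

# Component scores are uncorrelated: their covariance matrix is diagonal.
scores = (data - data.mean(axis=0)) @ vecs_eig
print(np.round(np.cov(scores, rowvar=False), 6))
```

The SVD route avoids forming the covariance matrix explicitly, which is why it is often preferred numerically; both give the same subspace.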

The eigenvalue decomposition of a symmetric matrix X, such as a covariance matrix, is given by

$$X = U V U^\top,$$

where U is an orthogonal matrix whose columns are the eigenvectors and V is a diagonal matrix of the corresponding eigenvalues.

PCA is a linear transformation $y = Ax$ with $x \sim N(0, B)$ and thus implies the covariance matrix

$$\Sigma = A B A^\top,$$

which is equivalent to the eigenvalue decomposition above when A is the orthogonal matrix of eigenvectors U and B is the diagonal matrix of eigenvalues V. In PCA representations with a reduced number of principal components, a residual error remains, namely the error of representing the original observations by the retained principal components only.
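The equivalence between $\Sigma = ABA^\top$ and the eigenvalue decomposition, and the residual that appears when components are dropped, can be checked numerically. This is a sketch assuming NumPy and simulated data; taking `A` as the sorted eigenvectors and `B` as the diagonal matrix of eigenvalues follows the correspondence described above, while the number of retained components `k` is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
S = np.cov(data, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order]          # orthogonal transformation: eigenvectors as columns
B = np.diag(eigvals[order])    # diagonal covariance of the components: eigenvalues

# With all p components, the implied covariance A B A' reproduces S exactly.
sigma_full = A @ B @ A.T
print(np.allclose(sigma_full, S))

# Keeping only the first k components leaves a residual covariance, which is
# exactly the part of S carried by the discarded components.
k = 2
sigma_k = A[:, :k] @ B[:k, :k] @ A[:, :k].T
residual = S - sigma_k
print(np.allclose(residual, A[:, k:] @ B[k:, k:] @ A[:, k:].T))
print(np.trace(residual), eigvals[order][k:].sum())  # residual variance = discarded eigenvalues
```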

4.2.2 PCA and Factor Analysis

PCA and factor analysis (FA) are closely related. Both methods hypothesize that the observations are linear combinations of a set of latent variables and, based on this assumption, provide reduced-rank representations of a data set. The essential conceptual difference is that in FA, the covariance matrix of the residuals is diagonal and of full rank, i.e., the measurement errors are assumed to be independent and unique to each observed variable, whereas in PCA, the covariance matrix of the residuals is neither of full rank nor diagonal, i.e., the measurement errors are correlated (Velicer & Jackson, 1990). Velicer and Jackson (1990) claim that the results of PCA and FA are often very similar. This claim is challenged by the simulation results of Widaman (1993), who showed that PCA parameter estimates differ consistently from the true parameters if a factor model holds perfectly. This author further concludes that the choice between PCA and FA depends on the type of research question and that neither model is inherently superior to the other. Whenever researchers are interested in a simple rank reduction of their data sets, PCA is preferable because it does not tempt them to impose interpretations on the data, in particular, to regard the observed variables as manifestations of latent entities.

Figure 4.2.1: A SEM representation of a Principal Component Analysis (PCA), drawn as a path diagram relating the latent components p1, ..., p5, with variances σ²_p1, ..., σ²_p5, to the observed variables y1, ..., y5. Without additional constraints, this model is under-identified, that is, it contains more free parameters than there are degrees of freedom in the empirical data.
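The difference in residual structure between FA and PCA can be made concrete with a population covariance matrix generated by a one-factor model. The loadings below are arbitrary illustrative values, and the sketch assumes NumPy; `rank` is a hypothetical helper. Under the factor model, the residual covariance is the diagonal, full-rank matrix of unique variances, whereas the residual left after extracting one principal component has non-zero off-diagonal entries and rank $p - 1$.

```python
import numpy as np

loadings = np.array([0.8, 0.7, 0.6, 0.5])       # illustrative one-factor loadings
unique = 1.0 - loadings**2                       # unique variances (standardized variables)
Sigma = np.outer(loadings, loadings) + np.diag(unique)

# FA residual: diagonal and of full rank by construction.
fa_residual = np.diag(unique)

# PCA residual after keeping the first principal component.
eigvals, eigvecs = np.linalg.eigh(Sigma)
v1, l1 = eigvecs[:, -1], eigvals[-1]             # eigh sorts ascending: last is largest
pca_residual = Sigma - l1 * np.outer(v1, v1)

def rank(M, tol=1e-8):
    """Number of eigenvalues of a symmetric matrix above a small tolerance."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) > tol))

off_diag = pca_residual - np.diag(np.diag(pca_residual))
print("FA residual rank:", rank(fa_residual))     # p
print("PCA residual rank:", rank(pca_residual))   # p - 1
print("largest |off-diagonal| of PCA residual:", np.abs(off_diag).max())
```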

4.2.3 PC Trees are SEM Trees

The SEM representation of the PCA in Figure 4.2.1 has $p^2$ free parameters in the structural matrix, representing the eigenvectors, and another $p$ free parameters for the latent variances, representing the corresponding eigenvalues. This amounts to $p^2 + p$ free parameters that face only $(p^2 + p)/2$ degrees of freedom of the sample covariance matrix, that is, the model is under-identified. PCA requires an orthonormal transformation matrix. This imposes another $p$ constraints that the squared entries of each column of the structural matrix sum to one, i.e., that the basis vectors of the principal subspace have unit length. Furthermore, the basis vectors have to be orthogonal. This imposes an orthogonality constraint on every pair of distinct basis vectors, resulting in $(p^2 - p)/2$ additional constraints. In total, $p + (p^2 - p)/2 = (p^2 + p)/2$ constraints reduce the $p^2 + p$ free parameters to $(p^2 + p)/2$, which equals the number of degrees of freedom. This suffices to identify the model.
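The parameter bookkeeping above can be checked for several values of p with a small helper. The function `pca_sem_counts` is a hypothetical name introduced only for this sketch; it counts free parameters and constraints exactly as described in the text.

```python
def pca_sem_counts(p):
    """Free parameters, degrees of freedom and constraints of the full PCA SEM."""
    free = p * p + p                    # p^2 structural weights + p latent variances
    df = p * (p + 1) // 2               # distinct elements of the covariance matrix
    unit_length = p                     # one unit-length constraint per column
    orthogonality = p * (p - 1) // 2    # one constraint per pair of distinct columns
    net_free = free - unit_length - orthogonality
    return free, df, unit_length + orthogonality, net_free

for p in range(1, 7):
    free, df, constraints, net_free = pca_sem_counts(p)
    print(f"p={p}: free={free} > df={df}; constraints={constraints}; net free={net_free}")
```

For p = 5, for instance, 30 free parameters face 15 degrees of freedom, and the 5 + 10 = 15 constraints leave exactly 15 free parameters, so the full-component model is saturated.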

It is not trivial to impose the above constraints on the parameters in regular SEM software. Dolan, Bechger, and Molenaar (1999) suggest conceiving a PCA in a SEM setting as a multi-group model in which one group represents the under-identified structural model of the PCA, as illustrated in Figure 4.2.1, and a second, dummy group, fitted simultaneously, contains the orthonormality constraints.

In OpenMx, the fit function can be modified freely to incorporate these constraints. Therefore, I suggest estimating PCA in SEM with a modified maximum likelihood fit function, which can be regarded as a penalized maximum likelihood fit function:

$$F_{MLP} = F_{ML} + \lambda \cdot \operatorname{tr}\left[\left(\Lambda\Lambda^\top - I\right)^\top \left(\Lambda\Lambda^\top - I\right)\right],$$

where $F_{ML}$ is the regular maximum likelihood fit function, $I$ is the identity matrix, and $\lambda$ is a parameter that determines the influence of the penalty on the fit function. The penalty serves only to ensure that the matrix $\Lambda$ is orthonormal. The transpose of an orthonormal matrix is its inverse. Therefore, the PCA solution satisfies

$$\Lambda\Lambda^\top = I,$$

and consequently, $\Lambda\Lambda^\top - I$ gives the element-wise deviation of $\Lambda$ from this constraint. Thus, $(\Lambda\Lambda^\top - I)^\top(\Lambda\Lambda^\top - I)$ is the matrix of sums of squares and cross-products, and its trace is the sum of the squared deviations, i.e., the squared Frobenius norm of $\Lambda\Lambda^\top - I$. Hence, the penalty is the least squares error of the orthonormality constraint. Starting values for $\Lambda$ and for the diagonal entries of the latent variances can be chosen randomly.
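The penalized fit function can be written down directly. The sketch assumes NumPy, uses the standard ML discrepancy $F_{ML} = \log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \log|S| - p$, and builds the implied covariance as $\Sigma = \Lambda \operatorname{diag}(\sigma^2) \Lambda^\top$; the function names and the default penalty weight are hypothetical. It treats $\Lambda$ as a square $p \times p$ matrix, as in Figure 4.2.1, for which $\Lambda\Lambda^\top = I$ and $\Lambda^\top\Lambda = I$ are equivalent. For a $p \times k$ matrix with $k < p$, $\Lambda\Lambda^\top$ has rank $k$ and cannot equal $I$, so a reduced-rank variant would presumably penalize $\Lambda^\top\Lambda - I$ instead; this is an assumption, not something stated in the text.

```python
import numpy as np

def f_ml(S, Sigma):
    """Standard ML discrepancy between sample covariance S and implied Sigma."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def orthonormality_penalty(Lam):
    """tr[(Lam Lam' - I)'(Lam Lam' - I)], i.e. the squared Frobenius norm of Lam Lam' - I."""
    D = Lam @ Lam.T - np.eye(Lam.shape[0])
    return np.trace(D.T @ D)

def f_mlp(S, Lam, latent_var, penalty_weight=1.0):
    """Penalized fit F_MLP = F_ML + weight * penalty (weight plays the role of lambda)."""
    Sigma = Lam @ np.diag(latent_var) @ Lam.T
    return f_ml(S, Sigma) + penalty_weight * orthonormality_penalty(Lam)

rng = np.random.default_rng(2)
S = np.cov(rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3)), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

# At the PCA solution, both the misfit and the penalty vanish (up to rounding).
print(f_mlp(S, eigvecs, eigvals))
# Scaling the eigenvectors by 2 breaks orthonormality: penalty = tr(9 I) = 27 for p = 3.
print(orthonormality_penalty(2 * eigvecs))
```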

It seems reasonable to add another post-processing step to PC Trees. Principal components are usually expected to be presented in order of their influence, e.g., sorted by their eigenvalues from largest to smallest. Accordingly, the latent factors in a PC Tree are sorted for each model so that the latent variance estimates fulfill $\sigma^2_i > \sigma^2_j$ for $i < j$, with $\sigma^2_k$ being the variance estimate for the $k$-th latent factor.
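The sorting step amounts to permuting the columns of the structural matrix together with the latent variances. In this sketch, which assumes NumPy, `sort_components` is a hypothetical helper; because the same permutation is applied to both, the implied covariance matrix, and hence the likelihood, is unchanged.

```python
import numpy as np

def sort_components(Lam, latent_var):
    """Reorder components so that latent variances decrease with the index."""
    order = np.argsort(latent_var)[::-1]  # indices of variances, largest first
    return Lam[:, order], latent_var[order]

rng = np.random.default_rng(3)
Lam, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # an arbitrary orthonormal matrix
latent_var = np.array([0.5, 2.0, 1.0])          # unsorted variance estimates

Lam_sorted, var_sorted = sort_components(Lam, latent_var)
print(var_sorted)  # variances now in decreasing order

# The implied covariance Lam diag(var) Lam' is invariant under the permutation.
before = Lam @ np.diag(latent_var) @ Lam.T
after = Lam_sorted @ np.diag(var_sorted) @ Lam_sorted.T
print(np.allclose(before, after))
```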

In principle, the optimizer may converge to a solution at which the penalty is not sufficiently small. Therefore, in practice, it is necessary to check whether the penalty of the returned solution is small enough, e.g., smaller than a fixed threshold such as $1 \times 10^{-10}$. When the penalty is larger, it is assumed that the deviation from zero is not merely due to numerical reasons, and the optimization process is restarted with a new set of random starting values.
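The check-and-restart logic can be sketched independently of any particular optimizer. Here `fit_with_restarts` and the single-fit callable it receives are hypothetical; the callable is assumed to take random starting values for $\Lambda$ and the latent variances and to return a dictionary containing the achieved penalty. The sketch assumes NumPy for drawing starting values, and the starting-value distributions and restart limit are arbitrary choices.

```python
import numpy as np

def fit_with_restarts(fit_once, p, threshold=1e-10, max_restarts=20, seed=None):
    """Call a (hypothetical) single-fit routine from random starts until the penalty is small."""
    rng = np.random.default_rng(seed)
    for attempt in range(1, max_restarts + 1):
        start_lam = rng.normal(size=(p, p))         # random starting structural matrix
        start_var = rng.uniform(0.5, 2.0, size=p)   # random positive starting variances
        result = fit_once(start_lam, start_var)
        if result["penalty"] < threshold:           # orthonormality (numerically) attained
            return result, attempt
    raise RuntimeError(f"penalty still >= {threshold} after {max_restarts} starts")

# Usage with a stand-in fit whose first run ends at a non-orthonormal solution.
calls = []

def stub_fit(start_lam, start_var):
    calls.append(start_lam)
    return {"penalty": 1e-3 if len(calls) == 1 else 1e-12}

result, attempts = fit_with_restarts(stub_fit, p=3, seed=0)
print(attempts, result["penalty"])
```

Capping the number of restarts keeps a pathological node from stalling the tree construction; what to do once the cap is reached is left open here.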

The presented method of re-expressing PCA as SEM allows us to construct PC Trees. These structures essentially predict differences in the principal subspaces based on partitions of a data set. The extension from SEM Trees to practical PC Trees is minimal and comprises the modified likelihood function, the sorting of the parameter estimates, and the numerical check of whether the penalty is close to zero. Within the SEM Tree library, a wrapper function for the generic SEM Tree function is available that constructs the PCA model and the modified likelihood function. This allows PC Trees to be used without any additional programming effort.

In principle, PC Trees answer two questions, namely, “Do hierarchical group differences exist in the rotation of the principal subspace?” and “Do the variance contributions of the principal axes differ across group hierarchies for a chosen rotation?”. The first question addresses the existence of any significant differences in the model estimates across groups and can therefore reveal differences both in the variance contributions of the principal axes and in the rotation of the subspace. The second question addresses differences in the captured variance for a chosen rotation and can therefore potentially find subgroups that differ in the amount of variance explained by that rotation. The probabilistic formulation of PCA has the added advantage that it can handle missing values appropriately.

A PCA with all components leaves no degrees of freedom in the fitting process, that is, the resulting tree will be equivalent to a SEM Tree with a freely estimated covariance matrix, which is commonly referred to as a saturated model. Any imposed structure with the same number of free parameters as degrees of freedom is indiscernible from the freely estimated covariance matrix because the likelihood values will always be equal. As a consequence, PC Trees only make sense if a dimensionality reduction is performed, that is, if the number of selected latent components is smaller than the number of observed variables.
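The equivalence with the saturated model can be checked by comparing Gaussian log-likelihoods. In this NumPy sketch (simulated data; `gaussian_loglik` is a hypothetical helper), the saturated model uses the maximum likelihood covariance estimate, and the full-component PCA uses $\Sigma = U V U^\top$ computed from that same estimate; the two log-likelihoods coincide, whereas a constrained alternative such as a diagonal covariance matrix fits worse.

```python
import numpy as np

def gaussian_loglik(data, Sigma):
    """Log-likelihood of the column-centered data under N(0, Sigma)."""
    n, p = data.shape
    centered = data - data.mean(axis=0)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum("ij,jk,ik->", centered, np.linalg.inv(Sigma), centered)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(4)
data = rng.normal(size=(250, 4)) @ rng.normal(size=(4, 4))
centered = data - data.mean(axis=0)
S_ml = centered.T @ centered / data.shape[0]        # saturated model: ML covariance

eigvals, eigvecs = np.linalg.eigh(S_ml)
Sigma_pca = eigvecs @ np.diag(eigvals) @ eigvecs.T  # PCA with all p components

ll_saturated = gaussian_loglik(data, S_ml)
ll_pca = gaussian_loglik(data, Sigma_pca)
ll_diagonal = gaussian_loglik(data, np.diag(np.diag(S_ml)))
print(ll_saturated, ll_pca, ll_diagonal)
```

Because the two likelihoods are identical in every subset of the data, no split could ever favor a full-component PC Tree over a saturated SEM Tree, which is why the dimensionality reduction is essential.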

In document Permutation distribution clustering and structural equation model trees (Page 120-123)