GRAPHICAL METHOD OF FACTORING THE CORRELATION MATRIX

BY L. L. THURSTONE

THE PSYCHOMETRIC LABORATORY, THE UNIVERSITY OF CHICAGO

Communicated April 14, 1944

Multiple factor analysis starts with a square matrix $R$ of order $n \times n$ with cell entries $r_{jk}$ which are experimentally determined. The matrix $R$ shows the correlation of each variable with every other variable in the test battery. The correlation matrix is symmetric since $r_{jk} = r_{kj}$. Some students write unity in the diagonal cells and this is legitimate for some problems. More often the diagonals contain the communalities $h_k^2$ which are so chosen as to be consistent with the rank of the matrix as determined by the side entries. The factoring methods to be described can be applied for any set of diagonal entries.¹

The first object of multiple factor analysis is to find a factor matrix $F$ of order $n \times r$ where $r$ is the rank of the correlation matrix $R_{jk}$. The factor matrix $F$ must be written so as to satisfy the fundamental relation $FF' = R$. The correlation matrix $R$ can be factored into a factor matrix $F$ in many different ways and $F$ is not here unique. The current geometrical interpretation of $R$ is that its entries $r_{jk}$ show the scalar products of pairs of test vectors $J$ and $K$ which are not necessarily of unit length unless this restriction is imposed by unit diagonals $r_{jj}$. Since the correlation coefficients are scalar products, it follows that no reference frame is implied by $R$. Each row $j$ of the factor matrix $F$ shows the entries $a_{jm}$ which are the projections of the test vector $J$ on a set of $r$ orthogonal reference axes. The lack of uniqueness of $F$ is geometrically interpreted in the free choice of an orthogonal reference frame for the test configuration of $n$ test vectors $J$. The reference frame is explicitly defined in $F$ relative to the test vectors but it is not defined in $R$.
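The statement that $R$ implies no reference frame can be checked numerically. The following is a minimal sketch in Python, assuming a hypothetical factor matrix of three test vectors in two dimensions and an arbitrary rotation angle; neither is taken from the paper. It forms $FF'$ before and after a rigid rotation of the axes and compares the two results.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

# Hypothetical 3 x 2 factor matrix (illustrative values only).
F = [[0.8, 0.1],
     [0.6, 0.5],
     [0.2, 0.7]]

# A rigid rotation of the orthogonal reference frame through an arbitrary angle.
theta = 0.4
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

G = matmul(F, Q)                      # same test vectors, new reference axes
R_from_F = matmul(F, transpose(F))    # scalar products in the first frame
R_from_G = matmul(G, transpose(G))    # scalar products in the rotated frame

# Largest disagreement between the two correlation matrices.
max_diff = max(abs(x - y)
               for row_f, row_g in zip(R_from_F, R_from_G)
               for x, y in zip(row_f, row_g))
```

The entries of `G` differ from those of `F`, while `max_diff` stays at the level of rounding error, which is the sense in which $F$ is not unique.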

One factorial solution for $F$ is to locate the first reference axis so as to maximize the sum of squares of test vector projections. The second reference axis can be located by the same criterion applied to first factor correlation residuals, and so on until a set of $r$ orthogonal axes has been determined which correspond to rank $r$ of the correlation matrix. In the writer's original formulation of this problem, it was called the principal axes solution.² In the first paper on this problem a computational method was described for finding the principal axes solution $F$. It involved first factoring the correlation matrix into a factor matrix $F$ with any arbitrary orthogonal reference frame and a subsequent orthogonal transformation representing a rigid rotation to the principal axes. That solution involved the latent roots. The next year Hotelling described an iterative method of factoring the correlation matrix directly into the principal axes solution which he renamed principal components.³ Since the iterative methods, as well as all other available methods, of obtaining the principal axes solution are very laborious, it is of interest to students of multiple factor analysis to find shorter methods.

Consider the first column of $F$ for the principal axes solution. Its elements are $a_{j1}$. It is desired to find the values of $a_{j1}$ so that the first factor shall account for a maximum of the variance in the correlation matrix $R$. Then

$$r_{jk} - a_{j1} a_{k1} = \rho_{jk}, \qquad (1)$$

where $a_{j1}$ and $a_{k1}$ are the first factor saturations of tests $j$ and $k$, and $\rho_{jk}$ is the residual to be minimized. Squaring and summing for all correlation coefficients, we have

$$\sum_j \sum_k r_{jk}^2 - 2 \sum_j \sum_k a_{j1} a_{k1} r_{jk} + \sum_j \sum_k a_{j1}^2 a_{k1}^2 = \sum_j \sum_k \rho_{jk}^2 = z, \qquad (2)$$

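where the sum of the squares of the correlation residuals $\rho_{jk}$ may be denoted $z$. The partial derivative with respect to $a_{j1}$ is then

$$\frac{\partial z}{\partial a_{j1}} = -4 \sum_k r_{jk} a_{k1} + 4 a_{j1} \sum_k a_{k1}^2 \qquad (3)$$

from which we get

$$a_{j1} \sum_k a_{k1}^2 = \sum_k r_{jk} a_{k1}. \qquad (4)$$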

Let the sum $\sum a^2 = t$. The summation $\sum ra$ can be written as a matrix product $Ra$. Then

$$Ra = at \qquad (5)$$

where $R$ is the given square matrix, $a$ is a column vector, and $t$ is a scalar, namely, the largest latent root. The iterative process consists in starting with a trial vector $u$ so that

$$Ru = v \sum u^2 \qquad (6)$$

where $(v \sum u^2)$, or numbers proportional to these, constitute the trial vector $u$ for the next iteration. The process continues until $v \sum u^2$ becomes proportional to the trial values $u$ at which time the values of $v$ $(= u)$ are the desired values of $a_{j1}$ and the first column of $F$ is then determined. The number of iterations can be considerably reduced by starting with $R^2$, or $R^4$, or even $R^8$. While that looks effective in a textbook example, it is not so inviting a procedure when we are confronted with a correlation matrix of order $70 \times 70$.
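The arithmetic form of this iteration may be sketched as follows. The sketch is written in Python and assumes a hypothetical factor matrix of order $4 \times 2$, used only to manufacture a small $R = FF'$; the function name and the choice of rescaling the trial vector to unit length at each step are assumptions of the sketch (the text requires only numbers proportional to $v \sum u^2$). It is the ordinary power-type iteration, not the graphical procedure.

```python
import math

def mat_vec(R, u):
    """Product of a square matrix (list of rows) with a column vector."""
    return [sum(r * x for r, x in zip(row, u)) for row in R]

def first_factor(R, u, iterations=200):
    """Iterate toward the first column of F; returns (a, t)."""
    for _ in range(iterations):
        w = mat_vec(R, u)                        # R u, proportional to v in equation (6)
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]                # only the proportions of the trial vector matter
    w = mat_vec(R, u)
    t = sum(x * y for x, y in zip(u, w))         # estimate of the largest latent root
    a = [math.sqrt(t) * x for x in u]            # rescale so that the sum of a^2 equals t
    return a, t

# Hypothetical 4 x 2 factor matrix (illustrative values only), giving R = FF'.
F = [[0.8, 0.1], [0.6, 0.5], [0.2, 0.7], [0.7, -0.3]]
R = [[sum(x * y for x, y in zip(fj, fk)) for fk in F] for fj in F]

u0 = [sum(col) for col in zip(*R)]               # trial vector: column sums of R
a, t = first_factor(R, u0)
Ra = mat_vec(R, a)
worst = max(abs(x - t * y) for x, y in zip(Ra, a))   # departure from equation (5)
```

At convergence `worst` is at the level of rounding error, so that `a` satisfies equation (5) with `t` equal to the sum of the squares of `a`.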

Consider the matrix multiplication $Ru$ in equation (6) and, in particular, the scalar product of the row $j$ of $R_{jk}$ and the column vector $u$. This product is

$$R_j u = v_j \sum u^2 \qquad (7)$$

and this can be written briefly as

$$\frac{R_j u}{\sum u^2} = v_j \qquad (8)$$

but this is also the well-known equation for the slope of the regression line $r$ on $u$ through the origin for a plot of $r$ against $u$. If we plot a column of $r_{jk}$ against the trial values $u_k$, we get a plot with $n$ points. The best fitting straight line through the origin (regression $r$ on $u$) can be drawn by inspection and the slope of the line is easily read graphically with sufficient accuracy for the first few iterations. That slope is the value $v_j$. To plot, say, 30 points and to draw by inspection the best fitting straight line through the origin takes less time than to add cumulatively 30 products on a calculating machine.
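The slope that is read from each graph is the least-squares slope of a line constrained to pass through the origin. A minimal sketch in Python is given here, assuming hypothetical values for one column of correlations and for the trial vector; the function name is likewise an assumption of the sketch.

```python
def slope_through_origin(r, u):
    """Least-squares slope of r on u for a line through the origin: sum(r*u) / sum(u*u)."""
    return sum(rk * uk for rk, uk in zip(r, u)) / sum(uk * uk for uk in u)

# Hypothetical trial values and two hypothetical columns of correlations.
u = [1.0, 2.0, 3.0]
r_exact = [0.2, 0.4, 0.6]      # points lying exactly on a line of slope 0.2
r_scatter = [0.1, 0.5, 0.5]    # points scattered about a line through the origin

v_exact = slope_through_origin(r_exact, u)
v_scatter = slope_through_origin(r_scatter, u)
```

For the first column the slope is 0.2, the line on which the points lie; for the second it is 2.6/14, the value that a line drawn by inspection is intended to approximate.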

By equations (5) and (6) it is evident that at the solution the values $v_j = u_j = a_{j1}$. When a set of values of $v$ has been determined graphically, we plot $v$ against $u$ and find the slope $m$. In the first few graphical iterations, this slope will not be unity because the trial values $u$ are probably too large or too small. If the true values are better represented by $ku$, in which $k$ is a stretching factor, then a plot of $r$ against $ku$ will give a slope of $v/k$ instead of $v$. When these graphs have been drawn, the slope of $v/k$ against $ku$ should be unity. But the obtained slope of $v$ against $u$ is $m$ for the values actually used. Hence the observed slope $m = k^2$. Having found the slope $m$ of the plot of $v$ against $u$, we find $k = \sqrt{m}$. Then if we should take a new set of trial numbers $x_j = k u_j$, the iteration would give, instead of equation (8),

$$\frac{R_j x}{\sum x^2} = y_j \qquad (9)$$

where $y = v/k$. The slope of $y$ against $x$ should now be unity and the values of $y$ should be used as the trial vector for the next graphical iteration. The determination of $m$ and $k$ can be done with a slide rule and the new trial values $y = v/k$ can also be determined by the slide rule. The slope $m$ can be found either by inspection or by simple summing with the method of averages in which $m = \sum v / \sum u$ for like signed pairs of $v$ and $u$.
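The rescaling by the stretching factor can be sketched arithmetically. The following Python sketch assumes a hypothetical factor matrix of order $4 \times 2$ from which a small $R$ is formed, and it computes each slope $v_j$ by equation (8) instead of reading it from a graph; the function name and the number of iterations are assumptions of the sketch.

```python
import math

def graphical_step(R, u):
    """One iteration: slopes v from equation (8), then rescaling by k = sqrt(m)."""
    su2 = sum(x * x for x in u)
    v = [sum(r * x for r, x in zip(row, u)) / su2 for row in R]
    # Method of averages over like signed pairs of v and u.
    like = [(abs(vj), abs(uj)) for vj, uj in zip(v, u) if vj * uj > 0]
    m = sum(p[0] for p in like) / sum(p[1] for p in like)
    k = math.sqrt(m)                      # stretching factor
    y = [vj / k for vj in v]              # trial vector for the next iteration
    return y, m

# Hypothetical 4 x 2 factor matrix (illustrative values only), giving R = FF'.
F = [[0.8, 0.1], [0.6, 0.5], [0.2, 0.7], [0.7, -0.3]]
R = [[sum(x * y for x, y in zip(fj, fk)) for fk in F] for fj in F]

u = [sum(col) for col in zip(*R)]         # initial trial vector: column sums of R
history = []                              # observed slopes m, one per iteration
for _ in range(60):
    u, m = graphical_step(R, u)
    history.append(m)

t = sum(x * x for x in u)                 # sum of squares of the final loadings
worst = max(abs(sum(r * x for r, x in zip(row, u)) - t * uj)
            for row, uj in zip(R, u))     # departure from equation (5)
```

The first observed slope in `history` is far from unity because the column sums are too large as trial values; the later slopes approach unity, and the final vector `u` then satisfies equation (5).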

The graphical method here described for the principal axes solution has also been adapted for a centroid method of factoring the correlation matrix by which the computations for a factor matrix can be done by a computer without excessive labor even for a large matrix and without the use of tabulating machine equipment. That procedure is considerably faster and it will be described in a subsequent paper.

A numerical example is here given. In table 1 we have a correlation matrix $R$ of order $10 \times 10$ and rank 3. The corresponding factor matrix $F$ for the principal axes is shown in the same table. It is the solution to be found from $R$. The latent roots of $R$ are 2.85, 1.47 and 0.63. These are the sums of squares of columns of $F$. In table 2 the initial trial vector $u$ for the major principal factor is taken roughly proportional to the sums of columns of $R$. Column 1 of $R$ is plotted against column $u$ and the slope is estimated to be approximately +0.05 and it is recorded in column $v_1$. Ten such plots give the values in column $v_1$. The slope is read directly from each graph by noting the ordinate of a straight line fit at $u_1 = 1$. Next obtain the two column sums $\sum u_1$ and $\sum v_1$. The ratio $\sum v / \sum u = m$ and $k = \sqrt{m}$. This summation can be absolute sums for like signed pairs of $u$ and $v$. Then compute $y_1 = v_1/k$ with a slide rule. These are also the trial values $u_2$ for the next trial. Proceed likewise for three trials which give the desired values of $a_{j1}$ to two decimals.

The first factor residuals are then computed. These are $\rho_{jk} = r_{jk} - a_{j1} a_{k1}$. Choose as a starting vector that column of the first factor residuals which has the largest absolute sum, ignoring diagonals. This is column 1. Its values are recorded in column $u_1$ for factor two. The procedure is now the same as before. The third iteration was taken on a calculating machine and it gave the second factor loadings $a_{j2}$, the second column of $F$, to two decimals. The last iterations for each factor can be done on a calculating machine to obtain greater accuracy while the first few iterations can be done graphically to save time and labor. The computation for the third column of $F$ was done in the same manner.
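The computation of the first factor residuals and the choice of the starting vector for the second factor may be sketched as follows. The Python sketch assumes a hypothetical factor matrix of order $4 \times 2$ whose columns are orthogonal, so that its first column already contains the first principal axis loadings; these values are not those of table 1, and the function names are assumptions of the sketch.

```python
def residuals(R, a):
    """First factor residuals: rho_jk = r_jk - a_j1 * a_k1."""
    n = len(R)
    return [[R[j][k] - a[j] * a[k] for k in range(n)] for j in range(n)]

def starting_column(P):
    """Column of P with the largest absolute sum, ignoring diagonals."""
    n = len(P)
    sums = [sum(abs(P[j][k]) for j in range(n) if j != k) for k in range(n)]
    best = max(range(n), key=lambda k: sums[k])
    return best, [P[j][best] for j in range(n)]

# Hypothetical factor matrix with orthogonal columns (illustrative values only).
F = [[0.6, 0.3], [0.6, -0.3], [0.5, 0.4], [0.5, -0.4]]
R = [[sum(x * y for x, y in zip(fj, fk)) for fk in F] for fj in F]

a1 = [row[0] for row in F]            # first factor loadings, assumed already found
P = residuals(R, a1)                  # what remains for the second factor
best, u1 = starting_column(P)         # trial vector u1 for factor two
```

In this example the residual matrix is of rank one, and the chosen column `u1` is already proportional to the second column of the hypothetical `F`, so the iteration for the second factor would start from a favorable trial vector.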

Occasionally, when two or three latent roots are nearly the same, the iterative process will be found to oscillate or the convergence will be slow. This is an indication that the test configuration has nearly equal thicknesses in these dimensions, so that the configuration is nearly circular or spherical in these dimensions. If the purpose is to extract the maximum variance from the correlation matrix, it does not matter where we place a set of orthogonal axes in those dimensions which are represented by nearly equal latent roots. In such a situation it is desirable to reduce the obtained vector $v$ to a unit vector in the system without waiting for complete convergence. That can be done with a stretching factor $p$ on $v$ so that $pv = a$, where $a$ are the desired factor loadings. The value of $p$ can be found from the relation

1/p

= V2Y

By this device the computer need not be unduly delayed by slow convergence in obtaining a factorially useful solution.

[Table 1: correlation matrix $R$ of order $10 \times 10$ and the corresponding principal axes factor matrix $F$.]

[Table 2: trial vectors $u$ and graphically determined slopes $v$ for the successive iterations.]

The graphical method here described for estimating inner products can be applied in a variety of computational problems.

¹ The author wishes to express his indebtedness to the Carnegie Corporation of New York for a research grant in support of our development of multiple factor analysis and its application to the study of primary mental abilities.

² L. L. Thurstone, "Theory of Multiple Factors," 1932, pp. 17-27, Edwards Brothers, Ann Arbor, Mich.

³ Harold Hotelling, "Analysis of a complex of statistical variables into principal components," J. Educ. Psych., 24, 417-441, 498-520 (1933).

THE DEVELOPMENT OF NORMAL AND HOMOZYGOUS BRACHY (T/T) MOUSE EMBRYOS IN THE EXTRAEMBRYONIC COELOM OF THE CHICK†

BY S. GLUECKSOHN-SCHOENHEIMER*

DEPARTMENT OF ZOOLOGY, COLUMBIA UNIVERSITY

Communicated April 15, 1944

The study of the causal morphology of mammalian embryos, in spite of the great interest attached to it, has not progressed very far because of the technical difficulties with which any experimental approach to the problem has met. Waddington and Waterman¹ grew rabbit embryos in vitro for a limited time and Nicholas and Rudnick² devised a method for raising rat embryos in a culture medium. In spite of these attempts, it was not possible to operate on mammalian embryos in early stages and let them continue their development inside the uterus, nor could such embryos be raised and develop normally in a suitable extra-uterine medium for any length of time. Methods which would accomplish this would be of importance not only for the study of normal causal morphology of mammalian embryos, but also for an experimental study of the embryogeny of certain hereditary abnormalities.

In the search for such a method, a new procedure for raising entire mouse embryos outside the uterus was developed and described in detail (Gluecksohn-Schoenheimer).³ It consisted in removing the embryos from the uterus of the mother and transplanting them into the extra-embryonic coelom of the chick embryo where they remained and developed for one or several days. This method has been applied in the experiments reported here to the study of embryos homozygous for the Brachyury mutation TT.

As described by Chesley⁴ the homozygous mutants (T/T) show extreme morphological abnormalities from the age of about 8 days on and die at about 10 days. The posterior body region, including posterior limb buds, is missing completely and extensive abnormalities are found in