An evaluation of cluster analysis and related multivariate techniques for operational research

(1)

City, University of London Institutional Repository

Citation: Newport, R. (1974). An evaluation of cluster analysis and related multivariate techniques for operational research. (Unpublished Doctoral thesis, City University London)

This is the accepted version of the paper.

This version of the publication may differ from the final published version.

Permanent repository link: http://openaccess.city.ac.uk/8585/

Link to published version:

Copyright and reuse: City Research Online aims to make research outputs of City, University of London available to a wider audience. Copyright and Moral Rights remain with the author(s) and/or copyright holders. URLs from City Research Online may be freely distributed and linked to.

City Research Online: http://openaccess.city.ac.uk/ [email protected]

(2)

AN EVALUATION OF 'CLUSTER ANALYSIS

AND RELATED MULTIVARIATE TECHNIQUES

FOR OPERATIONAL RESEARCH

VOLUME 2

(3)

Pa ge

VOLUME, 1 -

(A) _INTRODUCTION ₁

(B) _{MULTIVARIATE ANALYSIS}

1. Methods _{of 14VA} 9

2. Cluster

Analysis

16

3. Ordination

54

4. Seriation

60

5. _{Dissimilarity} _{and Similarity} Measures 69

(C) _{CLUSTER ANALYSIS METHODS}

1. General Discussion 116

2. Explanation _{and Discussion} _{of Methods} 120

3. Comparisons _{of Other} Researches 214

4.. Choice _{of Methods} for Study 226

5. Comparison _{of Methods} 233

6. Conclusions 302

VOLUME 2

(D) _{ORDINATION METHODS}

1. General Discussion 329

2. Explanation _{of Methods} 334

3. Discussion _{and Comparison} _{of Methods} 354

4. Conclusions

394

(E) _{USES IN OPERATIONAL RESEARCH} ₄₀₀

ADDENDA - Operational Research _{Case Studies}

1. Data Investigation 416

(a) Input-Output _Analysis

(b) Stock Market _Data

(c) _{A Manpower Study}

2. Specific Applications 472

ýa _Vehicle _Routing

b _{Sewer Pipes} _Problem

(c _{Team Organizing} _Problem

(d _{Gap Analysis}

(e _Factory _Layout

APPEUDSCES

1. Programs 529

2. Data 546

(4)

LIST OF SYMBOLS USED CONSISTENTLY THROUGHOUT

N, _n _{the number of observations} _{or objects}

M, _m _{the number of variables}

A

S, _s _{any similarity} _measure, _unless _otherwise

specified

S(a, b) _{the similarity} between _{the objects} _{or sets}

aand b

D _{any dissimilarity} _measure

D(a, b) the dissimilarity between _{the objects} _{or sets}

aand b

da dissimilarity _measure, _usually _constrained to

(5)

D

ORDINATION _METHODS

1. General Discussion

2.2ý _{olana tion} _{of Methods}

3. _Dissuasion _{and Comparison} _{of Methods}

(6)

329.

D. 1 GENERAL DISCUSSION

Ordination _methods differ from _clustering in that the

final _product _-a low dimensional _{configuration} _{- is not}

normally an end in itself, and the information which cores

out of this technique is normally from the user's visual

analysis of the ordination. For example the extraction of

labels for dimensions, the discovery _of _sub-groups _within the

objects, etc. _{p are} all left to user control. In cluster

analysis the technique produces an answer, rather than a

simplified question as in ordination.

Despite the apparent _vagueness _{of the interpretation} _of

the end product, the aim of the method is clear _{- to reduce}

dimensionality in a 'best' _way. _{As in cluster} _analysis

which forms 'optimal' groups, the diversity of methods

arises from the definition _{of a 'best'} _way.

Ordination _methods _{are thus} _very _varied, _and-also

difficult _to _categorize _into _types. _Even _the _division _into

metric and non-metric methods is not clear-cut, since methods

can be devised Which _attempt to preserve both the rank order

and the magnitude of the _original distances. However most

current methods may be divided _{on the} basis of metric or non-

metric properties. A metric _method is based on the

numerical sizes of the interpoint distances, Which are-

preserved in a 'best' way, in a lower dimension. A

non-metric method is one which attempts to preserve only the

(7)

330.

further _{categorization} _{of the methods we can suggest} five

classes into _which _{most current} _{methods. may be grouped.}

These are as follows:

Non-metric _methods

1. Stress _minimization

2.. _{Ivianual calculation}

Metric _methods

1. Dimension _elimination

2. Stress _minimization

3. Manual _calculation

Stress _minimization _{methods begin} _with _{a space (usually}

Euclidean)

. of specified dimension and in that space find the

configuration of points _{Which gives} _{an optimal} _{approximation}

of some kind, to the original _points. Stress is the term

used in the method by Kruskal (i964) for _{a particular}

function, _{but here} _{we use it} _{as a general} _{term for} _any

measure _{of distortion} _{of data} _{in producing} _{a lower} dimansioned

space.

Non-metric _stress _minimization _is _often _termed

multidimensional _scaling, _{and we have adopted} Anderson's

(1971) _terminology _{of minimization}

of loss functions for the

metric case. Both these terms _could _easily be applied to

all stress minimization _methods, but-we _shall _continue _with

their _{now conventional} _use. _{In all} these _cases the

ordination procedure is carried out for several dimensions,

and in each case the minimization. is performed by hill-

(8)

331.

The non-metric manual methods are the psychological

scaling _methods, which were the forerunners of the. current

multidimensional scaling techniques. These include the

unfolding methods due to Hays and Bennett (1961). These

methods _are _quite lengthy _{when performed} by hand calculation,

involving _logical _steps leading from _{a raw data} _matrix to _a

final _ordination _which _gives _only _{a rank} _order _of _the

observations on each _axis _{and not} _exact _{co-ordinates,} so

inter-point distances have _{no meaning.} in _the _{configuration.}

This type _{of method} _which _{has both} _non-metric input _and

output is called fully non-metric.

In comparison _{the metric} _{manual methods} _{are more}

heuristic _than their _non-metric _{counterparts.} _This _is

because they lack _{the. rationale} _{of methods based on a}

foundation _{of psychological} _theory, _{as the early} _unfolding

methods had. These metric _methods _often _proceed by

progressively fitting _points into their 'best' _position in a

two-dimensional _space, _{and *do not utilize} _{the full} _distance.

matrix.

Dimension _elimination _methods _{are different} from _other

methods in that they do not try to distort the objects into _a

certain number of dimensions, but _rather _rotate the

configuration of points _{so that} _{a maximum amount of}

information lies _{in as few dimensions} _{as possible.} Thus

these _methods discard dimensions _which _{are of less -value.}

"Principal _components _analysis _{is perhaps} the best _example _of

(9)

332.

no changes in inter-point distance, and the final 'step of

dimension _truncation is left to user control.

Seriation. _methods _which _{we have discussed} in Section B. 4

can be considered as a separate type of ordination method.

As was brought out in the previous discussion on seriati. on,

the limitation to _{one dimension} is dangerously _restrictive,

and the supposition that.

_ time is the major dimension

in

archaeological data can lead to serious error. The only

advantage that seriation possesses is speed, and it could be

that these _methods _would be of some use as'a _quick test, if

they _were _used _with _{a stress} _measure. It is _also _possible

that they _could be used as a good initial _{configuration} for

the stress minimization methods. It should be noted that

seriation _methods normally _proceed from metric input to

non-metric output.

The methods discussed in Section _{B. 4 as pure seriation}

methods _will _{not be analysed} further in this _section, _{as they}

are seen to have little _{value-outside} (and _possibly in)

archaeological dating.

Under the classification discussed _{above we shall}

(10)

333.

Icon-Tie tric

Stress _minimization _{- Shepard/Kruskal}

- Torgerson & Young

- Guttman & Lingoes

- McGee

- Carroll & Chang

Manual calculation _{- Hays Unfolding} Model

Metric

Dimension _reduction

- Principal _-Component Analysis

- Factor Analysis

- Principal Co-ordinate Analysis

Stress _minimization

- Loss functions

Manual _calculation

- Bray & Curtis Orloci

The metric methods _{above have developed} _virtually

independently _under _{the three} _classes

- the dimension

reduction _methods _originally _{in psychology,} _{and the other}

methods _mainly in ecology. _{The non-metric} _methods _{also had}

origins in psychology, _surprisingly _almost _{independently} _of

the dimension _reduction _methods. _{The non-metric}

stress

methods _{are all} fairly _similar _{and arose from} _{the early}

unfolding _work, _{and re-initiated} _{by Shepard' s work in 1952.}

Vie proceed in _{the next section} _{to explain} _{the methods}

(11)

334.

D. 2 E PLANATION OF LETHODS

1. Principal. _component _{and, factor} _analyses

2. Principal _co-ordinate _{analy si s}

3. Bray _{and Curtis'} _method

4. _{Orloci. 's method}

4a. _Alternative _to _Orlocits _method

5. _Hays' _{multidimensional} _unfolding _method

6. _{The Shepard/Kruskal} _method (LID-SCAL)

7. Torgerson _{and Young's} TORSCA

8. _Guttman _{and T ingoe'} _{s wallest} _{'space analysis}

9. _{The elastic} distance _{method of McGee}

10. Carroll _{and Chang's} INDSCAL and LIDPREF models

(12)

335.

ý. Principal Component _{and Factor}

, Analyses

These methods have been discussed earlier. They were

the first _ordination _methods _{to be used and have been applied}

in many fields. The procedures. _{are fast} _{and readily}

available. The methods produce an optimum and give a

measure of fit. The reduction to fewer variables is

virtually _assured by these methods _without _significant loss

of information, 'but the reduction to _say 2 or 3 axes cannot

normally be achieved without large information loss. The

methods do not _attempt to _preserve inter-point distance _{and a}

useful _additioi to the method is to _connect the points by

their _minimu _{i spanning} tree in _order _{to give} _{a better} _picture

of the relationship between _points. This technique _{may be}

employed _with _{any ordination,} but _{seems particularly} _useful

in the _cases _where inter-point _distance is _not _considered in

the optimization procedure. Because _principal _components

produces an ordination very _quickly _{and its} _solution _will be

fairly _similar _{to other} _ordinations, _it is _sometimes _employed

as an initial configuration _which is iteratively improved

upon to attempt to minimize another objective function.

Principal _components _analysis _{and factor} _analysis _{are both}

fully _metric _{- they} _require _metric _input _data _{and produce}

metric _output.

2. Principal Co-ordinates _Analysis

This procedure is due to Gouger (1966,1967a).

it is

very much related to principal _components but more

generalized. It begins _with _{a distance} _matrix _with _elements

dij _{and then forms} _{the matrix} _with _elem3nts

(13)

336.

centred by subtracting row and column means and adding the

overall mean. From this Gower shows that the kth latent

vector of this matrix, normalized _{so that} its sum of squares

th

is equal to the k latent _root _gives _{the co-ordinates} _along

the kth principal axis. The method will thus. give identical

results _with _principal _components _analysis if Euclidean

distance _{is used, but the Gooier procedure} _{can be used with}

any distance measure. In. order for _real _co-ordinates to

exist the centred _matrix _{above must have non-negative} _roots.

This is _{a necessary} _{and sufficient} _condition _which _many

distance _measures _comply _with.

non-metric to this _procedure.

Thus the input _{data may be}

The method has been used by

Rayner (1969) in pedolor.

3. Bray _{and Curtis'.} _Method

This method was an early heuristic _attempt _at

ordination, _arising from the need in ecology to understand

large data _sets. It _{was introduced} _{by Bray and Curtis}

(1957) _{and has been used almost} _exclusively _{in ecology,} _and.

mainly in America. The authors Were forced to reject the

use of factor analysis in their _study _{of forest} _plant life _as

being _{too heavy} _{computationally,} _also _they _{did not wish to}

use correlations because _{of its} _sensitivity _{at low values}

Where error played _{a large} _part.

Their _{method begins} _with _{a distance} _matrix. _{Suppose we}

(14)

337.

1 X

2 ₁₀₀ X-

3 90 75

X

4 80 80 90 X

5. 40 80 '85 30

X

-

6 _{90 25 60 170.175}

x

123

45

6

The furthest _{two points} _{are sele cted. as reference} points and

are placed at either end of the x axis. All other points

are projected onto this line. This can be performed rapidly

using compass and ruler. This, in the above example, gives

the one-dimensional ordination:

oQDC

A second axis is constructed by selecting another pair

of reference points which are close together on the first

axis, but yet still separated by a large distance - z. 9.,

points 3 and 4 in the example. Both of the points are fixed

by their distances from the first two reference _points. The

y axis is constructed perpendicular to the x axis and scaled

so. that the second pair of reference points 3. and 4 are d34

(15)

338.

ci0

co

-

>ýQ

O

Other _points _{are projected} _onto _{the y axis,} _using

compass and ruler by their distances from the second

reference-pair. Of the two possible _positions for _point 5

the one which is chosen depends on the nearness of 5 to

points I and 2; this _gives the position 5 in the above

diagrau. _{The y co-ordinate} is _{the projection} _{of this} _point

on to the y axis. Similarly _other _points such

_{as 6 are.}

given _ay co-ordinate. Az _co-ordinate can be. con$tr-acted

.

in _{a simi, lar manner.} _{The method has the advantage} _that _it

is very fast _{and can, be executed} _manually.

It has, however',

several disadvantages, the major one being that _{even if} the

data is two-dimensional, _unless _{the second reference} _pair.

have _{the same x co-ordinate,} _then _{the ordination} _{does not}

give a true configuration. This _could be rectified by

construction _{of the} _second _axis _parallel _with the line

A-A

joining _{the second}

(16)

339.

Other disadvantages _{are the dependence} on a few reference

points, which, if subject to error, distort the whole

ordination, and there is no measure. of fit to determine how

good an ordination is.

Users _{of the method include} Gittins (1965), Beals (1965),

Gimingham _{et' al (1906)1, Hole} _{and Hironaka} (1960), Kershaw

(1968)

, Swan and Dix (1966) and McIntosh (1905).

4. Orloci' _{s Method}

This _Method (Orloci 1966, Austin _{and Orloci} 1966) is a

modified form, of the Bray and Curtis method but is

mathematically exact. The first axis is obtained. and points

are positioned on this axis in the same way as in the

original paper, which can be stated as:

D1j 2+ D12 2 D2i 2

x_w _1j

2D12

The point furthest from the first axis is chosen. to

initiate _{the y axis.} _{y-co-ordinates-} _{are obtained} from the

formula:

Dj

.2 X1.2 + D132 i32 - D3 '2 + CXl .- 3ý2

Y-

2D 21 j

.

t` 13

3

Similarly _other _{axes can be constructed} (Orloci 1966).

A formula is _also _given for the 'efficiency' _{of the k}th

axis in accounting for the original interstand. distances,

which is simply the ratio of the average inter-point distance

n

on the k axis to the average inter-point distance in the

(17)

340.

4a. Alternative to Orloci's Method

Orlocits _{method gives} _{a perfect} _ordination _of

two-dimensional data, but is heavily _reliant on. the choice of

the first _{2 reference} _points, _since _all _other _co-ordinates in

all dimensions are reliant on these points. This is of

obvious advantage if the position of two points is known

with little error, in which case it would be better to select

those _{two points} in which one has greatest _confidence, _rather

than the _{two most dissimilar} _{as Orloci} _employs.

The proposed method follows Bray and Curtis' original

suggestion more closely than Orloci's and spreads the

'responsibility' _of the _early _reference _points. The first

axis is obtained using the procedure of Orloci, but with an

option to use other reference points if they are considered

more suitable. Two points _are selected for the second axis,

normally by finding that pair for which did _- jajj-I-Ijjj is a

maximum. Suppose these points are 3 and 4 then

ya =0 and y3 :::. (4342 _- (d13-d14 )2)ý. The position of all

other points (including the first two reference points) is

found _{on this} _axis _{in the same manner in which the}

co-ordinates on the first _axis _{were evaluated.} This can be

simply extended to more dimensions, _{and a measure} of fit

similar to Orloci Is may be, used.

5. Hays' Nultidirnensional _Unfolding LIethod

Before

_proceeding

_{to discuss}

_{a particular}

_ordination

method in psychology, it _will be useful to consider the

(18)

341.

one of the oldest areas of measurement and scaling usage.

Coombs (1964) (see also Coombs et al 1970, Green and Carmone

1970) divides-data into four types, firstly _according to

whether they are based on one or two sets of data (i. e. with

one set data _{- comparisons} are only made between stimuli;

with two set data - stimuli _{are rated,} and hence the rater is

under investigation as well as the stimuli), and secondly

according to Nether the emphasis is on proximity or

dominance (i. e. if we ask if A is taller _{than B, or if B is}

taller _than _{A, then} _{we are} _concerned _with dominance _{on a}

certain scale, whereas if we ask if A and B are the' same

height _{or not} then _{we are concerned} _with _proximity). These

differences _{may at first} _{seem subtle,} _{and indeed} _{a set of}

data _{may be classifiable} _{in more than one category,} but if

we consider the normal type _{of data} _obtained in each case the

differences _should _{become clearer.} _{Coombs' classification}

is _{as follows:}

Dominance _matrix _Proximity _matrix

Two sets- Quadrant _Single II: Quadrant I:

stimulus Preferential choice

One set Quadrant _Stimulus III: Quadrant IV:

comparison Similarities

Typical _questions _{to obtain} _{data for} _{each of the}

(19)

342.

QI _{- Preferential} Choice.

preference.

Rank the following in order _of

QII _{- Single} Stimulus. Rate the following _according to

their _possession _{of property} _{x on a scale} _{1' to 5.}

QIlI _{-- Stimulus} _Comparison. _Which _of _{the following} _pair _of

objects do you think _possesses _property _{x the} _most?

QIV _{- Similarities.} _{Which pair} _{is more similar?}

AandBorCandD?

Each type _{of data} _lends _itself _to _its _{ovm particular}

kind _{of analysis..} _{Most of the pre-1960} _methods _are

discussed _{in Coombs (1964)} _{and Torgerson} (1958), _{we list} _them

here _with _their _original _references:

QI _{Parallelogram} _analysis (Coombs 1953)

One dimensional _unfolding (Coombs 1950,1953)

Multidimensional _unfolding (Bennett _{and Hays 1960,}

Hays and Bennett 1901)

Factor _analysis (Coombs and Kao

. 1960a, b)

Qil Guttman's _scalogram _analysis (Guttman _1944,1950)

Lazarfeld's _latent _distance _model (Lazarfeld ₁₉₅₉₎

Categorical _judgment _model (Torgerson ₁₉₅₈₎

QIII _{Law of comparative} _judgment (Thurstone ₁₉₂₇₎

Bradley, Terry, _{Luce model (Bradley} _{and Terry} _1952,

(20)

PAGE

NUMBERING

(21)

343.

QIV Multidimensional _unfolding (Hays 1954, Coombs 1964)

Factor _analysis (Torgerson ₁₉₅₂₎

Shepard _{and Kruskal's} UIDSCAL (Shepard _{1962a, b,}

Kruskal _1964a, b)

Torgerson _{and Young's} TORSCA (Torgerson ₁₉₆₅ Young

and Torgerson 1967, Young 1966)

Guttman _{and Lingoes'} SSAR (Lingoes _{1965, Guttman} 1967,

1968)

McGee's _elastic distances _model (McGee

. 1965,1966,

1967)

Carroll _{acrd Chang' s INDSCAL} (Carroll _{and Chang} ₁₉₇₀₎

Current _research _{in mathematical} _psychology _{is currently}

being directed _{away frota} its _more traditional QII _{and QIII}

data _analyses, _{and towards} _{QI and QIV data,} _{and it} is in

these _areas _where _ordination is _particularly _applicable

(Q± I and QIII data _{may be ordinated,} _but _only _after

conversion to QI or QIV type).

Qae of the first psychological methods for producing

ordinations was that due to Hays. The method is dealt with

at length in Coombs (1964) and Coombs, Dawes and Tversl

(1970). _It _is _essentially _{a hand} _method, _{and can be a}

lengthy _procedure. _{The method} _is _based _{on the} _idea _that the

best _{one dimensional} _{representation} _{of three} _points _will

consist of the two furthest _points and the projection of the

third _point _{on to the line} _connecting _them. _{The procedure}

considers all sets of three _points (called triples) in order

to determine the best _{configuration.} _Firstly _{the most}

dissimilar _pair is found _{by constructing} _{a partial} _order _of

distances (i. e. if a respondent _ranks _{in ' order} _{A-BCD then}

AD < AC < AD, thus from _all _respondents _{a partial} _order _{may be}

(22)

345.

From the similarity matrix _{can be picked} out the rank order

which begins with each of this pair. From these a partial

order of stimuli is obtained. If all triples are satisfied

then this _order is the final _one, _otherwise _{the order} is

selected _which satisfies the most triples. A second

dimension _{can be constructed} _using _all _{the unsatisfied}

triples.

The method is fully non-metric, with both input and

output data being rank orders. This means that any monotone

transformation _could be performed _{on the axes,} _{and so}

relative distances are meaningless on the Hays configuration.

Also, _{even if} Hays' _{method in a particular} _{case violates} _none

of the triples in k dimensions, this does, not mean that a

metric solution can necessarily be obtained with that

dimensionality.

6. _{The Shepard/Kruskal} _Method

The early work of Shepard (1957,1958a,

b, 1950)'was

concerned with the relationship between the _{psychological}

similarity of stimuli _{and the} _{representation} _of this in

metric space. He postulated _{an approximately} _exponential

transformation in _{one particular} _case (Shepard _1958b), but

observed that one had no reason to suppose. that this

relationship held for _other data, _{and indeed} _{one is} _more

often concerned with finding the transformation than _with the

(23)

346.

The method he later proposed (1962a, b) was a lendr ark

in multidimensional scaling, combining non-metric input. with

metric output. The only assumption made about the

transformation function is that it _should be monotonic.

Any n points can be placed in a space of n-1 dimensions so as

to preserve the _rank _order _of the interpoint _similarities

(this _has _been _formally _proved _{by Bennett} _{and Hays} _1960,

. but

if _{we consider} four _points forming _{a tetrahedron} in three-

space, it can be seen that the length of any side can be

shortened to zero or lengthened until it is the longest

without changing the other lengths). The procedure begins

by finding this _n-1 dimensional _problem _{by an iterative}

process. The points are placed equidistant (i. e. at the

vertices of a simplex) in the n-1 space, the 'force' on each

point from the others trying to move it nearer or further to

thaw to _satisfy the _rank _order is _calculated, _{and each} _point

is _{moved in} the direction _of _this _force, _{and so on until} _an

optimum is reached. There is no unique solution, and thus

the additional requirement _{of minimum dimensionality} is

introduced. Principal _component _analysis is _used _{to reduce}

the dimensionality, by ignoring dimensions _which _account for

little _of the _overall _variance, _this _also _gives _{an initial}

configuration in the reduced _space. Iterations* take place

as before' to find an optimum (this may however be a local

optimum). In this smaller _space it will probably be

impossible to _satisfy _the _rank _order _{of distances} _entirely,

and a measure of fit is used _{- the} mean square discrepancy

between the _original _{and derived} _rank _orders.

A

(24)

347.

One of the outputs _{of the method is} (what Icruskal called)

the Shepard diagram _-a _graph _{of similarities} _against derived

distances. This _gives _{a picture} _{of the transformation}

function. The Shepard diagram is _{of use in any ordination}

and can indicate particular points which have not been able

to be placed satisfactorily in the ordination.

Kruskal (1964a, b) improved _{upon Shepard's} _{work by}

putting his method on a more rigorous basis. The. orizinal

dissimilarities _{are first} _ordered _{so that:}

Dý ý; D2 < ... Z, DT. ₁ ( n(n-1)/2)

We position the n points randomly in ak dimensional _space

and thus we have associate J& these dissimilarities,

distances dl,... dl'r.

These distances _will _probably _{not have the same rank}

order as the dissimilarities, but _associated _with _each

dissimilarity _{we assume a monotonic} _function f (Di) _which

transforms the dissimilarities _into _distances _{di which thus}

have the same rank order _{as the dissimilarities..} The extent

to which _{the di values} _vary _from _{the di is minimized} _{in a}

normalized least squares _expression _called _stress.

Thus we mind mi ze:

FZdij

2

subject to did monotone.

This is _accomplished _{by the method of steepest} _descent..

(25)

348.

dimensionality is _selected _{as that} _{which has a low stress} _and

if the dimensionality is decreased _{by one results} in a much

higher _stress. _{, The method has been subjected} to random, data

to determine _{what is} _{a 'low'} _{or 'high'} _stress _{- see Klohr}

(1969), _Stenson _{and Knoll} (1909) _{and Wadenaar and Padmos}

(1971). _Additional _work _is _contained _in _Shepard _{and Kruakal}

(1964) _{and Shepard} (1966). _{A computer} _program _for _the

method called MDSCAL has been produced _which is currently _up

to version _{6L'CP and is available} from Ohio State University.

7. Torgerson _{and Young's} TORSCA

This _program _{is very similar} _{to MDSCAL.} _It _{is normally}

preceded by principal components to obtain _{an initial}

configuration in the desired _space. (This _{can be employed}

in LIDSCAL.. _{One may also use p. c. a. on a k; -1 dim solution} to

obtain an initial configuration in k dimensions. )

The measure which is maximized is, _with _{the previous}

notation:

2: d. d.

subject to dij monotone, _{and where} dlj is distance in

Minkowski _space.

The method yields _very _similar _results _{to the other}

computer methods. Torgerson (1965) _states "Which _program

is in fact _superior _will _{depend upon the particular}

(26)

349.

The program is outlined in Young (1968,1970) _{and Young}

and Torgerson (1967) _although incorrect _objective functions

are given. The method has been used extensively by Green

and co-workers (Green, Llaheshwari _{and Rao 1969, Green,}

Carmone and Fox 1969, Green and tIahe shwari 1909, Green, Wind

and Jain 1972). The program is available from Dr. Young at

the University _{of North} _Carolina.

8. _Guttman _{and Lingoes'} _Smallest _{Space Analysis}

This _method (Lingoes _1965,1970,19 _Guttman _8)minimizes

the function:

Z(alý

-

äiß}2

2Zd1

j2

which _{is related} to Kruskal's _stress _but did _{is defined}

differently. _Here di

j is a permutation of the di j to

maintain the rank order _{of the Di..}

The program {SSA IV) is available from Professor Lingoes

at the University of Michigan.

This _{method has been compared with} _{the previous} _{two by}

Green and Rao (1971) _{and show that} _all three _{can reproduce}

known configurations _accurately.

9. The Elastic _Distance _Method _{of McGee}

This _method (McGee 1955,1956,1968) _is _somewhat

different from _{the mainstream} _{of nonmatric} _scaling _methods

(27)

350.

The method, using _{an analogy} _{of elastic} springs connecting

unit masses in space, attempts to minimize the expression:

w=c2

d--

-ß --ý

2

i<j ii

which is analogous to the su ned moduli of elasticity between

the points. It allows longer 'springs' to be compressed by

a larger percentage than the smaller ones. McGee introduces

this into _{the model because he believes} larger distances _are

more likely to be inaccurate. Gregson and Russell` (1967)

report _experiments here _middle-range distances _were more

subject to error. McGee (1907) replies _{on a theoretical}

level, _showing his _assumption is _{a direct} _outcome _{of 7eber's}

law (a relatively old psychological law which relates the

variance in response to the physical magnitude _{of the}

stimuli _{- see Torgerson} 1958 and Guilford 1954).

10. Carroll _{and Chant's} IIyDSCAL and TADP EF Models

If _{vie have a set} _{of responses} from _{a group} _of

individuals _{and we aggregate} _the _data, _then _{Nye are} _over-

simplifying, since _each individual _{may. be judging} _on

different _properties, _{or properties} _{may be of} _differing

importance to him. _Thus _if _{we had} _{a method} _of _analysis

which enabled us. to identify differences in _subjects

perspective we could _gain information.

This _concept _{Evas used in Tucker} _{and Messick's} (1963)

'points _{of view} _analysis', _although _from _{the scaling} _of

individuals (called _the _subject _space), _they _only _found

(28)

351.

then _each _of these _{was scaled} _{independently.} _This

procedure, especially the final independent _scaling has _come

under some criticism (see Ross 1966) _{and a new program} _called

INDSCAL (for _INdividual _Differences _SCALing _model) _was

introduced _{by Carroll} _{and Chang} (1970) (see _also _Carroll

1971,1972, Wish _{and Carroll} _1971, Wish _1970,1971).

INDSCAL _allows _for _each _individual _{to have} _his _own

weile ting for _{each of the axes in the scaling} _{of stimuli}

(called _the _stimulus _space). _{The method is analogous} _to

factor _analysis, _{where a representation} _{of objects} _{in space}

is _obtained _{and also} _{a set of vectors} _from _{the original}

variables, _which _{are in this} _{case individuals.}

The procedure in fact _{is mathematically} _very _similar _to

factor _analysis, _using _results _obtained _{by Torgerson} (1958)

gor _{the non metric} _case.

The multidimensional _scaling _methods _{of Kruskal} _and

McGee have also been adapted to handle these _cases _of

differing _perceptions.

A related model, MIDPR F, also by Carroll _{and Ghan, (see}

Carroll _{1971, '1972),} _concerns _itself-with _differences _in

preference rather than perception.

It _{is concerned} _with _{the concept} _of _'ideal _points'.

Suppose we had the following _{configuration} _showing the

similarity _{of different} drinks _obtained _from _{the responses}

(29)

352.

-4 9

xý

. _{)( G}

x ý.

and a particular individual ranked the drinks in the order

of preference DACBE. We can assume that his 'ideal' drink

would be placed nearest' to, D in the configuration, and with A

its _next _nearest _point, _{and so on.} Thus there _exists a

point (or area) on the diagram which best represents this,

rank order of preference. Thus for each individual we can

include _{an 'ideal} _point' _{on the} diagram.

This _concept _{and a simple} _{method of finding} ideal points

is used in Bennett and Hays' (1960) unfolding method.

Carroll _{and Chang' s method proceeds} in a similar way, but

enables the rater to view dimensions in differing weights.

11. Loss Functions

Loss functions _{are very} _similar to psychological scaling

methods except that they try to preserve distance rather than

rank order, in a minimum number of dimansions.. The

dimensionality _{s is} _selected _{and in this} _{space we try} to

maintain the original distance. Thus we could minimize:

Y ID

_{- di}

Since this function is not differentiable _{at zero we may}

choose either to split the function or minimize its square:

Y, (D

- a)2

However, this _will tend _{to maintain} larger distances at

(30)

353.

use as a clustering method, it will not give an accurate

ordination unless a constant error function is assumed.

Thus we minimize:

a-

1)2

This is _solved by the method of steepest descent.

Other _similar functions _could be used for _{optimization.}

For example:

(D2

- d2)2

r(D-d ₎₂

: 7- D2

ZD(D-d)2

2: D3

ZD(D

- d)2

ýc

ýD

.D

w Z{D-a-C}2

Thompson and Woodbury (1970)

Anderson (1971a)

Anderson (1971b)

Sari. on (1969)

Cooper (1972) _{where c is} _an

additive _constant

The functions _{above can all} be generalized to non-metric

methods by the incorporation of monotonicity constraints., and

similarly non-metric methods may be converted.

The use of loss functions has not been widespread.

because _of the _current interest in _non-metric _methods. The

references above contain _{most of the few applications} of loss

(31)

354.

D. 3 _{DISCUSSION AND COMPARISON OF METHODS}

Ordination _methods have only _recently _{come into}

prominence as multivariate methods, _{and are at a lower} stage

of development than cluster _analysis. In clustering, most

methods have a common aim, which is fairly clear, Whereas

ordination results require _greater investigation by the user,

and methods have more diverse objectives. It _.has thus been

easier to perform comparative studies on cluster methods, and

more such investigations have been published. The

clustering process is also one which is more easily

visualized and so the methods have been more widely _applied.

Green and Carmone (1970) have summed up the current state of

non-metric scaling in the following _words:

... algorithm development has out-stripped application

and evaluation techniques... testing _{and evaluation}

will be necessary over the next several _years if

these _procedures _{are to receive} _{the extent} _of

application that they currently appear to warrant. "

lJost _{of the investigations} _which _{have been carried} _out

have been into the properties _{of'non-metric} _stress

minimization _{-= in} ' particular, _{some of the works} of Green,

Car+none, Kruskal _{and Shepard.} _Also,,

. the comparative studies

that _{have been made have usually} _{been between} _non-metric

methods. Because of the computer _programs _{which. exist} for

these _methods, _{and the lack} _{of information} _{on other} _methods,

there _{has been. a preoccupation} _{of ordination} _users _with these,.

to the exclusion _{of metric} _models. Loss functions _{are a}

(32)

355.

simplicity relative to non-metric _{multidimensional} _scaling),

and studies involving these are isolated. It is a difficulty

both _{in clustering} _{and here,} _that _{because ' of the demand for}

computer _packages, that methods Which are widely _available in

this form _{are used much more than} _others _which _{are perhaps}

more suitable. This _{use of packages} _{as black} boxes often

leads _{to an inappropriate} _{method being} _used, _{and hence}

misleading results.

his _section, because _of the _{more primitive} _state _of

ordination as a technique, is designed to fill in _{some of the}

current _{gaps which} _exist _{- in particular} in metric _models.

We have restricted ourselves to a more verbal than

experimental comparison. This is partially due to the

nature of the ordination methods we have outlined _{- they have-}

been designed for _slightly different _purposes, _{and the choice}

of technique is to some extent dependent _{on the input} data

available and the type _{of output} data _required. 'In this

section therefore _{we will} _attempt to. arrive at decision rules

v rich will- simplify the choice of technique in any particular

application. We will introduce ordination as an approach to

data analysis and not _{as a cook-book} _{of methods.} The sort

of items _.ich we will discuss _will be such questions as the

type _{of method} _required _{by types} _{of data,} _{and for} _specific

purposes, and also the properties _{and shortcomings} of the

methods. As we consider this _study _{as a somewhat pioneering}

work, we shall content ourselves _with outlining _properties of

the methods as a helpful starting _point for further research

(33)

356.

Our discussion _will _{be in three} _general _areas. _Firstly

a discussion _{of the existing} _non-metric _{method comparisons} _and

evaluations, _{and the questions} _which they bring _up. The

second section will be a consideration _{of the metric} _models,

and will be more experimentally based. Most of the metric

versus _non-metric discussion _will be left _until the third.

section, _{Where more general} topics _will be discussed.

Non--Metric _Liethods

`When the Shepard-Kruska1 _{method was first} _introduced _the

main question was whether the method could actually _give _a

correct _{configuration} _{of data} that had been monotonically

transformed. _{In the second of Shepard's} _papers _introducing

multidimensional scaling (1962b) he gave several. _examples.

Firstly _{he created} _{a similarity} _measure _{si j=} _{e_1'} 4dij _from

the Euclidean distances di

j between 15 points in 2 dimensions,

and his method showed almost _perfect _recovery _{of the original}

configuration. He repeated the experiment _with _an

exponential decay function _{and an arbitrary} _monotonic

function _{of line} _segments, _{and also} _{a one-dimensional} _{and a}

three-dimensional data _set, _all _{gave very} _{good recoveries.}

Shepard (1966) _analysed _{a wider} _range _{of data} _sets, _using

two-dimensional _{configurations} _{and found} _{the minimum}

correlation between the original _{and recovered} data in several

runs with different data with 3-45 points. The minimum

correlation _{was over} _.99 for _{cases with} _over _{10 points,} but

dropped below

. 90 for less than 6 points.

Young (1970) _obtained _results _with _{TORSCA on Shepard's}

data _{and compared} _{them with} _Shepard's _results. _{The most}

(34)

357.

"If _{one relies} _heavily _{on the stress} _index, _the

unfortunate situation _exists that _{as he diligently}

gathers _{more and more data about} _{an increasingly}

larger _{number of stimuli,} _{he will} _{become less} _and

less _confident _{in the nonmetrically} _{reconstructed}

configuration, _{even though} it _{is more accurately}

describing _{the structure} _underlying _{the data. "}

Shepard (1962b) _also _considered _the _recovery

of

dimensionality

- he used the method of plotting dimension

against _stress, _{and looking} for _{a sudden increase} _{in the}

stress _{at a certain} dimension. _{We shall} _call _this _{the elbow}

method. Considering that the variance _{of the original} _points

was approximately the _{same in} _each dimension, _the _curve, _was

not _{as 'elbowed'} _{as one might} _expect _{- the} _curve _in _the _two-

dimensional _case _{was very} _smooth. _These

results. _{may be due}

to the _actual _stress _measure _used, _which

was later replaced

by Kruskal's _stress. _In _the _{same paper,} _Shepard

gives

several _results from _real _experiments, _{and it} _{is interesting}

to note that the transformations _from _distance _{to similarity*}

were all simple functions _{- linear,} _quadratic, _{or exponential.}

Abelson _{and Tukey} (1963) _{have showmi that,} _given _{4 points}

in one dimension then if _{we know the complete} _ordering _of

distances, then _{the minimum r2 between the real points} _and

any others with the same ordering, is between

. 91 and . 97

depending _{on the actual} _order. _{They also showed that} _this

decreased _{as the number of points} _increased, _{Which is}

contrary to expectation. Shepard _{and Carroll} (1966) _suggest

that this is _{due to the measure used, and that perhaps the}

(35)

358.

Kruskal (1964a) _{used one of the data sets} _analysed by

Shepard _{and attempted} to recover the. configuration _after the

data had been transformed _{and a random normal} _error function

added. He achieved almost perfect results. He also showed

stress vs. dimension graphs for three sets of random data in

6 dimensions _{of 20,15} _{and 10 points,} _{and the results} _give

two of the data sets 'fair' fits in 3 dimensions and the

smallest set gave an 'excellent' fit. Kruskal also

suggested 3 dimensions for the sets, but failed to point out

that, _{as the original} data was 6 dimensional, then _{one would}

have expected to be able, to recover this dimensionality.

Another _point from the real data results that Kruskal _gives

is that _{a lot} _{of the stress} _vs. dimension _graphs _appear to be

very smooth curves.

Vlagenaar and Padrnos (1973), used Monte Carlo methods to

examine how stress varied with dimensions. They used

configurations of 8 and 10 points, with equal variance in

each dimension, and added various levels of random normal

error. Their main conclusions were:

1. Too low a dimensionality does not always cau: z high

stress _{- for} example with three-dimensional data and 8

points, the two-dimensional solutions gave stress of

4-8% for the lower levels _{of added error} (i. e. 'good'

fits _according to Kruskal's interpretation. _{of stress}

levels) _{and with} _{10 points} the levels _{were 7-12}

('fair').

2. The elbow effect is _only found _with the lowest _error

(36)

359.

Both points are especially important considering that the

data was normalized; if _{one had,} _{in a real} _case, dimensions

of varying weights, one would expect even better fits in

lower dimensions, _{and less} _elbow _effect.

There _{are several} _special _cases _{of data which} _{can give}

erroneous results under non-metric methods. These are cases

where the configuration is under-constrained, and the use of

rangy. order is an oversimplification of the real data. For

example, any three points in a two-dimensional space, that

form _{a triangle} _with _{a unique} longest _side, _{can be scaled} by

non-metric methods into one dimension _with zero stress. If

we consider the case where the points form an alrmost"

equilateral triangle, slight error could transform the three

points into a completely wrong one-dimensional solution. with

zero stress. This example _{can be extended} to that of a

three-dimensional tetrahedron _{in which the three} lines

forming _{any one face} _{are all} larger than _{the other} three

lines. This _will _also _give _{a 'perfect'} _{one-dimensional}

scaling, and this can be further extended to an n-dimensional

simplex.

Another instance _{where misleading} _results _{can ba-}

obtained is where clusters are present in the data, such that

all between cluster distances are larger than all within

cluster distances. In this _{case the input} data in rank

order form has insufficient information to be able to

reconstruct the correct inter-cluster distance, and so the

clusters can be a random distance _apart in the final

(37)

360.

Shepard (1962b) _gives _{a further} _instance _{of possible}

distortion _{- that} _{of a smooth arc spanning} less than _1800,

which _will _{become a 'perfect'} _{one-dimensional} _scaling.

(This _{may be generalized} _to _several _smooth _higher _dimensioned

surves, including some hyperhelices, _{and also} bowl-shaped

configurations can be mapped in 2-space without Stress. )

Shepard _states:

".:. the likelihood that _{a sufficiently} _large _{set of}

points would fall _{on the very} _special kind _{of curved}

subspace that will lead to a spuriously flattened

solution is probably _quite _small... Certainly _no

such problem seems to have risen in the various

applications to real and artificial data presented

in the preceding section. "

Iwo pages before this _statement _Shepard _gives _{an example} _of

the multi-dimensional scaling _{of people's} _perception _of

colours in which he obtains the configuration below:

0 "

VI CST

C

ýt-

Here _{the special} type _{of curve} _{has very nearly} _occurred, _and

with the exclusion _{of five} _points, _{would have done.} There. is

also the possibility that the curve might in reality be

(38)

361.

The recovery _{of metric} _{configurations} _{has been 'shown to}

be almost perfect _vAth _{the non-metric} _methods for _{over four}

points in _{one dimension} _{and over} _eight _points _{in two}

dimensions. _{Good results} _{have also been obtained}

with

moderate _error _{added to the distance} _matrix. _Certainly _rani:

order information implies _{a considerable} _amount _{of interval}

information. _{A point} _worth _mentioning _is _that

some of the

examples _cited begin _with the _original _{configuration} _{as a}

starting _point, _{and this} _reduces _the _chance _{of finding} _a

local _minimum, _{and in} _fact, _if _{one is} _found _then _it

may ba

closer to the _original _{configuration} _than _the _global _minimum.

The recovery of dimensionality is _{a more difficult}

problem. Most of the graphs of stress that _{have been}

published _{are too smooth for} _reliable _selection _{of the}

correct _{number of dimensions,} _{and these} _{have often} _{been with}

normalized data, _which _{one would} think _{would be the most}

difficult to flatten into _{too few dimensions.} _It _is

certainly true, _{as Torperson} (1955) _states, that _monotonic

distortion _gives _rise _{to an increase} _{in dimensionality,} _but

the distortion is _difficult _{to separate} _from _{the real} _values.

We agree with Shepard that _{in real} _examples _{a zero}

stress _{configuration} in a too lo', rz number of dimensions _will

be unlikely, but _consider _that _there _is _{a fair} _chance _of

cases in which false flattening _occurs _{and gives} _{a low enough}

stress in the lower dimension _for this _solution _{to be}

accepted. In these _{cases however} it _might _{be possible} _to

determine if _false _flattening _{has occurred} _{by consideration}