Relations and Relational Systems
2.2. Analyzing One Proximity Relation
The situation in which we have one proximity relation between the elements of one set of objects constitutes the classic multidimensional scaling (MDS) setup, which is well documented in the literature (e.g., Everitt & Rabe-Hesketh, 1997; Kruskal & Wish, 1978). After a summary of the general MDS setup, we pay special attention to the analysis of asymmetric data and conclude this section with a related discussion of unfolding, that is, analyzing one proximity relation between the elements of two sets of objects.
2.2.1. General MDS Setup
In brief, the objective of MDS is to find a configuration of $n$ points $\{x_i, i = 1, \ldots, n\}$, where $x_i$ has coordinates $\{x_{iu}, u = 1, \ldots, p\}$ specifying its location in a $p$-dimensional spatial model. Typically, the configuration is two-dimensional ($p = 2$), but this choice can only be justified, of course, by a reasonably good fit. The quality of the fit is assessed by determining distances $d(x_i, x_j)$ between all pairs of points (most common is the ordinary Euclidean distance). These inter-point distances should reflect the inter-object proximities: If two objects are relatively similar in the data, their corresponding points in the model must be close together, but if two objects are relatively dissimilar, their corresponding points must be far apart. Goodness of fit of the configuration is measured (quite indirectly) by the quality of the fit of the nonlinear regression equation

$$\varphi[\delta(a_i, a_j)] = d(x_i, x_j) + \varepsilon_{ij}. \tag{1}$$

Here, $\delta(a_i, a_j)$ denotes the given dissimilarity value for objects $a_i$ and $a_j$; $\varphi[\cdot]$ denotes a transformation that maps the dissimilarity values into a set of transformed values $\hat{d}(a_i, a_j)$, called pseudo-distances or d-hats; and $\varepsilon_{ij}$ are the residuals. In general, $\varphi[\cdot]$ will be some selected type of function, reflecting the kind of information in $\delta(a_i, a_j)$ that we want to take into account in the analysis. For example, $\varphi[\cdot]$ could be a linear function with positive slope and either with or without an intercept, or it could be a step function that assigns new, identically ordered values to the given $\delta(a_i, a_j)$, so that only the rank-order information is preserved.
When the relation between the objects is given in terms of a similarity function $\rho(a_i, a_j)$, the transformation $\varphi[\cdot]$ is required to be linear with negative slope, or monotonically decreasing.
2.2.1.1. Measures of Fit and Distributional Assumptions
In any case, according to (1), the pseudo-distances $\hat{d}(a_i, a_j) = \hat{\varphi}[\delta(a_i, a_j)]$ follow from the optimal transformation, that is, the transformation that optimizes the fit for given $x_i$ and $x_j$, and so they are approximately equal to the $d(x_i, x_j)$ in some definite sense. Measuring quality of fit of an MDS solution by a least squares criterion was an idea introduced by Kruskal (1964), who actually used the root mean square error, which he called Stress (the reverse of fit). Mainly for computational convenience, Takane, Young, and de Leeuw (1977) squared the distances in (1), calling the resulting badness-of-fit function S-Stress. Noting that proximities are always nonnegative, Ramsay (1977, 1978, 1980, 1982) introduced an alternative regression equation based on the idea that the distances are not disturbed by additive random factors $\varepsilon_{ij}$, but by multiplicative, positive random factors $\upsilon_{ij}$, which are asymmetrically distributed around 1, in such a way that $\log \upsilon_{ij}$ is normally distributed around zero. In a similar vein, Takane (1981, 1982), Takane and Carroll (1981), Takane and Sergent (1983), and Sergent and Takane (1987) suggested and tested several multidimensional scaling models based on a variety of distributional assumptions for specific data collection processes.
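The two badness-of-fit functions mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the pseudo-distances are supplied directly rather than obtained from an optimal transformation, the S-Stress value is left unnormalized, and the function names and the three-point example are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of the n x p configuration X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress1(dhat, d):
    """Kruskal-type Stress-1: root of squared residuals over squared distances."""
    iu = np.triu_indices_from(d, k=1)      # count each pair (i < j) once
    return np.sqrt(((dhat[iu] - d[iu]) ** 2).sum() / (d[iu] ** 2).sum())

def raw_sstress(dhat, d):
    """Unnormalized S-Stress: residuals are taken on squared distances."""
    iu = np.triu_indices_from(d, k=1)
    return ((dhat[iu] ** 2 - d[iu] ** 2) ** 2).sum()

# Hypothetical example: three points on a line (distances 1, 2, and 3)
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
d = pairwise_distances(X)
dhat = d.copy()
dhat[0, 2] = dhat[2, 0] = 2.5              # one pseudo-distance is off by 0.5

print(stress1(dhat, d), raw_sstress(dhat, d))
```

Because S-Stress works with squared quantities, the same residual counts more heavily for large distances than for small ones, which is one practical difference between the two criteria.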
Here, we stay in the least squares framework, which provides maximum likelihood estimates under the assumption of normal errors. Reiterating the strong points, the least squares method is flexible, weights can be used to adjust for nonstandard error structures, and it is known to perform well under a broad range of circumstances. In fact, Storms (1995) has shown in a Monte Carlo study that violations of the assumed error distribution have virtually no effect on the estimated parameters. Spence and Lewandowsky (1989) and Heiser (1988a) studied robust methods for MDS but reached the conclusion that under moderate levels of error, standard MDS methods are not particularly vulnerable to outliers, especially when robust initial configurations are used.
2.2.1.2. Probabilistic Models
It is also possible to make distributional assumptions on the model side of the equation (that is, on $x_i$ and $x_j$ in (1)), from which a different class of methods has arisen (Ennis, Palen, & Mullen, 1988; MacKay, 1989, 1995, 2001; Mullen & Ennis, 1991; Zinnes & Griggs, 1974; Zinnes & MacKay, 1983; Zinnes & Wolff, 1977). These probabilistic models provide a rather different mechanism of random variation, with the counterintuitive property that the expected value of a dissimilarity judgment over replications can be far off the model value of the corresponding distance (even to the extent that there is no monotonic relationship between expected dissimilarity and distance). For this reason, Monte Carlo studies using random perturbations of the point locations $x_i$ to study the behavior of standard MDS methods, such as Young (1970), Sherman (1972), Spence (1972), and Spence and Domoney (1974), have questionable validity. The same remark applies to the studies by Girard and Cliff (1976), MacCallum and Cornelius (1977), and MacCallum (1979), who used a data generation mechanism with biases in the small and large distances.
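A simplified calculation may help to see how perturbing point locations can break monotonicity. It is an illustration under assumptions made here, not a statement of any of the cited models: suppose each point is observed as $X_i = x_i + e_i$, with independent errors $e_i \sim N(0, \sigma_i^2 I_p)$ and object-specific variances $\sigma_i^2$. Then the expected squared inter-point distance is

```latex
\mathrm{E}\,\lVert X_i - X_j \rVert^2
  = d^2(x_i, x_j) + p\,(\sigma_i^2 + \sigma_j^2).
```

The additive term does not depend on the model distance, so it dominates for small distances, and a pair of nearby points with large variances can have a larger expected squared dissimilarity than a pair of more distant points with small variances.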
2.2.1.3. The Problem of Asymmetry
A key assumption in the classic multidimensional scaling setup is symmetry of the proximity relation, that is, $\delta(a_i, a_j) = \delta(a_j, a_i)$, in accordance with the symmetry of the distance function used in the geometric model. Yet, quite often, relational data in their raw form are not symmetric. For instance, in stimulus identification experiments, confusion errors are counted, and it is not unusual to observe rather big asymmetries between the count of responding with $a_j$ when stimulus $a_i$ is presented and the count of responding with $a_i$ after presentation of $a_j$ (cf. Heiser, 1988b). These effects may be due to stimulus familiarity or a response bias or to similar processes in other contexts. They can be removed prior to analysis or explicitly incorporated in a model. For comprehensive reviews of the treatment of asymmetry, the reader is referred to Everitt and Rabe-Hesketh (1997, chap. 6) and Zielman and Heiser (1996).
Here, attention will be restricted to two strategies to analyze asymmetric relational data. The first splits the relation into two parts and finds two representations of a single set of objects; the second considers the row and column elements of the data matrix as two different kinds of entities and finds a single representation of two sets of objects. Taking frequencies as our leading case of data collection, we denote the raw observations by $f_{ij}$, with $i = 1, \ldots, n$ and $j = 1, \ldots, n$, and we assume $f_{ij} > 0$ for all $i, j$.
2.2.2. Making Two Representations of a Single Set of Objects
As a preliminary to any consideration of genuine asymmetry, it is often useful to remove the main effects from the data, which reflect the tendency of some objects to have consistently higher frequencies than others, because they are more prominent, more popular, or otherwise more bulky. A simple correction for such main effects is to equalize all self-similarities by the standardization

$$s_{ij} = \frac{f_{ij}}{\sqrt{f_{ii} f_{jj}}}, \tag{2}$$

which ensures that $s_{ii} = 1$ for all $i$. The rationale of using this standardization is that, if the simple model $f_{ij} = \alpha_i \alpha_j \theta_{ij}$ holds, with $\alpha_i$ some object-specific main effect parameter and $\theta_{ij}$ an interaction parameter with equal diagonal elements ($\theta_{ii} = 1$), then these assumptions would give $s_{ij} = \theta_{ij}$ in (2). Note that this standardization does not affect the asymmetry in $f_{ij}$ except for scale; that is, the odds across the diagonal remain the same: $s_{ij}/s_{ji} = f_{ij}/f_{ji}$.
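The standardization in (2) and its two stated properties (unit self-similarities and unchanged odds across the diagonal) can be checked numerically. The sketch below assumes NumPy; the function name and the small frequency table are hypothetical.

```python
import numpy as np

def standardize(F):
    """Equation (2): s_ij = f_ij / sqrt(f_ii * f_jj)."""
    diag = np.diag(F)
    return F / np.sqrt(np.outer(diag, diag))

# Hypothetical 3 x 3 frequency table with strictly positive entries
F = np.array([[40.0, 10.0,  4.0],
              [ 5.0, 90.0, 12.0],
              [ 8.0,  3.0, 10.0]])
S = standardize(F)

print(np.diag(S))                          # self-similarities after standardization
print(np.allclose(S / S.T, F / F.T))       # odds across the diagonal are unchanged
```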
2.2.2.1. Multiplicative Decomposition
Now consider the multiplicative decomposition of $s_{ij}$ into a symmetric factor and an antisymmetric factor,

$$s_{ij} = r(a_i, a_j)\, t(a_i, a_j), \tag{3}$$

where the two constituting factors are defined as

$$r(a_i, a_j) = \sqrt{s_{ij} s_{ji}}, \tag{4a}$$

$$t(a_i, a_j) = \sqrt{\frac{s_{ij}}{s_{ji}}}. \tag{4b}$$

In these definitions, the objects are again identified explicitly by $a_i$ and $a_j$, for reasons that will become apparent shortly. It is easily verified by substitution of $r(a_i, a_j)$ and $t(a_i, a_j)$ that equation (3) is always true, so that the decomposition can always be made without any further conditions. It is also clear from (4a) that the first factor is symmetric, $r(a_i, a_j) = r(a_j, a_i)$, and that it equals the geometric mean of the elements above and below the diagonal of the matrix $S = \{s_{ij}\}$, whereas (4b) shows that the second factor is antisymmetric: Two corresponding elements across the diagonal have a perfectly inverse relationship, $t(a_i, a_j) = 1/t(a_j, a_i)$.
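The three properties just stated (the product reproduces $s_{ij}$, the first factor is symmetric, the second is reciprocal across the diagonal) can be verified on any positive matrix. A minimal sketch, assuming NumPy and a hypothetical matrix of standardized similarities:

```python
import numpy as np

def multiplicative_decomposition(S):
    """Equations (4a) and (4b): geometric mean and root odds across the diagonal."""
    r = np.sqrt(S * S.T)      # symmetric factor
    t = np.sqrt(S / S.T)      # antisymmetric (reciprocal) factor
    return r, t

# Hypothetical standardized similarities with unit diagonal
S = np.array([[1.0,   0.5, 0.2],
              [0.125, 1.0, 0.4],
              [0.8,   0.1, 1.0]])
r, t = multiplicative_decomposition(S)

print(np.allclose(r * t, S))        # equation (3) holds
print(np.allclose(r, r.T))          # r is symmetric
print(np.allclose(t * t.T, 1.0))    # t(a_i, a_j) = 1 / t(a_j, a_i)
```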
2.2.2.1.1. Shepard’s universal law of generalization. Combining (4a) with (2), we obtain the symmetric similarity measure

$$r(a_i, a_j) = \sqrt{\frac{f_{ij} f_{ji}}{f_{ii} f_{jj}}}, \tag{5}$$

an expression first developed by Shepard (1957) for stimulus and response generalization processes. In this paper, he also gave the rationale for linking similarity to distance by the rule

$$r(a_i, a_j) = e^{-d(x_i, x_j)}. \tag{6}$$

If (6) is correct, then it follows that a nonmetric MDS of $r(a_i, a_j)$, based on (1), should yield the transformation $\varphi[\cdot] = -\log$. Evidence in more than 10 studies (Shepard, 1987; also see Nosofsky, 1992), involving both human and animal subjects and both visual and auditory stimuli, has confirmed this hypothesis, and hence the exponential decay function (6) has been named the universal law of generalization.
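The step from (6) to the predicted transformation can be written out explicitly. Taking logarithms of (6) and comparing with the error-free part of (1), applied to $r$ in place of $\delta$, gives:

```latex
r(a_i, a_j) = e^{-d(x_i, x_j)}
\;\Longrightarrow\;
-\log r(a_i, a_j) = d(x_i, x_j)
\;\Longrightarrow\;
\varphi[r(a_i, a_j)] = -\log r(a_i, a_j).
```

As a small numerical illustration (not taken from the source), a similarity of $r = 0.5$ would correspond to a distance of $\log 2 \approx 0.693$ under (6), and $r = 1$ to a distance of zero.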
2.2.2.1.2. Luce’s choice model. Combining (4b) with (2), we find

$$t(a_i, a_j) = \sqrt{\frac{f_{ij}}{f_{ji}}}, \tag{7}$$

which can be interpreted as the root odds of responding with $a_j$ if $a_i$ is presented against the reverse; $t(a_i, a_j)$ is a natural measure of the dominance relation between $a_i$ and $a_j$. The simplest model for a dominance relation is the Bradley-Terry-Luce (BTL) model, a theory of choice developed by Bradley and Terry (1952), which was extended and given an axiomatic basis by Luce (1959). It states that the probability of $a_i$ dominating $a_j$ depends only on the two nonnegative parameters associated with each object, $\alpha_i$ and $\alpha_j$, and not on any other parameter:

$$p_{ij} = \frac{\alpha_i}{\alpha_i + \alpha_j}. \tag{8}$$

From (8), it follows that $p_{ij} + p_{ji} = 1$ and that the root odds defined in (7) under this model are $\sqrt{\alpha_i/\alpha_j}$, simply the root of the ratio of the two parameters. Summarizing the development so far, we can decompose any asymmetric set of similarities $\{s_{ij}\}$ into a symmetric component $\{r(a_i, a_j)\}$, on which we can do some form of multidimensional scaling, and an antisymmetric component $\{t(a_i, a_j)\}$, on which we can fit the BTL model, or some similar model, for paired-comparison data.
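The two consequences of (8) noted above can be illustrated numerically. The following sketch assumes NumPy; the parameter values and the function name are hypothetical, and the diagonal of the probability matrix (an object compared with itself) has no substantive meaning.

```python
import numpy as np

def btl_probabilities(alpha):
    """Equation (8): p_ij = alpha_i / (alpha_i + alpha_j) for all pairs."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha[:, None] / (alpha[:, None] + alpha[None, :])

alpha = np.array([4.0, 1.0, 0.25])          # hypothetical BTL parameters
P = btl_probabilities(alpha)
root_odds = np.sqrt(P / P.T)                # analogue of equation (7) under the model

print(np.allclose(P + P.T, 1.0))            # p_ij + p_ji = 1
print(np.allclose(root_odds, np.sqrt(np.outer(alpha, 1.0 / alpha))))
```

With these values, the first object dominates the second with probability 4/5, and the corresponding root odds equal the square root of 4/1.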
2.2.2.2. Additive Decomposition
Up to this point, all operations have been multiplications and divisions. However, it is often desirable when working with frequencies to use a log scale, as is done in log-linear analysis (Wickens, 1989). An additive version of the basic decomposition (3) is obtained by taking the logarithm of both sides of the equation, yielding

$$\mu_{ij} = \rho(a_i, a_j) + \tau(a_i, a_j), \tag{9}$$

where $\mu_{ij} = \log s_{ij}$, $\rho(a_i, a_j) = \log r(a_i, a_j)$, and $\tau(a_i, a_j) = \log t(a_i, a_j)$. The equivalents of (4a) and (4b) are

$$\rho(a_i, a_j) = \tfrac{1}{2}[\mu_{ij} + \mu_{ji}], \tag{10a}$$

$$\tau(a_i, a_j) = \tfrac{1}{2}[\mu_{ij} - \mu_{ji}]. \tag{10b}$$

In general, any square matrix can be additively decomposed as in (9), that is, into the sum of a symmetric component (10a) and a skew-symmetric component (10b). Instead of a geometric mean (4a), we now have an arithmetic mean (10a), and instead of the antisymmetry property $t(a_i, a_j) = 1/t(a_j, a_i)$, we now have the skew-symmetry property $\tau(a_i, a_j) = -\tau(a_j, a_i)$. Additive decomposition of asymmetric matrices is well known through the work of Gower (1977), although the idea seems to be much older: Halmos (1958, p. 136) refers to it as the Cartesian decomposition. As pointed out by Gower, the components $\rho(a_i, a_j)$ and $\tau(a_i, a_j)$ are uncorrelated, so that we can analyze them separately by least squares.
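The additive decomposition and the lack of correlation between its two parts can be checked directly. The sketch below assumes NumPy and a hypothetical matrix of positive similarities; the sum of elementwise products of the symmetric and skew-symmetric parts should vanish up to rounding error.

```python
import numpy as np

def additive_decomposition(M):
    """Equations (10a) and (10b): symmetric and skew-symmetric parts of M."""
    rho = 0.5 * (M + M.T)
    tau = 0.5 * (M - M.T)
    return rho, tau

# Hypothetical standardized similarities; mu_ij = log s_ij as in (9)
S = np.array([[1.0,   0.5, 0.2],
              [0.125, 1.0, 0.4],
              [0.8,   0.1, 1.0]])
M = np.log(S)
rho, tau = additive_decomposition(M)

print(np.allclose(rho + tau, M))        # equation (9)
print(np.allclose(tau, -tau.T))         # skew-symmetry
print(abs((rho * tau).sum()) < 1e-12)   # cross-product of the two parts is zero
```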
2.2.2.3. Application: Citation Frequencies Among Psychological Journals
To illustrate this approach to asymmetry, we now reanalyze some data collected by Weeks and Bentler (1982) on citation patterns among 12 psychological journals. The raw frequencies are reproduced in Table 2.1, together with the list of journals used. An entry in Table 2.1 indicates the number of times that a paper in the row journal cites some paper in the column journal. It is clear that the Journal of Personality and Social Psychology (JPSP) generates by far the largest number of citations, including many self-citations, whereas the American Journal of Psychology (AJP) and Multivariate Behavioral Research (MBR) have a rather low number of citations (primarily due to the smaller number of articles per year), with AJP citing the Journal of Experimental Psychology (JEP) more frequently than itself and MBR citing Psychometrika (PKA) more frequently than itself. To avoid problems with zero frequencies, we added 0.5 to all entries of the table. Then $s_{ij}$ was calculated according to (2); the symmetric similarities $\rho(a_i, a_j)$ according to (10a), in which the minimal value was added to make all quantities nonnegative; and the skew-symmetric dominance data $\tau(a_i, a_j)$ according to (10b). The results are given in Table 2.2 above the diagonal and below the diagonal, respectively.

Table 2.1 Journal Citation Data

      AJP JABN JPSP JAPP JCPP JEDP JCCP  JEP  PKA   PB   PR  MBR
AJP    31   10   10    1   36    4    1  119    2   14   36    0
JABN    7  235   55    0   13    4   65   25    3   50   31    0
JPSP   16   54  969   28   15   21   89   62   16  149  141   16
JAPP    3    2   30  310    0    8    5    7    6   71   14    0
JCPP    4    0    2    0  386    0    2   13    1   22   35    1
JEDP    1    7   61   10    2  100    6    5    4   18    9    2
JCCP    0  105   55    7    3   10  331    3   19   89   22    8
JEP     9   20   16    0   32    6    1  120    2   18   46    0
PKA     2    0    0    0    0    6    0    6  152   31    7   10
PB     23   46  124  117  138    7   86   84   62  186   90    7
PR      9    2   21    6    3    0    0   51   30   32  104    2
MBR     0    7   14    4    0    0   24    3   95   46    2   56

SOURCE: Weeks and Bentler (1982).
NOTE: Rows represent journals giving citations; columns represent journals receiving citations. Data collected in 1979. Journals and their abbreviations: AJP = American Journal of Psychology; JABN = Journal of Abnormal Psychology; JPSP = Journal of Personality and Social Psychology; JAPP = Journal of Applied Psychology; JCPP = Journal of Comparative and Physiological Psychology; JEDP = Journal of Educational Psychology (numbers 1–3 only); JCCP = Journal of Consulting and Clinical Psychology; JEP = Journal of Experimental Psychology (General); PKA = Psychometrika; PB = Psychological Bulletin; PR = Psychological Review; MBR = Multivariate Behavioral Research.

Table 2.2 Journal Citation Data: Decomposition in Symmetric and Skew-Symmetric Parts

        AJP   JABN   JPSP   JAPP   JCPP   JEDP   JCCP    JEP    PKA     PB     PR    MBR
AJP       0   4.37   4.06   2.88   4.49   3.57   1.87   6.04   3.32   5.22   5.52   2.21
JABN   0.17      0   4.48   1.16   1.89   3.37   5.43   4.65   1.68   5.18   3.77   2.56
JPSP  −0.23   0.01      0   3.72   2.06   4.49   4.56   4.28   1.75   5.51   4.89   3.93
JAPP  −0.42  −0.80  −0.03      0   0.10   3.72   2.73   2.04   1.85   5.68   3.72   2.16
JCPP   1.05   1.65   0.91      0      0   1.47   1.85   4.31   1.01   5.07   3.75   1.51
JEDP   0.55  −0.26  −0.53  −0.11  −0.80      0   3.55   3.73   3.51   4.19   2.79   2.43
JCCP   0.55  −0.24   0.24  −0.16  −0.17  −0.24      0   2.18   2.37   5.61   2.63   4.40
JEP    1.27   0.11   0.67   1.35  −0.44  −0.08   0.42      0   3.13   5.31   5.82   2.51
PKA       0   0.97   1.75   1.28   0.55  −0.18   1.83  −0.48      0   5.31   4.52   5.57
PB    −0.24   0.04   0.09  −0.25  −0.91   0.45   0.02  −0.76  −0.34      0   5.70   4.94
PR     0.67   1.27   0.94   0.40   1.16   1.47   1.90  −0.05  −0.70   0.51      0   3.22
MBR       0  −1.35   0.06  −1.10   0.55   0.80  −0.53  −0.97  −1.10  −0.91      0      0
β̂i    0.28   0.10   0.36   0.22  −0.31   0.28   0.30  −0.46  −0.66   0.12  −0.63   0.38

NOTE: Upper triangular part contains symmetric similarities; lower triangular part contains skew-symmetric dominances. Journals and their abbreviations: AJP = American Journal of Psychology; JABN = Journal of Abnormal Psychology; JPSP = Journal of Personality and Social Psychology; JAPP = Journal of Applied Psychology; JCPP = Journal of Comparative and Physiological Psychology; JEDP = Journal of Educational Psychology; JCCP = Journal of Consulting and Clinical Psychology; JEP = Journal of Experimental Psychology (General); PKA = Psychometrika; PB = Psychological Bulletin; PR = Psychological Review; MBR = Multivariate Behavioral Research.
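The preprocessing steps described for Table 2.2 can be retraced on a small part of the data. The sketch below, assuming NumPy, uses only the upper-left 3 × 3 block of Table 2.1 (AJP, JABN, JPSP); this is sufficient for these pairs because (2) involves only the two diagonal entries of the pair concerned. The constant that was added to the symmetric part in Table 2.2 is not reproduced here, so only differences between symmetric entries are comparable with the table.

```python
import numpy as np

labels = ["AJP", "JABN", "JPSP"]
# Upper-left 3 x 3 block of Table 2.1 (rows cite, columns are cited)
F = np.array([[31.0,  10.0,  10.0],
              [ 7.0, 235.0,  55.0],
              [16.0,  54.0, 969.0]])

F = F + 0.5                               # avoid problems with zero frequencies
diag = np.diag(F)
S = F / np.sqrt(np.outer(diag, diag))     # equation (2)
M = np.log(S)                             # mu_ij = log s_ij
rho = 0.5 * (M + M.T)                     # equation (10a), without the added constant
tau = 0.5 * (M - M.T)                     # equation (10b)

for i in range(3):
    for j in range(i + 1, 3):
        print(f"{labels[i]}-{labels[j]}: rho = {rho[i, j]:.2f}, tau = {tau[i, j]:.2f}")
```

In this sketch, `tau[i, j]` with `i < j` corresponds to the entry in row `j`, column `i` of the lower triangle of Table 2.2; for AJP and JABN it comes out at about 0.17, in agreement with the table.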
2.2.2.3.1. MDS analysis of the symmetric part. The symmetric similarities in the upper-triangular section of Table 2.2 were then input to the MDS program PROXSCAL,¹ with the ordinal transformation option chosen, and initialized with the classic Torgerson solution (Torgerson, 1958) on the quantities $\rho_{\max} - \rho(a_i, a_j)$, where $\rho_{\max}$ is the maximal similarity value. The two-dimensional solution is shown in Figure 2.1 (as we have $12(12 - 1)/2 = 66$ independent data values, we restrict attention here to $p = 2$, which requires $2(12 - 1) - 1 = 21$ free parameters to be estimated). The fit of the solution in terms of Kruskal’s Stress-1 is 0.192, which is “fair” according to Kruskal’s (1964) qualifications. In terms of the percentage of dispersion accounted for (%DAF), which is defined as 100 times the sum of squared distances, divided by the sum of squared pseudo-distances² (Heiser & Groenen, 1997), and which is comparable to percentage of variance accounted for, except that the mean is not taken out, the fit is 96.3%, which is quite satisfactory.

1. PROXSCAL is distributed by SPSS, Inc., 233 S. Wacker Drive, 11th Floor, Chicago, IL 60606–6307 (www.spss.com), as part of the Categories package.
2. Dispersion accounted for is equal to 1 minus the quantity actually minimized in PROXSCAL.

Figure 2.1 Two-Dimensional Ordinal MDS Solution for the Symmetric Part of the Journal Citation Data
NOTE: Journals and their abbreviations: AJP = American Journal of Psychology; JABN = Journal of Abnormal Psychology; JPSP = Journal of Personality and Social Psychology; JAPP = Journal of Applied Psychology; JCPP = Journal of Comparative and Physiological Psychology; JEDP = Journal of Educational Psychology; JCCP = Journal of Consulting and Clinical Psychology; JEP = Journal of Experimental Psychology (General); PKA = Psychometrika; PB = Psychological Bulletin; PR = Psychological Review; MBR = Multivariate Behavioral Research.

To give a visual impression of the fit, we provide a regression plot in Figure 2.2a of the fitted distances against the transformed proximities, which are in turn plotted against the original similarities $\rho(a_i, a_j)$ in Figure 2.2b, in a so-called transformation plot. What Figure 2.2b shows is that the monotonically decreasing values of the transformed proximities (which preserve the order of the original proximities) are rather close to a linear transformation of $\rho(a_i, a_j) = \log r(a_i, a_j)$, with negative slope. The implication is that (6) is correct, a confirmation of Shepard’s law. The location of the journals in Figure 2.1 is close to the result obtained by Weeks and Bentler (1982) with their specific model. It shows Psychological Bulletin (PB) in the center and, going counterclockwise, a clinical-social-educational cluster at the top, a physiological-cognitive cluster in the lower left corner, a quantitative-methodological cluster in the lower right corner, and finally the Journal of Applied Psychology (JAPP), which communicates least with the Journal of Comparative and Physiological Psychology (JCPP).
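Two ingredients of this analysis lend themselves to a short sketch: the parameter count for a $p$-dimensional configuration and a classical (Torgerson-type) scaling of the kind used for initialization. The code assumes NumPy and follows the textbook double-centering procedure; it is not PROXSCAL's own implementation, and the function names and the rectangle example are hypothetical.

```python
import numpy as np

def n_free_parameters(n, p):
    """n*p coordinates, minus p translations and p(p-1)/2 rotations."""
    return n * p - p - p * (p - 1) // 2

def torgerson(delta, p=2):
    """Classical scaling of a symmetric dissimilarity matrix delta."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (delta ** 2) @ J               # double-centered squared dissimilarities
    evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:p]             # indices of the p largest
    lam = np.clip(evals[idx], 0.0, None)          # guard against negative eigenvalues
    return evecs[:, idx] * np.sqrt(lam)

print(12 * 11 // 2, n_free_parameters(12, 2))     # data values and free parameters

# Hypothetical check: corners of a 2 x 1 rectangle are recovered up to rotation
pts = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
delta = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
X0 = torgerson(delta, p=2)
d0 = np.sqrt(((X0[:, None, :] - X0[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(d0, delta))
```

For similarity data, the input would first be converted to dissimilarities, for instance as $\rho_{\max} - \rho(a_i, a_j)$ as described above.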
Figure 2.2 Scatter Plots of Journal Citation Data
Axes: panel (a), transformed proximities and distances; panel (b), raw proximities and transformed proximities.
NOTE: Panel (a) shows the regression plot of fitted distances against transformed proximities, and panel (b) shows the transformation plot of transformed proximities against logged input frequencies.

2.2.2.3.2. BTL analysis of the skew-symmetric part. As originally pointed out by Fienberg and Larntz (1976), the maximum likelihood estimates of the BTL parameters in their log form ($\beta_i = \log \alpha_i$) can be obtained by a standard log-linear analysis program (cf. Wickens, 1989, pp. 255–257). Simple least squares estimates of these $\beta$-parameters can be more easily obtained by just taking the column averages of a matrix that has the same lower triangular elements as Table 2.2, denoted by $\tau(a_i, a_j)$, and upper triangular elements defined as $\tau(a_j, a_i) = -\tau(a_i, a_j)$; such column averages are given in the last row of Table 2.2. The estimated BTL scale values range from −0.66 to 0.38, which is a rather small range (they can be compared to z-values), indicating that the amount of asymmetry is modest. In fact, the relative amounts of symmetry and skew-symmetry in the table can be expressed quantitatively because the fact that $\rho(a_i, a_j)$ and $\tau(a_i, a_j)$ are uncorrelated implies that from (9), we can derive an additive decomposition of the sum of squares of the $\mu_{ij}$ values:

$$\mathrm{SSQ}[\mu_{ij}] = \mathrm{SSQ}[\rho(a_i, a_j)] + \mathrm{SSQ}[\tau(a_i, a_j)]. \tag{11}$$

In the present example, we obtain $\mathrm{SSQ}[\mu_{ij}] = 1030.22$, $\mathrm{SSQ}[\rho(a_i, a_j)] = 988.88$, and $\mathrm{SSQ}[\tau(a_i, a_j)] = 41.34$, from which the relative contributions of the symmetric and the skew-symmetric component are 96% and 4%, respectively. As the last line in Table 2.2 shows, PKA, PR, JEP, and JCPP are journals that tend to be cited, whereas MBR, JPSP, and the Journal of Consulting and Clinical Psychology (JCCP) tend to cite others more than others cite them.
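The column-average estimates and the relative contributions can be retraced from the reported numbers. The sketch below, assuming NumPy, uses the AJP column of the skew-symmetric part of Table 2.2 and the three sums of squares quoted in the text; the small exact example with scale values `b` is hypothetical and only shows why column averages recover centered scale values when the entries have the form `b[j] - b[i]`.

```python
import numpy as np

def btl_least_squares(T):
    """Column averages of a full skew-symmetric matrix T."""
    return T.mean(axis=0)

# AJP column of the skew-symmetric part of Table 2.2 (diagonal entry is 0)
col_ajp = np.array([0.0, 0.17, -0.23, -0.42, 1.05, 0.55,
                    0.55, 1.27, 0.0, -0.24, 0.67, 0.0])
beta_ajp = col_ajp.mean()
print(round(beta_ajp, 2))                     # compare with the last row of Table 2.2

# Relative contributions from the sums of squares reported in the text
ssq_total, ssq_sym, ssq_skew = 1030.22, 988.88, 41.34
share_sym = ssq_sym / ssq_total
share_skew = ssq_skew / ssq_total
print(round(100 * share_sym), round(100 * share_skew))

# Hypothetical exact case: entries T[i, j] = b[j] - b[i] with centered b
b = np.array([0.5, -0.2, -0.3])
T = b[None, :] - b[:, None]
print(np.allclose(btl_least_squares(T), b))
```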
2.2.3. Unfolding: Analyzing the Proximity Relation Between Two Sets of Objects
In the example of citation counts between journals, we might also consider the row elements as being different from the column elements because they have different roles: row journals are citing, whereas column journals are being cited. More generally, we might consider the proximity relation as being one between a set of row objects $\{a_i, i = 1, \ldots, n\}$ and a set of column objects $\{b_j, j = 1, \ldots, m\}$, to be represented as a set of row points $\{x_i, i = 1, \ldots, n\}$ and a set of column points $\{y_j, j = 1, \ldots, m\}$, respectively, with $x_i$ having coordinates $\{x_{iu}\}$, as before, and $y_j$ having coordinates $\{y_{ju}\}$.
2.2.3.1. General Definition of Unfolding
In the general unfolding situation, we do not necessarily have $n = m$, as is the case in the current citation example, and we might even have completely different types of objects in rows and columns. Typically for unfolding, the set $\{a_i\}$ refers to persons, the set $\{b_j\}$ refers to attitude items or stimuli, and the proximity relation expresses the strength with which a particular person $a_i$ endorses a particular item $b_j$, or the relative amount of time or money $a_i$ would be willing to spend on $b_j$. In the spatial representation sought, we determine the Euclidean distance between $x_i$ and $y_j$ by the formula

$$d(x_i, y_j) = \sqrt{\sum_{u} (x_{iu} - y_{ju})^2}. \tag{12}$$

A related model for analyzing individual differences in rankings or ratings that is often subsumed under the unfolding concept (Carroll, 1972; Nishisato, 1994, 1996) is the so-called vector model, independently conceived by Tucker (1960) and Slater (1960).
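The distance in (12) differs from the one-set case only in that rows and columns have their own coordinates, so the result is an $n \times m$ matrix rather than a square symmetric one. A minimal sketch, assuming NumPy; the function name and the coordinates are hypothetical.

```python
import numpy as np

def unfolding_distances(X, Y):
    """Equation (12): Euclidean distances between n row points and m column points."""
    diff = X[:, None, :] - Y[None, :, :]          # shape (n, m, p)
    return np.sqrt((diff ** 2).sum(axis=-1))      # shape (n, m)

X = np.array([[0.0, 0.0], [1.0, 1.0]])                 # n = 2 row points (e.g., persons)
Y = np.array([[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]])     # m = 3 column points (e.g., items)
D = unfolding_distances(X, Y)
print(D.shape)
print(D)
```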
Because this chapter is restricted to distance models, whereas the Tucker-Slater model uses inner products between vectors to represent the data, the reader is referred to Heiser and de Leeuw (1981) for a detailed