A second pass of CA using an Optimum Number of Clusters

The initial pass of CA determined that 2 factors and 3 clusters would best suit a second pass. The matrix header was changed to reflect those values, and CA was re-run. That produced clusters that a human might well discern. Two further automated processes were applied to the ensuing clusters. The first involved mapping clusters of prepositions to clusters of meta-types, to show correspondence between the two sets.

That problem is overcome in a way similar to that taken for outliers, which involved averaging and sorting results to reveal corresponding clusters. The second task faced here lies in identifying which meta-types are associated with particular axes. First, though, to an experiment in mapping between clusters of prepositions and clusters of meta-types.

7.6. CA ON UNAMBIGUOUS TRIGRAMS: PASS 1

Results VII of mapping between row and column clusters

Table 7.13 presents mappings between the three clusters from pass two. Columns ClR and ClC give the indices of clusters in their respective lists. Prepositions and meta-types were taken from the labels of points in those clusters, and results were sorted on the mean F1 for row clusters:

ClR   F1     Prepositions                             ClC   F1     Meta-Types
0     -538   of                                       1     -410   tt, mt, tm
2     -266   by, on, in, with, for, about, as, like   0     -81    at, mm, am
1     852    to                                       2     939    ta, ma, aa

Table 7.13: Mappings between 3 clusters from pass 2 of CA.

The cluster at index 0, holding the preposition ‘of’ on the left-hand side of Table 7.13, aligned with meta-type cluster 1 on the right-hand side, which comprised meta-types ‘tt’, ‘mt’, and ‘tm’. The remaining preposition clusters 1 and 2 mapped to meta-type clusters 2 and 0 respectively. Note that F1 values between corresponding preposition and meta-type clusters, though different, are of similar magnitude.

Take, for example, the meta-type F1 value of -410 from the first row of Table 7.13; that value was closer to the F1 of -538 for the preposition cluster at index 0 than it was to the -266 for the preposition cluster at index 2.
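The mapping can be sketched in a few lines of Python. This is a minimal illustration only: it assumes that corresponding clusters are paired by the rank order of their mean F1, which reproduces the mappings of Table 7.13 but is not claimed to be the procedure used by the thesis software. The names `row_f1`, `col_f1`, and `map_clusters` are hypothetical.

```python
# Mean F1 per cluster, copied from Table 7.13 (cluster index -> mean F1).
row_f1 = {0: -538, 2: -266, 1: 852}   # preposition (row) clusters
col_f1 = {1: -410, 0: -81, 2: 939}    # meta-type (column) clusters


def map_clusters(rows, cols):
    """Pair row and column clusters by the rank order of their mean F1.

    Assumption for illustration: both sets hold the same number of clusters,
    and the n-th lowest row cluster corresponds to the n-th lowest column cluster.
    """
    if len(rows) != len(cols):
        raise ValueError("expected equal numbers of row and column clusters")
    rows_sorted = sorted(rows, key=rows.get)   # row cluster indices, ascending F1
    cols_sorted = sorted(cols, key=cols.get)   # column cluster indices, ascending F1
    return dict(zip(rows_sorted, cols_sorted))


mapping = map_clusters(row_f1, col_f1)
for row_cluster, col_cluster in mapping.items():
    print(f"preposition cluster {row_cluster} -> meta-type cluster {col_cluster}")
```

Pairing by rank suffices here because only three clusters arose on each side; with more clusters, a criterion that also consults F2 might be needed, as the discussion that follows suggests.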

Discussion VII of mapping between clusters

Factor F3 was abandoned because it carried a relatively small amount of overall inertia, which meant that little remained to be explained after assessing F1 and F2. In other words, the graph for pass 1 from Figure 7.4 was very nearly a complete depiction of interactions between prepositions and meta-types. Consequently, mappings arising from that CA reflected the positions of corresponding clusters in sets I and J, and allowed machines to emulate humans in interpreting CA graphs. In fact, success in using mean F1 from results for prepositions was due partly to few clusters arising. Should more clusters emerge, taking F1 and F2 together might offer a more discerning approach; the relative signs of F1 and F2 would help to resolve cases where F1 alone could not determine appropriate mappings.

Corresponding clusters from parallel runs of CA, then, revealed affinities between given meta-types and prepositions. In that way, the preposition ‘to’ was strongly associated with meta-types dominated by ‘a’ for ‘action’, as was explained in the discussion for outliers. Further, using an optimum number of 3 clusters revealed similar strong affinity between ‘of’ and meta-types heavy in ‘t’. The preposition ‘by’, in contrast, had as many ‘m’ meta-types as ‘a’ and ‘t’ combined. Having allowed machines to detect such patterns, attention turns next to using them in interpreting X and Y axes from CA graphs. That, in turn, will formalise such relationships between prepositions and things, actions, and modifiers, and ultimately raise important intensional knowledge about GRiST mind maps.


Method VIII for interpreting X and Y axes

Note that the following CA involved just the standard matrix of prepositions against meta-types, rather than comprising separate parallel runs. Labels from resulting column clusters for set J were split into individual meta-types, and frequencies of occurrence compiled for those discrete meta-types within any column cluster. A particular meta-type dominated if it accounted for 50% or more of any cluster’s meta-types. In such cases, the F1 and F2 values of those CA results indicated the relevant axis.
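The counting step can be sketched as follows. This is a minimal illustration in Python rather than the thesis implementation: the function name `dominant_meta_type` is hypothetical, the column-cluster labels are those reported in Table 7.14, and ties are resolved arbitrarily in favour of whichever letter is counted first.

```python
from collections import Counter


def dominant_meta_type(labels, threshold=0.5):
    """Split composite labels into single meta-types and find the dominant one.

    Returns (counts, letter, share); letter is None when no single meta-type
    reaches the threshold share of the cluster's meta-types.
    """
    counts = Counter("".join(labels))          # e.g. ['tt', 'mt'] -> t: 3, m: 1
    total = sum(counts.values())
    if total == 0:
        return counts, None, 0.0
    letter, top = counts.most_common(1)[0]
    share = top / total
    return counts, (letter if share >= threshold else None), share


# Column-cluster labels as reported in Table 7.14.
column_clusters = [["tt", "mt", "tm"], ["at", "mm", "am"], ["ta", "ma", "aa"]]
for labels in column_clusters:
    counts, letter, share = dominant_meta_type(labels)
    print(", ".join(labels), dict(counts), letter, f"{share:.0%}")
```

The 50% threshold is passed as a parameter so that the effect of a stricter or looser definition of dominance can be explored.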

Results VIII for interpreting X and Y axes

Table 7.14 presents CA results for the 3 clusters that arose from the second pass of CA. The first group of columns shows prepositions from the labels of points in row clusters. After those labels come the average F1 and F2 values for each row cluster. The second part deals with column clusters: first the labels, then counts for individual meta-types within those composite values. The highest count for each cluster appears in bold face, with its percentage of the entire cluster shown in the last column:

Row Clusters                             F1     F2     Col Clusters   Meta-Types
                                                                      a   m   t   %
of                                       -538   -410   tt, mt, tm     0   2   4   67
by, on, in, with, for, about, as, like   -266   939    at, mm, am     2   3   1   50
to                                       852    -81    ta, ma, aa     4   1   1   67

Table 7.14: Cluster mappings for phase 1

The last three columns of Table 7.14 show dominant meta-types, highlighted in bold type, that accounted for 50% or more of any cluster. Further, the minimum and maximum values of mean row F1, -538 and +852, put the prepositions ‘of’ and ‘to’ respectively at opposite ends of the X-axis. Corresponding meta-types for ‘of’ were dominated by ‘t’ for ‘thing’, while ‘to’ strongly attracted points having ‘a’ for ‘action’; the X-axis meta-types were therefore ‘t’ at the negative extreme and ‘a’ at the positive end.

In a similar way, minimum and maximum values of mean F2, for column clusters on the Y-axis, were -410 and +939. That revealed corresponding axis meta-types of ‘t’ at the negative extreme occupied by ‘of’, and ‘m’ at the positive end that contained ‘by’. Figure 7.7 shows output from MidmapPOSAnalysis that summarises the automated interpretation of axes:

Figure 7.7: Axis mappings from phase 1 of CA.


The final result, then, was that the negative extreme of the X-axis reflected things, while the positive end of that axis tended towards actions. In contrast, ascending the Y-axis depicted a transition from things to modifiers.
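The interpretation of axes can be sketched as follows. This is a minimal illustration in Python, not the output or logic of MidmapPOSAnalysis: cluster values and dominant meta-types are transcribed from Table 7.14, while the function name `axis_extremes` and the tuple layout are hypothetical.

```python
# (row-cluster label, mean F1, mean F2, dominant meta-type), from Table 7.14.
clusters = [
    ("of", -538, -410, "t"),
    ("by, on, in, with, for, about, as, like", -266, 939, "m"),
    ("to", 852, -81, "a"),
]

NAMES = {"t": "thing", "a": "action", "m": "modifier"}


def axis_extremes(clusters, factor):
    """Return the meta-types at the negative and positive ends of one axis.

    factor is 1 for F1 (X-axis) or 2 for F2 (Y-axis); it doubles as the
    tuple position of that factor in each cluster record.
    """
    low = min(clusters, key=lambda c: c[factor])
    high = max(clusters, key=lambda c: c[factor])
    return NAMES[low[3]], NAMES[high[3]]


x_axis = axis_extremes(clusters, 1)
y_axis = axis_extremes(clusters, 2)
print("X-axis: %s -> %s" % x_axis)
print("Y-axis: %s -> %s" % y_axis)
```

Taking only the minimum and maximum of each factor ignores intermediate clusters, which is adequate for three clusters but would discard information if more were present.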

Discussion VIII for interpreting X and Y axes

Using the average F1 of clusters is reminiscent of the spatial mean, introduced in Chapter 6, that depicted entire clusters as single points. That was the case for all of the column clusters from this CA, and for the row cluster that contained ‘by’; prepositions ‘of’ and ‘to’, though, formed singleton clusters having mean values equal to the associated F1 values themselves. The extreme positions of prepositions ‘to’ and ‘of’ on the X-axis suggested that axis to reflect a continuum between things and actions. In contrast, ‘by’ and ‘of’ at Y-axis extremes suggested a transition from things to modifiers. To emphasise those axis meta-types, superimposed graphs seen earlier in Figure 7.4 are reproduced here as Figure 7.8:

Figure 7.8: CA graph of F1 x F2 for pass 1 (reprise).

The bottom-right quadrant of Figure 7.8 shows ‘of’ at the ‘thing’ end of the X-axis, while ‘to’ lies towards the ‘action’ end. In contrast, prepositions ‘of’ and ‘by’ respectively occupy the ‘thing’ and ‘modifier’ ends of the Y-axis. Indeed, those results for automatically interpreting axes further validate the optimum three clusters derived from the preliminary CA. That number faithfully reflected the three main clusters in differing quadrants of the resulting graph.


Note further that symmetrical meta-type pairs ‘aa’, ‘mm’, and ‘tt’ appear in quadrants that express those types most strongly. In addition, symmetrical pairs ‘mt’ and ‘tm’ are separated across the X-axis, where they might be seen as belonging to the group around ‘for’, just above the Y origin. CA, rather, determined that they belonged with ‘of’; in that respect, a machine performed at least as well as a human, if not better. Symmetrical meta-type pairs ‘at’ and ‘ta’ are similarly separated, though diagonally across bottom-right and top-left quadrants. The same holds for ‘ma’ and ‘am’, although the latter point is partially obscured by the preposition ‘by’.

Results from CA, then, produced valuable intensional knowledge for the emerging information base of GRiST mind maps. Corresponding row and column clusters, along with the derived axis extremes, will shortly be seen to determine meta-types for ambiguous or missing words from WordNet. That will be inhibited, though, by the strong influence of ‘to’, which skewed the graph and made the remaining clusters less obvious. Developing a better model of the left-hand side of the CA graph, for all prepositions but ‘to’, involves a subsequent pass of CA after removing that outlier row.

7.7 CA on Unambiguous Trigrams: Pass 2

The first phase of CA identified the preposition ‘to’ as an outlier that was, in addition, a singleton cluster containing no further points. Although singletons are not necessarily outriders on a graph, they are sufficiently removed as to contrast sharply with remaining clusters. Having noted such strong correspondences, outlier row clusters are removed in order to allow any released meta-types to reconfigure around the next most strongly attracting preposition.

Indeed, the next most extreme outlier from any pass is likely to be the one removed on the following pass. In that way, graded correspondences arise: the later they appear around any preposition the weaker the attraction between them. All the same, useful trigrams might arise that were not available from earlier passes, as will be seen from experiments presented next.

Method V for subsequent phases of CA

The MidmapPOSAnalysis class identified outliers for each phase. After noting any corresponding column points, matrix rows for outliers were removed before the succeeding phase, though remaining observations were untouched. In all, 5 phases of CA were performed, in paired runs: the first run in any phase identified the optimum number of clusters for a second pass, which re-ran CA with an updated matrix header. That process was repeated for successive row outliers, gradually reducing the number of prepositions while leaving intact all columns of meta-type pairs.
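The row-removal loop can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the MidmapPOSAnalysis implementation: the outlier criterion used here (the row profile with the greatest chi-square distance from the average profile) is a stand-in, the paired runs that fix the number of clusters are omitted, and the counts in `toy` are invented purely for demonstration. All function names are hypothetical.

```python
def chi_square_distance(row, col_mass):
    """Squared chi-square distance of a row profile from the average profile.

    Assumes the row has a non-zero total; columns with zero mass are skipped.
    """
    total = sum(row)
    return sum(
        (value / total - mass) ** 2 / mass
        for value, mass in zip(row, col_mass)
        if mass > 0
    )


def find_outlier(matrix):
    """Return the label of the row furthest from the average profile.

    Stand-in criterion, assumed for illustration only.
    """
    grand = sum(sum(row) for row in matrix.values())
    n_cols = len(next(iter(matrix.values())))
    col_mass = [sum(row[j] for row in matrix.values()) / grand for j in range(n_cols)]
    return max(matrix, key=lambda label: chi_square_distance(matrix[label], col_mass))


def run_phases(matrix, phases):
    """Remove one outlier row per phase, leaving every column intact."""
    remaining = dict(matrix)
    removed = []
    for _ in range(phases):
        if len(remaining) < 2:      # nothing left to contrast against
            break
        outlier = find_outlier(remaining)
        removed.append(outlier)
        del remaining[outlier]
    return removed, remaining


# Invented counts for illustration: rows are prepositions, columns are meta-type pairs.
toy = {
    "to": [40, 2, 3],
    "of": [3, 35, 6],
    "by": [8, 9, 20],
    "in": [10, 12, 11],
}
removed, remaining = run_phases(toy, phases=2)
print("removed in order:", removed)
print("remaining rows:", sorted(remaining))
```

Because column masses are recomputed from the reduced matrix on each phase, the average profile shifts once an outlier is removed, which is what allows the next most distinctive preposition to emerge.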

In document Discovering knowledge structures in mind maps of mental health risks (Page 196-200)