
Evaluating frequency measures

Evaluation of the validity of the various frequency-based measures of collocation has generally been limited to the intuitive assessment of a few top-ranked items (Evert & Krenn, 2001, pp. 1-2). This is probably due to the difficulty of specifying and operationalising any independent criterion of accurate identification. As Clear notes, the most obvious course would be to compare frequency results with the results of an independent manual analysis. However, any such endeavour would be highly problematic: not only would a manual analysis be prohibitively time-consuming, but part of the point of frequency analysis is that it is thought to be capable of uncovering patterns which are not immediately evident to the human analyst (Clear, 1993, p. 282). Indeed, it has been a constant refrain of corpus-based collocation study that “intuition is typically a poor guide to collocation” (McEnery, Xiao, & Tono, 2006, p. 83).

A few studies have, however, attempted to compare language users’ intuitions with frequency data. The earliest such study of which I am aware is that of Hoffmann and Lehmann (2000), who elicited native and non-native speakers’ intuitions regarding 55 word pairs which were found to be strongly associated in the BNC (as measured by log-likelihood). Each pair consisted of a ‘low frequency’ node (occurring between 50 and 100 times in the corpus) and a collocate found within +/-3 words. Most were adjective-noun (24/55) or noun-noun (19/55) pairs, but the listing also included other parts of speech. Hoffmann and Lehmann prepared a questionnaire in which each node was presented without its pair and asked 16 native and 16 non-native speaker informants to supply the collocates. It was found that, on average, native speakers supplied the ‘correct’ collocate in 70% of cases, a figure which the authors judge to be surprisingly high, given the widespread scepticism about the accuracy of intuitions.

Non-native speakers, unsurprisingly, did less well, achieving an average accuracy of only 34%. The native speaker ‘success rate’ of 70%, however, appears to provide some support for the validity of log-likelihood. In a similar vein, Siyanova and Schmitt (2008) found a significant correlation between the frequency of collocations in the BNC and scores out of 6 given by both native and non-native speakers for the ‘typicality’ of the collocation (for natives rs = .578, for non-natives rs = .440). Moreover, native (though not non-native) speakers were able reliably to distinguish ‘medium frequency’ (21-100 occurrences in the BNC) from ‘high frequency’ (>100 occurrences in the BNC) collocations.
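The rs statistic reported above is Spearman's rank correlation, which asks only whether two sets of scores place the items in a similar order. The following is a minimal sketch of how such a coefficient can be computed. The function names and the six data points are invented for illustration and are not taken from Siyanova and Schmitt's study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's r_s, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: corpus frequency of six collocations and the mean
# 'typicality' rating (out of 6) given to each by a group of informants.
corpus_freq = [3, 25, 60, 150, 400, 1200]
typicality = [1.5, 2.0, 3.5, 3.0, 5.0, 5.5]
rs = spearman_rs(corpus_freq, typicality)
```

Because the coefficient depends only on rank order, it is unaffected by the heavily skewed distribution of corpus frequencies, which is one reason it suits comparisons of this kind.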

While these studies offer some encouragement that frequency-based measures are able to detect items which have psychological reality for speakers, they do not attempt to assess the relative merits of the different statistics described above. One paper which does make such an attempt is that of Evert and Krenn (2001). They automatically retrieved around 4,500 adjacent adjective-noun pairs which occurred at least twice in an 800,000-word corpus of German law texts, and around 15,000 preposition-noun-verb triples which occurred at least three times each in an eight-million-word portion of the Frankfurter Rundschau Corpus. Two native speakers were asked to identify those adjective-noun pairs which they perceived as ‘typical combinations’ (including idioms, legal terms, and proper names) and those preposition-noun-verb triples in which there was a grammatical relation between the verb and the PP, and the triple could be interpreted as a support verb construction and/or a metaphorical or idiomatic reading was available. Collocations were taken to be positively identified if they were picked by either informant. Association measures (raw frequency, log-likelihood, t-score, chi-squared, and mutual information) were also calculated for all items on the lists and separate ranked lists produced for each measure. Finally, ‘precision’ and ‘recall’ graphs were generated for each list. Precision graphs showed the percentage of items at each level of the lists which were manually-identified collocations; recall graphs showed the cumulative percentage of manually-identified collocations which had been found at each level of the lists. They found that for adjective-noun pairs, t-score and log-likelihood provided the best predictions, while for the preposition-noun-verb triples, t-score and raw frequency were the best. In both cases, chi-squared and mutual information were the worst predictors.
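The precision and recall graphs described above can be sketched as follows. This is an illustrative reconstruction, not Evert and Krenn's own code: the function name and the toy items are invented, and it is assumed that every manually identified collocation appears somewhere in the ranked list.

```python
def precision_recall_curve(ranked_items, true_collocations):
    """Return (n, precision, recall) for each cut-off n of a ranked list.

    ranked_items: candidates sorted from highest to lowest association score.
    true_collocations: set of candidates marked as collocations by annotators
        (assumed to be a subset of ranked_items).
    """
    total_true = len(true_collocations)
    hits = 0
    curve = []
    for n, item in enumerate(ranked_items, start=1):
        if item in true_collocations:
            hits += 1
        precision = hits / n  # share of the top n that are true collocations
        recall = hits / total_true if total_true else 0.0  # share of all true collocations found so far
        curve.append((n, precision, recall))
    return curve


# Hypothetical example: four candidates ranked by some association measure,
# two of which were judged to be collocations.
ranked = ["pair_a", "pair_b", "pair_c", "pair_d"]
judged_true = {"pair_a", "pair_c"}
curve = precision_recall_curve(ranked, judged_true)
```

Plotting precision and recall against n for each measure gives one curve per measure, so that measures can be compared at every cut-off rather than only at the top of the list.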

While Evert and Krenn’s paper is useful in showing (in the form of its precision and recall graphs) the sort of shape which a thorough examination of association measures might follow, the generalisability of its findings must be questioned given the small number of informants used (i.e. two, with identification by only one necessary to mark an item as a collocation). Moreover, their specification of items which are to count as collocations (idioms and metaphors, technical terms, proper names, support verb constructions) is rather narrower than the set of potentially psychologically-real word pairs in which the current thesis is interested. The study described in the following section will attempt to go beyond Evert and Krenn’s analysis by considering how accurately the various frequency-based methods can predict psychological associations between words, making use of published norms of word association collected from large numbers of participants. It will also attempt to define some approximate rules of thumb as to what levels of each measure are likely to indicate psychologically real collocations.

4.4 Frequency measures and word association.

Introduction

The psychological associates of a word are those other words which first come to a person’s mind when they see or hear it. There has been an interest in establishing ‘norms’ of association since the beginning of the twentieth century, when they were used as a measure of sanity. Observing that a “derangement” in the “association of ideas” was one of “the most striking and commonly observed manifestations of insanity”, Kent and Rosanoff (1910) attempted to establish the common types of association and the variation within normal populations by reading a list of 100 stimulus words to over 1,000 subjects and asking them to respond to each with the first word that occurred to them other than the stimulus word itself. Since the 1960s, word association has come to be used in language studies, where it has been thought to provide evidence about first and second language acquisition and the structure of the mental lexicon (Fitzpatrick, 2007, pp. 320-321).

Word associations are of interest to us because they have been widely linked to collocation. Observing that many associated words appear to be collocates of each

precisely because they are encountered together on a regular basis (Charles & Miller, 1989). This link between collocation and association has been tested empirically by Spence and Owens (1990), who showed that a group of 47 associated noun-noun pairs co-occurred more frequently, in spans of text ranging from 50 to 1,000 characters (about 10 to 250 words), in the one million-word Brown corpus than did matched non-associates. Moreover, strength of association (as measured by the percentage of respondents providing a particular response) was correlated with frequency of occurrence up to spans of 2,000 characters. This suggests, Spence and Owens conclude, that the co-occurrence of words in language is a major contributor to their being linked in word association norms.

If this is right, then at least part of what is being evidenced by word association tests is the proposed psychological representation of high frequency collocations which this chapter has set out to investigate. It should, therefore, be possible to use such norms to gauge the ability of the various frequency-based measures described above to detect collocations which are likely to be psychologically-real for speakers. The test is imperfect because, though we can conclude with some confidence that collocations which appear in word association norms are linked in the minds of at least some of the population sampled, non-appearance does not necessarily mean that words are not so linked. Indeed, it seems likely that only a small proportion of the total number of psychologically-real collocations will be tapped by association tests (especially since such tests typically elicit only one response per participant). Similarly, not all associates are necessarily collocations, since other relationships (e.g. between paradigmatically-related pairs) are also commonly found in association norms. We should not, therefore, expect either all mentally-represented collocations to appear in the association norms or all associations to be mentally-represented collocations. Nevertheless, it seems fair to assume that measures which are good predictors of those psychological collocations which are attested as associates will be good predictors of psychological collocation in general. The research reported in this section explores this possibility by comparing frequency data for collocations in the British National Corpus (BNC) with associations reported in a set of association norms. In particular, it asks how well a set of ranked lists of collocations produced by various frequency-based methods predicts the reported associations.

Method

The norms used in this study are taken from the Edinburgh Word Association Thesaurus (EAT). This database was compiled by Kiss et al. (1973) between 1968 and 1973. Researchers presented a range of stimulus words to informants and elicited for each the first word to come to mind. Each stimulus word was presented to 100 different people, most of whom were undergraduates at British universities. Clearly, the associations recorded in EAT are the associates of a particular group of people (British undergraduates) at a particular moment in time (1968-71). It is highly likely that if the same procedure were followed with a different group of participants, or at a different moment in time, many of the associations would be different. To take some simple examples, it is unlikely that if the same experiment were repeated today, the second most common associate of politics would be Wilson, or that mobile would fail to elicit the response phone; similarly, if the data had been collected from bankers, rather than undergraduates, the most frequent associate of student would surely not have been me. In examining the relationship between high frequency collocations in the BNC and associates in EAT, therefore, we are examining whether the collocations are likely to have been psychologically real for one particular group of people at one particular moment in time. Far from being a weakness of the current approach, however, this is precisely the point. Many psychological collocations will vary from group to group (even from person to person) and from time to time. Equally, the collocations found in corpora will vary according to when data were collected and what sorts of texts were included. This, as I argued above, is one of the major reasons why drawing any inference from corpus to mind is problematic. The question we need to ask is, given such variability, how much stays constant, such that we can use a corpus like the BNC to make reasonably confident predictions about the mental associations of any particular group of native English speakers?

The base data for the study was a listing of several thousand modifier-noun combinations which had been retrieved from a variety of texts as part of a separate study (see Section 5.3). Frequencies of occurrence in the BNC and a range of association measures (t-score, chi-squared, log-likelihood, mutual information (MI), z-score, and conditional probability) were calculated for these combinations. Since association measures often work poorly with low frequency items, word pairs occurring fewer than 5 times in the BNC were excluded from further analysis. Also excluded were combinations whose modifier part was not listed as a stimulus in the EAT. For the remaining 3,168 combinations, the EAT was consulted to see whether the noun part of the combination was listed as an associate of the modifier part. Where the noun was an associate, the strength of association was also noted. On the basis of these data, two questions were addressed: 1) which frequency measure is the best predictor of psychological associations? and 2) what value of each measure is likely to indicate a psychological association?
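The procedure just described can be sketched in code. The sketch below uses the commonly cited contingency-table definitions of the association measures, which may differ in detail from the formulations used in this study. In particular, the direction of the conditional probability (noun given modifier) is an assumption, and all function names, data structures, and toy values are hypothetical.

```python
import math

MIN_FREQ = 5  # pairs occurring fewer than 5 times are excluded


def association_measures(pair_freq, mod_freq, noun_freq, corpus_size):
    """Compute association measures for one modifier-noun pair.

    Assumes pair_freq > 0 and that both words occur in fewer than
    corpus_size positions, so that no expected frequency is zero.
    """
    # Observed 2x2 contingency table: (modifier present/absent) x (noun present/absent).
    observed = [
        pair_freq,
        mod_freq - pair_freq,
        noun_freq - pair_freq,
        corpus_size - mod_freq - noun_freq + pair_freq,
    ]
    rows = [mod_freq, corpus_size - mod_freq]
    cols = [noun_freq, corpus_size - noun_freq]
    # Expected frequencies under independence, in the same cell order as observed.
    expected = [r * c / corpus_size for r in rows for c in cols]
    o11, e11 = observed[0], expected[0]
    return {
        "t_score": (o11 - e11) / math.sqrt(o11),
        "z_score": (o11 - e11) / math.sqrt(e11),
        "mi": math.log2(o11 / e11),
        "chi_squared": sum((o - e) ** 2 / e for o, e in zip(observed, expected)),
        "log_likelihood": 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        ),
        "cond_prob": o11 / mod_freq,  # assumed direction: P(noun | modifier)
    }


def select_pairs(pairs, norms):
    """Apply the two exclusion criteria and look up association strength.

    pairs: iterable of (modifier, noun, corpus_frequency).
    norms: dict mapping each stimulus word to {response: strength}.
    Returns (modifier, noun, frequency, strength) tuples, where strength is
    None when the noun is not a recorded associate of the modifier.
    """
    selected = []
    for modifier, noun, freq in pairs:
        if freq < MIN_FREQ or modifier not in norms:
            continue
        selected.append((modifier, noun, freq, norms[modifier].get(noun)))
    return selected


# Invented toy data, for illustration only (not real BNC or EAT figures).
toy_norms = {"student": {"union": 7}, "mobile": {"home": 12}}
toy_pairs = [
    ("student", "union", 120),
    ("student", "loan", 4),      # excluded: fewer than 5 occurrences
    ("mobile", "phone", 300),    # kept, but noun is not a listed associate
    ("digital", "camera", 50),   # excluded: modifier is not a stimulus
]
kept = select_pairs(toy_pairs, toy_norms)
scores = association_measures(pair_freq=30, mod_freq=100, noun_freq=200, corpus_size=10_000)
```

Each retained pair thus carries a corpus frequency, a set of association scores, and either an association strength or a marker of non-association, which is the information the two research questions require.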