
3 Algorithm Efficiency Comparison

3.4 Lists Comparison

First, we compare the lists obtained automatically from the three corpora for the word dom (home/house) with the reference list, i.e. the human association list obtained from human subjects in the author’s experiment. The comparison is presented in terms of LMw(l1,l2), where l1 is the human association list and l2 is one of the lists obtained through LSA similarities or through the association ratio f5, as described above. In the comparison we apply three different window sizes to the reference list.

To begin, we compare the full human association list, which is 151 words long, with the lists generated by the algorithms described above. We arbitrarily restrict the length of the automatically generated lists to 1000 words.
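The measure LMw(l1,l2) is defined earlier in the paper, outside this section. As a minimal sketch, assuming it counts how many words of the reference list l1 occur among the first w entries of the generated list l2 (a reading consistent with the tables below, but an assumption nonetheless), it could be computed as follows. The function name `list_match` and the toy word lists are hypothetical.

```python
def list_match(reference, generated, w):
    """Count reference words found among the first w entries of `generated`.

    Assumed reading of LMw(l1, l2): l1 = reference, l2 = generated.
    """
    reference_set = set(reference)
    window = generated[:w]  # first w entries of the ranked generated list
    return len(reference_set.intersection(window))


# Toy data, invented for illustration only (not taken from the experiment)
human = ["house", "family", "roof", "mother", "warm"]
automatic = ["roof", "street", "family", "garden", "wall", "mother"]

for w in (2, 4, 6):
    print(w, list_match(human, automatic, w))
```

Under this reading the value can only grow or stay the same as w increases, since a larger window always contains the smaller one.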

Table 3. LMw(l1,l2) values for different w, for different l2 from various list sources, l1 being the human experiment result list

w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
10     0         0        0        0          0         0
25     0         0        0        0          0         0
50     2         1        2        0          0         0
75     2         4        3        1          0         1
150    11        14       17       2          2         2
300    19        24       30       2          6         3
600    34        25       41       4          11        12
1000   36        43       49       7          13        18

The full 151-word list can be seen as excessive, since it also contains random associations of little interest to us; the list obtained by comparing the EAT lists with the author’s list contains only 15 words.

Next, we restrict the human association list to its first 75 words, which was also the length needed to obtain the combined list for home and house from the EAT.

Table 4. LMw(l1,l2) values for different w, for different l2 from various list sources, l1 being the human experiment result list restricted to 75 entries

w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
10     0         0        0        0          0         0
25     0         0        0        0          0         0
50     2         0        2        0          0         0
75     2         4        3        1          0         1
100    3         5        8        2          0         1
150    8         9        10       2          1         1
300    11        15       21       2          5         1
600    21        23       30       4          7         5
1000   22        28       33       5          9         6

As can be seen, the automatically generated association lists match part of the human association list only if we use a large window size. Secondly, we can observe that the Church and Hanks algorithm seems to generate a list that is more comparable to the human-derived list.
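For readers unfamiliar with the Church and Hanks measure, the following is a minimal sketch of their association ratio, log2(P(x,y) / (P(x)P(y))), with the joint count taken inside a word window. It assumes that f5 denotes this ratio with a 5-word window and that the window covers the w tokens following x; both are assumptions, since the exact definition is given earlier in the paper. The function name and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def association_ratio(tokens, x, y, w=5):
    """Church and Hanks style association ratio for the ordered pair (x, y).

    Assumption: the joint frequency counts occurrences of y among the
    w tokens that follow each occurrence of x.
    """
    n = len(tokens)
    counts = Counter(tokens)
    joint = 0
    for i, token in enumerate(tokens):
        if token == x:
            joint += tokens[i + 1 : i + 1 + w].count(y)
    if joint == 0:
        return float("-inf")  # the pair never co-occurs in the window
    p_xy = joint / n
    p_x = counts[x] / n
    p_y = counts[y] / n
    return math.log2(p_xy / (p_x * p_y))


# Toy text, invented for illustration only
tokens = "the old house has a red roof and the house has a garden".split()
print(association_ratio(tokens, "house", "roof"))
```

An association list for a stimulus x would then be obtained by ranking candidate words y by this score; on real corpora, low-frequency pairs usually need a frequency threshold, which this sketch omits.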

The shorter word list in the EAT (house) contains 42 words. A window of 40 words, applied to the author’s list, allows us to find all the elements common to the combined EAT home/house list and the list for dom from the author's experiment. Therefore we shall use a 40-word window for comparison.
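The choice of the window can be illustrated with a short sketch: given a ranked list and a second list, find the shortest prefix of the ranked list that already contains every word the two lists share. The function name and the toy lists are hypothetical; the 40-word figure in the text comes from the real data, not from this example.

```python
def smallest_covering_window(ranked, targets):
    """Length of the shortest prefix of `ranked` containing all shared words.

    Returns 0 when the two lists have no word in common.
    """
    remaining = set(targets) & set(ranked)  # words the prefix must contain
    window = 0
    for position, word in enumerate(ranked, start=1):
        if not remaining:
            break
        remaining.discard(word)
        window = position
    return window


# Toy data, invented for illustration only
author_list = ["family", "mother", "roof", "garden", "warm", "door"]
eat_list = ["roof", "family", "warm", "chimney"]
print(smallest_covering_window(author_list, eat_list))
```

Here the shared words are family, roof and warm, and the last of them sits at position 5, so a 5-word window is enough.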

Table 5. LMw(l1,l2) values for different w, for different l2 from various list sources, l1 being the human experiment result list restricted to first 40 entries

w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
10     0         0        0        0          0         0
25     0         0        0        0          0         0
50     2         0        2        0          0         0
100    3         5        7        1          0         1
150    7         9        9        1          0         1
300    8         9        17       1          4         1
600    15        16       22       2          6         5
1000   16        20       22       3          6         6

As we can see, this window size seems to be optimal, because, compared to the full list, it substantially reduces the non-semantic associations for both algorithms. Finally, we have to test the automatically generated lists against the combined human association list, i.e. the list of words that are present both in the author’s list and in the EAT lists, presented in Table 2.
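The combined list described here is an order-preserving intersection. A minimal sketch, assuming the EAT home and house lists are pooled before intersecting with the author's list (the function name and the word lists are hypothetical and do not reproduce Table 2):

```python
def combined_list(author_list, *eat_lists):
    """Keep the author's-list words that occur in at least one EAT list.

    The order of the author's list is preserved.
    """
    eat_words = set()
    for eat_list in eat_lists:
        eat_words.update(eat_list)
    return [word for word in author_list if word in eat_words]


# Toy data, invented for illustration only
author_dom = ["family", "mother", "roof", "garden", "warm", "door"]
eat_home = ["family", "warm", "sweet"]
eat_house = ["roof", "door", "chimney"]
print(combined_list(author_dom, eat_home, eat_house))
```

Preserving the order of the author's list keeps the ranking information, which matters when the combined list is later truncated to a window.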

Table 6. LMw(l1,l2) values for different w, for different l2 from various list sources, l1 being the human experiment result list restricted to words that are present in both the author's and the EAT experiments, see Table 2

w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
10     0         0        0        0          0         1
25     0         0        0        0          0         1
50     0         0        1        0          0         1
75     0         1        3        0          0         1
100    0         3        3        0          0         2
150    3         4        5        0          0         2
300    4         8        5        0          1         2
600    8         12       9        0          2         3
1000   10        12       12       2          2         3

These results show a tendency similar to that observed when testing the full-length human association list. First, the window size influences the number of matches. The second observation is also similar: the list generated by the Church and Hanks algorithm matches the human association list better; it matches 10 or 12 out of the 15 words semantically related to the stimulus.

To learn more, we repeated the comparison over a wider range of words. We selected 8 words: chleb (bread), choroba (disease), światło (light), głowa (head), księżyc (moon), ptak (bird), woda (water), żołnierz (soldier). Then we used the described method to obtain a combined list for the author’s experiment and the EAT.

Table 7. LMw(l1,l2) values for different word stimuli, different w, for different l2 from various list sources, l1 being the human experiment result list restricted to entries in both the author's and the EAT experiments

Word      w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
bread     25     0         1        0        0          1         1
bread     1000   1         8        3        0          2         2
disease   25     0         1        0        0          0         0
disease   100    1         3        5        0          0         0
disease   1000   1         9        8        1          7         2
light     25     1         1        0        0          1         0
light     100    3         4        3        1          1         0
light     1000   3         5        3        4          5         2
head      25     1         0        2        0          1         1
head      100    1         2        4        0          1         1
head      1000   3         6        6        1          2         3
moon      25     0         3        3        1          0         2
moon      100    3         4        5        1          0         3
moon      1000   3         4        6        4          2         5
bird      25     1         2        1        1          0         1
bird      100    2         4        2        1          0         2
bird      1000   2         5        7        4          3         3
water     25     0         1        2        1          1         0
water     100    0         4        6        2          3         2
water     1000   4         8        10       3          5         6
soldier   25     2         2        2        2          1         3
soldier   100    2         5        5        2          6         3
soldier   1000   2         12       9        3          10        4

The table below contains a similar comparison, but without restricting the association list to words contained in both experiments.

Table 8. LMw(l1,l2) values for different word stimuli, different w, for different l2 from various list sources, l1 being the unrestricted human experiment result list

Word      w      PRUS f5   PAP f5   NCP f5   PRUS LSA   PAP LSA   NCP LSA
bread     25     1         1        2        0          1         2
bread     100    2         5        6        1          2         5
bread     1000   4         19       12       3          4         9
disease   25     0         1        1        0          1         0
disease   100    1         3        7        0          2         0
disease   1000   3         13       14       1          13        8
light     25     2         1        1        1          1         0
light     100    6         6        4        3          1         0
light     1000   11        15       9        10         9         3
head      25     3         1        3        0          3         1
head      100    6         6        7        0          5         1
head      1000   17        17       12       7          9         7
moon      25     1         4        6        1          0         2
moon      100    5         5        11       1          1         4
moon      1000   5         9        15       7          5         12
bird      25     1         8        2        2          0         2
bird      100    3         9        5        3          2         2
bird      1000   5         13       19       8          9         9
water     25     1         2        3        1          1         1
water     100    3         7        8        2          4         3
water     1000   9         20       21       10         9         15
soldier   25     1         5        4        1          2         3
soldier   100    2         11       9        4          7         6
soldier   1000   3         25       22       9          20        11

As can be seen, the values in the columns corresponding to the f5 algorithm are clearly better than the corresponding LSA values, regardless of the size of the human lists.
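As a quick check of this comparison, the sketch below sums the w = 1000 rows of Table 8 per corpus for both algorithms. The numbers are copied from the table; the dictionary layout and the function name are illustrative choices, not part of the original experiment.

```python
# w = 1000 rows of Table 8: ((PRUS, PAP, NCP) for f5, (PRUS, PAP, NCP) for LSA)
TABLE8_W1000 = {
    "bread":   ((4, 19, 12), (3, 4, 9)),
    "disease": ((3, 13, 14), (1, 13, 8)),
    "light":   ((11, 15, 9), (10, 9, 3)),
    "head":    ((17, 17, 12), (7, 9, 7)),
    "moon":    ((5, 9, 15), (7, 5, 12)),
    "bird":    ((5, 13, 19), (8, 9, 9)),
    "water":   ((9, 20, 21), (10, 9, 15)),
    "soldier": ((3, 25, 22), (9, 20, 11)),
}
CORPORA = ("PRUS", "PAP", "NCP")


def column_totals(table):
    """Sum the matches per corpus, returning {corpus: (f5_total, lsa_total)}."""
    f5_totals = [0, 0, 0]
    lsa_totals = [0, 0, 0]
    for f5_row, lsa_row in table.values():
        for i in range(len(CORPORA)):
            f5_totals[i] += f5_row[i]
            lsa_totals[i] += lsa_row[i]
    return dict(zip(CORPORA, zip(f5_totals, lsa_totals)))


totals = column_totals(TABLE8_W1000)
for corpus, (f5_total, lsa_total) in totals.items():
    print(f"{corpus}: f5 = {f5_total}, LSA = {lsa_total}")
```

On this slice of the table the totals are 57 against 55 for PRUS, 131 against 78 for PAP and 124 against 74 for NCP, so the margin in favour of f5 is widest for the PAP and NCP corpora.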

4 Conclusion

Our results are in general comparable with those of the related research of Wandmacher (Wandmacher, 2005) and (Wandmacher, Ovchinnikova, Alexandrov, 2008). Generally speaking, the LSA algorithm generates an association list that contains only a small fraction of the semantic relations present in the human association norm. Surprisingly, the Church and Hanks algorithm does much better, which suggests that the problem of how LSA-made associations relate to the human association norm should be investigated more carefully. The first suggestion may be derived from (Wettler, Rapp, Sedlmeier, 2005): we have to learn more about the relation between the human association norm and the text in order to find a method more appropriate than a simple list comparison.

A second suggestion may be derived from an analysis of the human association list. It is well known that such a list consists of responses that are semantically related to the stimulus, responses that reflect pragmatic dependencies, and so-called ‘clang responses’. But within the set of semantically related responses one can find more frequent direct associations, i.e. those which follow a single semantic relation, e.g. ‘whole – part’: house – wall, and less frequent indirect associations, such as mutton (baranina) – horns (rogi), which must be explained by a chain of relations. In our example this is the ‘source’ relation mutton (baranina) – ram (baran), followed by the ‘whole – part’ relation ram (baran) – horns (rogi). Similarly, the association mutton (baranina) – wool (wełna) is explained by the ‘source’ relation mutton (baranina) – ram (baran), followed by the ‘whole – part’ relation ram (baran) – fleece (runo), which is followed by the ‘source’ relation fleece (runo) – wool (wełna). These association chains suggest that some associations are based on a semantic network, and it would be very interesting to test the LSA associating mechanism against these indirect associations.
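The idea of an association explained by a chain of relations can be sketched as a shortest-path search in a small labelled graph. The graph below encodes only the relations named in the examples above; the data structure and the function name are hypothetical and are not a proposal for how such a network should actually be built.

```python
from collections import deque

# Directed, labelled relations taken from the examples in the text
RELATIONS = {
    "house": [("whole-part", "wall")],
    "mutton": [("source", "ram")],
    "ram": [("whole-part", "horns"), ("whole-part", "fleece")],
    "fleece": [("source", "wool")],
}


def relation_chain(start, goal, relations):
    """Breadth-first search for the shortest chain of relations start -> goal.

    Returns a list of (word, relation, next_word) steps, or None if the
    goal cannot be reached.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        word, path = queue.popleft()
        if word == goal:
            return path
        for label, neighbour in relations.get(word, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(word, label, neighbour)]))
    return None


print(relation_chain("house", "wall", RELATIONS))   # direct: one step
print(relation_chain("mutton", "wool", RELATIONS))  # indirect: three steps
```

In this picture a direct association is a chain of length one, while an indirect association such as mutton – wool needs several steps, which is the property a test of LSA against indirect associations would have to probe.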

Acknowledgement

This research was partially supported by EC grant FP7-218086, the INDECT project.

References

Borge-Holthoefer J., Arenas A., 2009, Navigating word association norms to extract semantic information, in: Taatgen N., van Rijn H., Proceedings of the 31st Annual Conference of the Cognitive Science Society, Groningen.

Budanitsky A., Hirst G., 2006, Evaluating WordNet-based measures of lexical semantic relatedness, Computational Linguistics 32 (1): 13–47.

Church K. W., Hanks P., 1990, Word Association Norms, Mutual Information, and Lexicography, Computational Linguistics 16 (1): 22–29.

Deerwester S., Dumais S., Furnas G., Landauer T., Harshman R., 1990, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science 41 (6): 391–407.

Gatkowska I., 2012, Jak słowa łączą się z sobą w umyśle użytkowników, Tertium Conference, Krakow.

Kent G., Rosanoff A. J., 1910, A study of association in insanity, American Journal of Insanity 67 (37-96), p. 317-390.

Kess J. F., 1992, Psycholinguistics: Psychology, linguistics and the study of natural language, Amsterdam/Philadelphia: John Benjamins Publishing Company.

Kiss G. R., Armstrong C., Milroy R., Piper J., 1973, An associative thesaurus of English and its computer analysis, in: Aitken A. J., Bailey R. W., Hamilton-Smith N. (eds.), The Computer and Literary Studies, Edinburgh University Press.

Korzycki M., 2012, A dictionary based stemming mechanism for Polish, NLPCS 2012, p. 143–150.

Kurcz I., 1967, Polskie normy powszechności skojarzeń swobodnych na 100 słów z listy Kent-Rosanoffa, Studia Psychologiczne, vol. VIII, ed. T. Tomaszewski, Wrocław-Warszawa-Kraków, p. 122–255.

Landauer T. K., Dumais S. T., 2008, Latent Semantic Analysis, Scholarpedia, 3(11):4356.

Moss H., Older L., 1996, Birkbeck word association norms, Psychology Press.

Nelson D. L., McEvoy C. L., Schreiber T. A., 1998, The University of South Florida word association, rhyme, and word fragment norms.

Ortega-Pacheco D., Arias-Trejo N., Barron Martinez J. B., 2012, Latent Semantic Analysis Model as a Representation of Free-Association Word Norms, 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Puebla, p. 21–25.

Palermo D. S., Jenkins J. J., 1964, Word Association Norms: Grade School through College, Minneapolis.

Postman L. J., Keppel G., 1970, Norms of word association, Academic Press.

Przepiórkowski A., Bańko M., Górski R., Lewandowska-Tomaszczyk B., Łaziński M., Pęzik P., 2011, National Corpus of Polish, in: Proceedings of the 5th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, p. 259–263.

Rapp R., 2002, The Computation of Word Associations: Comparing Syntagmatic and Paradigmatic Approaches, Proceedings of the 19th International Conference on Computational Linguistics, Taipei.

Rapp R., 2008, The Computation of Associative Responses to Multiword Stimuli, Proceedings of the workshop on Cognitive Aspects of the Lexicon (COGALEX 2008): Coling 2008, Manchester, p. 102–109.

Sinopalnikova A., Smrz P., 2004, Word Association Thesaurus as a Resource for extending Semantic Networks, Proceedings of the International Conference on Communications in Computing, CIC '04, Las Vegas, Nevada, USA, p. 267–273.

Schulte im Walde S., Borgwaldt S., Jauch R., 2012, Association Norms of German Noun Compounds, in: Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul.

Wandmacher T., 2005, How semantic is Latent Semantic Analysis, Proceedings of TALN/RECITAL 5.

Wandmacher T., Ovchinnikova E., Alexandrov T., 2008, Does Latent Semantic Analysis reflect human associations, in: Proceedings of the ESSLLI Workshop on Distributional Lexical Semantics.

Wettler M., Rapp R., Sedlmeier P., 2005, Free word associations correspond to contiguities between words in text, Journal of Quantitative Linguistics, 12(2/), p. 111
