3 Investigation into Morphology
XML element
4.2.3 Prepositional Synonym Identification
4.2.3.1 Spelling Variants
Some monosemous preposition headwords are spelling variants of polysemous preposition headwords79; their entries do not list the full range of senses but contain a single <S> (sense) element80. Every PrepositionRecord corresponding to one of these monosemous headwords is removed from the main preposition map, and a PrepositionRecord list is obtained from its synonym81. Each PrepositionRecord in that list is cloned and the clone's word form is changed to that of the monosemous preposition. The clone is added to the valid synonyms field of the PrepositionRecord from which it was cloned, and that original PrepositionRecord is added to the clone's valid synonyms82.
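The procedure can be illustrated with a minimal Python sketch. It is not the implementation described here: the dataclass, the function names, the lower-casing of looked-up headwords and the example field values are assumptions made for illustration. Only the "SEE " and "ALL_" prefixes, the uppercase conversion and the rule for choosing between the two candidate lists follow the text and footnote 81.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(eq=False)  # identity comparison, since records refer to each other
class PrepositionRecord:
    word_form: str
    cprop: str = ""   # complement properties
    srtype: str = ""  # semantic role type
    valid_synonyms: List["PrepositionRecord"] = field(
        default_factory=list, repr=False)


def headword_after(prefix: str, text: str) -> str:
    """Uppercase the field, strip the prefix, return the headword (or '')."""
    text = text.upper()
    if not text.startswith(prefix):
        return ""
    # Assumption: map keys are lower-case headwords.
    return text[len(prefix):].strip().lower()


def target_records(variant: PrepositionRecord,
                   prep_map: Dict[str, List[PrepositionRecord]]):
    """Choose the PrepositionRecord list of the polysemous headword."""
    head_a = headword_after("SEE ", variant.cprop)
    head_b = headword_after("ALL_", variant.srtype)
    list_a = prep_map.get(head_a, [])
    list_b = prep_map.get(head_b, [])
    if head_a == head_b or not list_b:
        return list(list_a)
    if not list_a:
        return list(list_b)
    return [r for r in list_a if any(r is s for s in list_b)]


def expand_variant(headword: str,
                   prep_map: Dict[str, List[PrepositionRecord]]):
    """Remove a spelling variant from the map and clone its target's senses."""
    clones = []
    for variant in prep_map.pop(headword):
        for original in target_records(variant, prep_map):
            clone = replace(original, word_form=headword,
                            valid_synonyms=list(original.valid_synonyms))
            clone.valid_synonyms.append(original)
            original.valid_synonyms.append(clone)
            clones.append(clone)
    return clones


# Illustrative data only; the field values are hypothetical.
from_1 = PrepositionRecord("from", srtype="Source")
from_2 = PrepositionRecord("from", srtype="Cause")
frae = PrepositionRecord("frae", cprop="see from", srtype="all_from")
prep_map = {"from": [from_1, from_2], "frae": [frae]}
clones = expand_variant("frae", prep_map)
```

In this sketch the clones are returned to the caller rather than stored, because the text does not say where they are kept between removal and restoration.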
4.2.3.2 Encoded Synonyms
The TPP file specifies which headwords are synonyms of each preposition sense, but does not specify which sense of a synonym is the synonymous one. Since synonyms must share a semantic role type, synonym identification can be performed by comparing the semantic role types of each PrepositionRecord
representing the sense of one preposition with those of each PrepositionRecord
79 For instance, "frae" is synonymous with "from" (§4.2.1.5).
80 In these cases, typically the text content of either the <cprop> (complement properties) element or the <srtype> (semantic role type; §4.2.1) element refers to the other preposition, the text content of the <sup> (superordinate taxonomic categorizer) element is "Tributary", and the content of the <srel> (relation to core sense) element either is "informal sound spelling." or starts with "core: " (file uniquePrepositionSenses.txt).
81 In such cases, because of some inconsistencies in the encoding, two separate PrepositionRecord lists are made for the polysemous headword: one list comprises every PrepositionRecord mapped to from the headword contained in the complement properties field of the monosemous preposition's PrepositionRecord, with the prefix "SEE " removed; the other list comprises every PrepositionRecord mapped to from the headword contained in the semantic role type field of the monosemous preposition's PrepositionRecord, with the prefix "ALL_" removed. These fields have been converted to uppercase to mask inconsistencies. If the word forms obtained from the two fields of the monosemous preposition's PrepositionRecord are the same, then only one list is used; if one list is empty, then the other is used; otherwise the intersection of the two lists is used.
82
representing its synonym. This leaves fewer ambiguities than comparing superordinate taxonomic categorizer fields, and can be confirmed by comparing synonym fields to ensure that the word form of each is listed as a synonym of the sense of the other.
Each sense of each synonym of each sense of each preposition83 is examined to see whether the semantic role types of the two senses are identical. If exactly one synonym sense with an identical semantic role type is found for a preposition sense, and each headword is listed as a synonym of the other sense, then the PrepositionRecord representing that synonym sense is added to the valid synonyms field of the PrepositionRecord representing the preposition sense of which it is a synonym.
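The matching step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation itself: the dataclass, the function name and the example semantic role types are hypothetical, and the mutual-listing check is assumed to be applied before the matches are counted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(eq=False)  # identity comparison, since records refer to each other
class PrepositionRecord:
    word_form: str
    srtype: str  # semantic role type
    synonyms: Set[str] = field(default_factory=set)  # synonym headwords
    valid_synonyms: List["PrepositionRecord"] = field(
        default_factory=list, repr=False)


def link_synonyms(prep_map: Dict[str, List[PrepositionRecord]]
                  ) -> List[Tuple[PrepositionRecord, List[PrepositionRecord]]]:
    """Link unambiguous synonym senses; return the ambiguous cases."""
    ambiguous = []
    for headword, senses in prep_map.items():
        for sense in senses:
            for syn_head in sorted(sense.synonyms):
                matches = [
                    cand for cand in prep_map.get(syn_head, [])
                    # identical semantic role type and mutual listing
                    if cand.srtype == sense.srtype
                    and headword in cand.synonyms
                ]
                if len(matches) == 1:
                    sense.valid_synonyms.append(matches[0])
                elif len(matches) > 1:
                    # set aside for manual review
                    ambiguous.append((sense, matches))
    return ambiguous


# Illustrative data only; the role types are hypothetical.
a1 = PrepositionRecord("above", "Spatial", {"over"})
a2 = PrepositionRecord("above", "Scalar", {"over"})
o1 = PrepositionRecord("over", "Spatial", {"above"})
o2 = PrepositionRecord("over", "Scalar", {"above"})
o3 = PrepositionRecord("over", "Scalar", {"above"})
ambiguous = link_synonyms({"above": [a1, a2], "over": [o1, o2, o3]})
```

Here the "Spatial" senses are linked in both directions, while the "Scalar" sense of "above" has two candidate senses of "over" and is returned for review instead of being linked.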
During development, the 18 sets of multiple matching senses of synonymous prepositions were written to a file84. These were manually reviewed and the multiple synonymous senses were re-categorised as synonym, hypernym or hyponym85. The status of each
PrepositionRecord which represents a member of such a set is read from this file86 as one of these three relation types.
4.2.3.3 Creating Prepositional Synsets
For each sense of each preposition word form, a new object is created of class
Preposition, which inherits from class WordSense87. Each time a Preposition object
83 Excluding those with variant spellings removed from the main preposition map.
84 Triple matched synonyms.csv, comprising multi-line records specifying the fields of a PrepositionRecord, grouped in such a way that the first record in each of the 18 groups represents a sense of a preposition headword, and the remaining records in the group represent the multiple synonymous senses of its synonymous headword.
85
in another column.
86 Triple matched synonyms.csv is read in the same order as it was written, such that when multiple senses of a synonym of a sense are found, the next group of records from the file will correspond to the same sense followed by its multiple synonym senses (all of which necessarily have the same headwords). The PrepositionRecord is added to the valid synonyms, valid hypernyms or valid hyponyms field as appropriate, within the PrepositionRecord representing the preposition sense of which it is a synonym. Each PrepositionRecord listed in the variant spellings field of the PrepositionLoader is then restored to the main preposition map.
87 The word form and relation to core sense fields are assigned from the data held in the
is created, the PrepositionalTaxonomyBuilder creates or finds the corresponding PrepositionalSynset88. If no synonymous ID is found, a new PrepositionalSynset
is created89 and added to the global synset map90. The newly created Preposition is added to the PrepositionalSynset91. Once a Preposition has been created from every
PrepositionRecord, and assigned to a PrepositionalSynset, the lexicon is updated with the new data. 800 prepositional synsets are created, containing 1111 prepositions representing 312 word forms.
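The create-or-find step can be sketched in Python as below. This is a rough illustration, not the PrepositionalTaxonomyBuilder itself: the integer ID scheme, the lookup of an existing synset through the valid synonyms of a record, and all class fields shown are assumptions, and a local dictionary stands in for the global synset map.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass(eq=False)
class PrepositionRecord:
    word_form: str
    valid_synonyms: List["PrepositionRecord"] = field(
        default_factory=list, repr=False)
    synset_id: Optional[int] = None  # hypothetical field


@dataclass
class Preposition:
    word_form: str


@dataclass
class PrepositionalSynset:
    synset_id: int
    members: List[Preposition] = field(default_factory=list)


def build_synsets(records: List[PrepositionRecord]
                  ) -> Dict[int, PrepositionalSynset]:
    """Create one Preposition per record and place it in a synset."""
    synsets: Dict[int, PrepositionalSynset] = {}
    next_id = count(1)
    for record in records:
        preposition = Preposition(record.word_form)
        # Find the ID of a synonym that already has a synset, if any.
        found = next((s.synset_id for s in record.valid_synonyms
                      if s.synset_id is not None), None)
        if found is None:
            # No synonymous ID: create a synset and add it to the map.
            found = next(next_id)
            synsets[found] = PrepositionalSynset(found)
        record.synset_id = found
        synsets[found].members.append(preposition)
    return synsets


# Illustrative data only.
from_1 = PrepositionRecord("from")
frae_1 = PrepositionRecord("frae", valid_synonyms=[from_1])
from_1.valid_synonyms.append(frae_1)
to_1 = PrepositionRecord("to")
synsets = build_synsets([from_1, frae_1, to_1])
```

In this example "from" and "frae" end up in the same synset, while "to" receives a synset of its own.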