4.2.4 Constructing the Preposition Taxonomy

The TPP data and the associated taxonomy files released with it imply a taxonomy of prepositional semantic roles (Litkowski & Hargraves, 2006). This taxonomy is an advance on the one based on digraph analysis presented by Litkowski (2002), though largely consistent with it (§4.2.1.5). Since prepositions with diverse meanings can share semantic role types, the semantic role taxonomy is treated as applicable only to senses of the same or synonymous prepositions. Because of the parallelisms between the usages of the same preposition in different roles (Jackendoff, 1983; §4.2.1.6), lexical distinctions between one PrepositionalSynset and another (with different lexical content) override this taxonomy (§4.2.4.2).

Preposition is assigned to the instance field of the corresponding PrepositionRecord. Sense numbers are assigned to each Preposition object, restarting from 1 for each preposition word form.

88 A PrepositionalSynset is found if the PrepositionRecord corresponding to the preposition sense has a valid ID field (> 0), which will be equal to the ID of the PrepositionalSynset. Otherwise, its synonyms are searched for a valid ID. If every synonym ID found is valid and equal, then the PrepositionalSynset with that ID is retrieved from the global synset map encapsulated in the wordnet.

89 When a new PrepositionalSynset is created, it is assigned the next available ID, starting from 500000000, so that each ID is unique in the wordnet. The value of the ID has no significance apart from indicating the order of creation. The fields of a PrepositionalSynset include a set of superordinate taxonomic categorizers, a single semantic role type and a set of complement properties, none of which are initialised with any data by the constructor.

90 If unequal IDs are found, any PrepositionRecord representing a synonym with a superordinate taxonomic categorizer different from that of the PrepositionRecord corresponding to the preposition sense is removed from the synonym list, and the search for a unique valid ID is repeated. If unequal IDs are still found, a fatal exception is thrown.

91 When a Preposition is added to a PrepositionalSynset, the ID of the PrepositionalSynset is copied to the Preposition and to the corresponding PrepositionRecord. The gloss and examples from the PrepositionRecord are added to the PrepositionalSynset. The superordinate taxonomic categorizer of the PrepositionRecord is added to the set held by the PrepositionalSynset. The semantic role type of the PrepositionRecord is assigned to the PrepositionalSynset, but a fatal error occurs if it already has a different one. The complement properties of the PrepositionRecord are added to those of the PrepositionalSynset. In all cases, every Preposition representing a synonym of the current PrepositionRecord is added to the new PrepositionalSynset unless it already has a valid ID, indicating that it has already been added. If it does have a valid ID, but this differs from the ID of the new PrepositionalSynset, indicating that the synonym has been added to another synset, then the superordinate taxonomic categorizer of the synonym is compared with that of the current PrepositionRecord. If it differs, then the synonym is removed from the synonym list. If the superordinate taxonomic categorizer is the same as that of the current PrepositionRecord, then the semantic role type of the synonym is compared with that of the current PrepositionRecord. If this also differs, then the current
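Footnotes 88 to 90 describe how a preposition sense is matched to an existing PrepositionalSynset or given a new one. The Python sketch below restates that decision procedure under stated assumptions: `PrepositionRecord` is reduced to the few fields the lookup needs, a single `categorizer` string stands for the superordinate taxonomic categorizer, and `FatalTaxonomyError` and `SynsetIdAllocator` are hypothetical names, not classes from the original implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

FIRST_SYNSET_ID = 500_000_000  # footnote 89: IDs of new synsets start here


@dataclass
class PrepositionRecord:
    # Hypothetical, minimal stand-in: only the fields this lookup needs.
    word_form: str
    categorizer: str  # superordinate taxonomic categorizer
    id: int = 0       # a valid ID is > 0; 0 means "not yet in a synset"
    synonyms: list[PrepositionRecord] = field(default_factory=list)


class FatalTaxonomyError(Exception):
    """Raised where the text says a fatal exception is thrown."""


class SynsetIdAllocator:
    """Hands out unique IDs in creation order; the value carries no other meaning."""

    def __init__(self, start: int = FIRST_SYNSET_ID) -> None:
        self._next = start

    def next_id(self) -> int:
        allocated = self._next
        self._next += 1
        return allocated


def find_synset_id(record: PrepositionRecord) -> Optional[int]:
    """Return the ID of the existing synset for this sense, or None if a new one is needed."""
    if record.id > 0:  # footnote 88: the record already carries its synset ID
        return record.id
    for attempt in range(2):
        ids = {s.id for s in record.synonyms if s.id > 0}
        if len(ids) <= 1:
            return ids.pop() if ids else None
        if attempt == 0:
            # Footnote 90: drop synonyms with a different categorizer, then search again.
            record.synonyms = [s for s in record.synonyms
                               if s.categorizer == record.categorizer]
    raise FatalTaxonomyError(f"unequal synonym IDs for {record.word_form!r}")
```

A `None` result corresponds to creating a new PrepositionalSynset, whose ID would then come from `SynsetIdAllocator.next_id()`.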

4.2.4.1 Building the Implicit Taxonomy

A taxonomy map⁹² is created and populated with taxonomy records mapping from parents to lists of children, where each child is a semantic role type and each parent is either a semantic role type or a superordinate taxonomic categorizer. This information is read from taxonomy files, one for each semantic role type⁹³. The taxonomy file for each semantic role type gives one or more parent types for that semantic role type.
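Footnote 92 types the taxonomy map as Map<String, List<String>>. The Python sketch below shows the inversion from per-role parent lists to parent-to-children lists. It assumes the taxonomy files have already been parsed into a dict, because their on-disk format is not described here; the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_taxonomy_map(parents_by_role: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {role type: [parent types]} into {parent: [child role types]}.

    A parent may be another semantic role type or a superordinate
    taxonomic categorizer; children are always semantic role types.
    """
    taxonomy: defaultdict[str, list[str]] = defaultdict(list)
    for role in sorted(parents_by_role):  # sorted for a deterministic child order
        for parent in parents_by_role[role]:
            if role not in taxonomy[parent]:
                taxonomy[parent].append(role)
    return dict(taxonomy)


def parent_types(role: str, taxonomy: dict[str, list[str]]) -> set[str]:
    """All taxonomic parents of a semantic role type, read back from the map."""
    return {parent for parent, children in taxonomy.items() if role in children}
```

Keeping children in lists mirrors the Map<String, List<String>> declaration; `parent_types` is the reverse query that the HYPERNYM search needs.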

A PrepositionalSynset list is created for each semantic role type which does not also occur as a superordinate taxonomic categorizer, comprising every PrepositionalSynset found in the global synset map with that type. A HYPERNYM search is conducted for each PrepositionalSynset in the list: for each word form in each PrepositionalSynset, a list is obtained from the lexicon of every PrepositionalSynset which includes that word form. Any PrepositionalSynset which includes the word form and whose semantic role type, according to the taxonomy map, is the taxonomic parent of the semantic role type of the current PrepositionalSynset is added to its set of candidate HYPERNYMS⁹⁴.
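The candidate search can be sketched in Python as follows. The frozen `Synset` class is a hypothetical stand-in for PrepositionalSynset, the lexicon is assumed to map each word form to the synsets containing it, and the taxonomy map is assumed to be in the parent-to-children form described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    # Hypothetical stand-in for PrepositionalSynset (hashable, so it can go in a set).
    id: int
    word_forms: frozenset[str]
    role_type: str


def candidate_hypernyms(synset: Synset,
                        lexicon: dict[str, list[Synset]],
                        taxonomy_map: dict[str, list[str]]) -> set[Synset]:
    """Synsets sharing a word form with `synset` whose role type is its taxonomic parent."""
    if not synset.role_type:
        return set()  # footnote 94: empty semantic role types are skipped
    parents = {p for p, children in taxonomy_map.items() if synset.role_type in children}
    candidates: set[Synset] = set()
    for form in synset.word_forms:
        for other in lexicon.get(form, []):  # every synset that includes this word form
            if other.role_type in parents:
                candidates.add(other)
    return candidates
```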

If there is only one candidate HYPERNYM for a PrepositionalSynset, then it is assigned as its HYPERNYM; if there are multiple candidate HYPERNYMS and any of them is non-abstract (has one or more glosses or examples), then a fatal error occurs; if there are two candidate abstract HYPERNYMS for a PrepositionalSynset, one of which has the same superordinate taxonomic categorizer as the PrepositionalSynset, then that candidate is assigned as its HYPERNYM; otherwise all the candidates are assigned as HYPERNYMS.

92 Map<String, List<String>>

93 The taxonomy files must be located in a subdirectory of the default directory called taxonomy.

94 Any empty semantic role type is excluded from this operation.
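The selection rules can be expressed as a short Python function. This is a sketch under simplifying assumptions: each synset carries a single `categorizer` string (a PrepositionalSynset actually holds a set), "abstract" means having no glosses and no examples, and `ValueError` stands in for the fatal error. Where both of two abstract candidates share the categorizer, the text does not say which wins, so the sketch falls through to assigning all candidates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    # Hypothetical stand-in for PrepositionalSynset.
    name: str
    categorizer: str
    glosses: tuple[str, ...] = ()
    examples: tuple[str, ...] = ()

    @property
    def is_abstract(self) -> bool:
        return not self.glosses and not self.examples


def select_hypernyms(synset: Synset, candidates: list[Synset]) -> list[Synset]:
    """Apply the candidate-HYPERNYM rules in the order given in the text."""
    candidates = sorted(candidates, key=lambda s: s.name)
    if len(candidates) <= 1:  # zero or one candidate: nothing to choose
        return candidates
    if any(not c.is_abstract for c in candidates):
        raise ValueError(f"ambiguous non-abstract HYPERNYMs for {synset.name}")  # fatal error
    if len(candidates) == 2:
        same = [c for c in candidates if c.categorizer == synset.categorizer]
        if len(same) == 1:  # exactly one shares the categorizer
            return same
    return candidates  # otherwise all candidates become HYPERNYMs
```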

When a PrepositionalSynset is assigned as HYPERNYM of another PrepositionalSynset (its HYPONYM):

• a new Preposition is created for every word form of the HYPONYM not represented in the HYPERNYM;

• the relation to core sense field of each Preposition is defined as "CORE: " + the semantic role type of the HYPERNYM;

• each new Preposition is added to the HYPERNYM;

• an entry for the HYPERNYM is added to the lexicon;

• a WordnetRelation of Relation.Type.HYPERNYM is encoded from each HYPONYM to the HYPERNYM, and its converse WordnetRelation of Relation.Type.HYPONYM is encoded from the HYPERNYM to each HYPONYM.
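A Python sketch of these steps follows. The `RelationType` enum is a hypothetical mirror of Relation.Type, relations are stored as (type, target ID) pairs rather than WordnetRelation objects, and the lexicon is assumed to map word forms to synset IDs. The text's "each Preposition" is read here as each newly created Preposition; that reading is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    # Hypothetical mirror of Relation.Type, reduced to the two values used here.
    HYPERNYM = "HYPERNYM"
    HYPONYM = "HYPONYM"


@dataclass
class Preposition:
    word_form: str
    relation_to_core_sense: str = ""


@dataclass
class Synset:
    id: int
    role_type: str
    prepositions: list[Preposition] = field(default_factory=list)
    relations: list[tuple[RelationType, int]] = field(default_factory=list)

    def word_forms(self) -> set[str]:
        return {p.word_form for p in self.prepositions}


def assign_hypernym(hypernym: Synset, hyponym: Synset,
                    lexicon: dict[str, list[int]]) -> None:
    """Link `hyponym` under `hypernym`, following the steps listed in the text."""
    for form in sorted(hyponym.word_forms() - hypernym.word_forms()):
        # New Preposition for each HYPONYM word form missing from the HYPERNYM.
        prep = Preposition(form, relation_to_core_sense="CORE: " + hypernym.role_type)
        hypernym.prepositions.append(prep)
        entries = lexicon.setdefault(form, [])
        if hypernym.id not in entries:  # lexicon entry for the HYPERNYM
            entries.append(hypernym.id)
    # Converse relations, one in each direction.
    hyponym.relations.append((RelationType.HYPERNYM, hypernym.id))
    hypernym.relations.append((RelationType.HYPONYM, hyponym.id))
```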

4.2.4.2 High Level Abstract Taxonomy

Once the implicit taxonomy is complete, a new abstract HYPERNYM is created for each set of PrepositionalSynsets (its HYPONYMS) which share the same set of word forms and the same semantic role type and have, as yet, no HYPERNYM. The semantic role type of the abstract HYPERNYM is the parent semantic role type of the semantic role type of the HYPONYMS, as read from the taxonomy map⁹⁵. Each abstract HYPERNYM has a Preposition encoded in it for each of the word forms possessed by its HYPONYMS. The abstract HYPERNYM is then added to the global synset map. Relations are encoded between the HYPERNYM and its HYPONYMS in the way described in §4.2.4.1. This procedure ensures that every non-abstract PrepositionalSynset belongs to a taxonomic tree. Each of the top HYPERNYMS of these trees represents the intersection between a combination of word forms and a superordinate taxonomic category corresponding to a semantic role type taxonomy.

In order to provide a high level abstract HYPERNYM for each combination of word forms possessed by any PrepositionalSynset which has no HYPERNYM, the same operation is now repeated, ignoring semantic role types. The HYPONYMS of each high level abstract HYPERNYM are the abstract HYPERNYMS for each superordinate taxonomic category with the same set of word forms⁹⁶. Thus the resultant taxonomy comprises a high level lexical categorisation by combinations of word forms and a secondary classification corresponding to the classification of semantic role types into superordinate taxonomic categories.
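Both passes of §4.2.4.2 follow one pattern: group the synsets that have no HYPERNYM by a key, then give each group a new abstract HYPERNYM. The Python sketch below runs the pattern twice, first keyed on (word forms, semantic role type) and then on word forms alone, with an empty role type for the high level HYPERNYMS as footnote 96 states. The `parent_role` dict (one parent per role type) and all function names are assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Hashable, Iterator


@dataclass
class Synset:
    # Hypothetical stand-in for PrepositionalSynset.
    id: int
    word_forms: frozenset[str]
    role_type: str
    hypernym_ids: list[int] = field(default_factory=list)
    hyponym_ids: list[int] = field(default_factory=list)


def add_abstract_layer(synset_map: dict[int, Synset],
                       group_key: Callable[[Synset], Hashable],
                       role_for: Callable[[Synset], str],
                       ids: Iterator[int]) -> list[Synset]:
    """Give every group of HYPERNYM-less synsets one new abstract HYPERNYM."""
    groups: dict[Hashable, list[Synset]] = defaultdict(list)
    for s in synset_map.values():
        if not s.hypernym_ids:
            groups[group_key(s)].append(s)
    created = []
    for members in groups.values():
        # Members share their word forms, so the first one supplies them.
        abstract = Synset(next(ids), members[0].word_forms, role_for(members[0]))
        for h in members:
            h.hypernym_ids.append(abstract.id)
            abstract.hyponym_ids.append(h.id)
        created.append(abstract)
    for a in created:  # add to the global map only after grouping is finished
        synset_map[a.id] = a
    return created


def build_abstract_layers(synset_map: dict[int, Synset],
                          parent_role: dict[str, str],
                          first_id: int) -> tuple[list[Synset], list[Synset]]:
    ids = count(first_id)
    per_category = add_abstract_layer(
        synset_map, lambda s: (s.word_forms, s.role_type),
        lambda s: parent_role.get(s.role_type, ""), ids)
    high_level = add_abstract_layer(
        synset_map, lambda s: s.word_forms, lambda s: "", ids)  # empty role type
    return per_category, high_level
```

Because the first pass gives every orphan a HYPERNYM, the only orphans left for the second pass are the per-category abstract HYPERNYMS, which is why they become the HYPONYMS of the high level ones.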

4.2.4.3 Top Level Abstract Taxonomy

The properties of the preposition taxonomy so far constructed automatically were analysed using the method proposed for verbs (§2.2.2.2.1). Each PrepositionalSynset without a HYPERNYM was defined mentally so that HYPERNYMS could be assigned manually, using an existing combination of word forms where possible, and assigning more than one where appropriate (Appendix 26). The following additional word form combinations, representing very high level abstractions, were found to be required:

• away from; not at

• among; between

• as not

• near; with

• caused by

• not caused by

• as why

• as not why

96 A high level abstract HYPERNYM has an empty semantic role type and superordinate taxonomic

A high level abstract PrepositionalSynset is created to represent each of these additional word form combinations and is added to the global synset map; the lexicon is updated accordingly. Records are then read from file⁹⁷, each of which comprises 2 fields representing the word forms of the HYPONYM and the word forms of the HYPERNYM. The highest level synsets with each of the 2 combinations of word forms are found, and relations are encoded between them with the first synset as HYPONYM and the second as HYPERNYM, as described in §4.2.4.1.
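The record-driven linking step can be sketched in Python as below. The file format is not given in this section, so the sketch assumes one record per line with the two fields separated by a tab and word forms separated by semicolons, as in the lists above. "Highest level synset with a combination of word forms" is interpreted as the synset with those forms that has no HYPERNYM with the same forms; this interpretation and all function names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Synset:
    # Hypothetical stand-in for PrepositionalSynset.
    id: int
    word_forms: frozenset[str]
    hypernym_ids: list[int] = field(default_factory=list)
    hyponym_ids: list[int] = field(default_factory=list)


def parse_record(line: str) -> tuple[frozenset[str], frozenset[str]]:
    """Assumed format: 'hyponym forms<TAB>hypernym forms', forms separated by ';'."""
    hypo_field, hyper_field = line.rstrip("\n").split("\t")

    def forms(text: str) -> frozenset[str]:
        return frozenset(w.strip() for w in text.split(";") if w.strip())

    return forms(hypo_field), forms(hyper_field)


def highest_with_forms(forms: frozenset[str], synset_map: dict[int, Synset]) -> Synset:
    """The synset with these word forms that has no HYPERNYM with the same word forms."""
    tops = [s for s in synset_map.values()
            if s.word_forms == forms
            and not any(synset_map[h].word_forms == forms for h in s.hypernym_ids)]
    if len(tops) != 1:
        raise LookupError(f"expected one top synset for {sorted(forms)}, found {len(tops)}")
    return tops[0]


def link_top_level(lines: Iterable[str], synset_map: dict[int, Synset]) -> None:
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        hypo_forms, hyper_forms = parse_record(line)
        hypo = highest_with_forms(hypo_forms, synset_map)
        hyper = highest_with_forms(hyper_forms, synset_map)
        hypo.hypernym_ids.append(hyper.id)  # HYPERNYM relation, as in §4.2.4.1
        hyper.hyponym_ids.append(hypo.id)   # converse HYPONYM relation
```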

The resultant taxonomy has 6 top HYPERNYMS, namely:

• as

• as not

• at

• near; with

• not at

• with reference to

This can be contrasted with Litkowski's (2002) original taxonomy (§4.2.1; Appendix 23). The differences are due to non-differentiation of preposition senses in Litkowski's presentation of his digraph analysis and the high priority given to synonym identification and lexical distinctions in the development of the taxonomy presented here.

4.2.4.4 Prepositional Antonyms

The top level HYPERNYMS in the second column of Appendix 26 were arranged alphabetically without duplicates and, wherever possible, each member of the resultant set was manually assigned an ANTONYM from the same set, with a common HYPERNYM (Smrž, 2003; Huang et al., 2002; Vossen, 2002; §2.2.2.3) in all cases except where one or both ANTONYMS are top HYPERNYMS (Appendix 27). The ANTONYM data⁹⁸ is read and processed in the same way as the top level ontology⁹⁹, except that relations of Relation.Type.ANTONYM are encoded in both directions between the pairs.

After each pair of top level ANTONYMS is encoded, ANTONYM relations are also encoded between those pairs of HYPONYMS of the top level ANTONYMS which have the same lexical content as the top level ANTONYMS, and the same superordinate taxonomic categorizer as each other. This operation is performed recursively so that ANTONYM pairings are cascaded down the taxonomy as far as the shared lexical content and superordinate taxonomic categorizer requirements hold without interruption. This creates symmetrical ANTONYM ancestries with a common HYPERNYM (§2.2.2.3). The resultant preposition taxonomy is headed by three pairs of ANTONYMS: {"as"} paired with {"as not"}, {"at"} paired with {"not at"} and {"near"; "with"} paired with {"sans"; "without"}; {"with reference to"} has no ANTONYM.
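The cascading step is naturally recursive, and a Python sketch makes the stopping conditions explicit: recursion continues only while each HYPONYM has the same word forms as its parent and the paired HYPONYMS share a categorizer. The `Synset` class, with a single `categorizer` string and direct object references, is a hypothetical simplification; identity-based equality (`eq=False`) avoids infinite recursion when comparing synsets that point at each other as ANTONYMS.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: synsets reference each other cyclically
class Synset:
    # Hypothetical stand-in for PrepositionalSynset.
    name: str
    word_forms: frozenset[str]
    categorizer: str = ""  # empty for high level abstract HYPERNYMs
    hyponyms: list[Synset] = field(default_factory=list)
    antonyms: list[Synset] = field(default_factory=list)


def encode_antonyms(a: Synset, b: Synset) -> None:
    """Encode a symmetric ANTONYM pair, then cascade it down matching HYPONYMs."""
    if b in a.antonyms:
        return  # pair already encoded
    a.antonyms.append(b)  # Relation.Type.ANTONYM in both directions
    b.antonyms.append(a)
    for ha in a.hyponyms:
        if ha.word_forms != a.word_forms:
            continue  # lexical content changes: the cascade stops on this branch
        for hb in b.hyponyms:
            if hb.word_forms == b.word_forms and ha.categorizer == hb.categorizer:
                encode_antonyms(ha, hb)
```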

Encoding of ANTONYMS is the final phase of enriching the WordNet model with prepositions. No claim is made regarding the originality or completeness of the information about prepositions: a major gap in the coverage of WordNet has simply been filled, to the minimal extent necessary, with data from the latest research. The assignment of prepositions to synsets and the encoding of relations between them have been documented and, as far as possible, data-driven.