This research rests on the premises that upper ontologies are largely universal and that lexical ontologies such as WordNet could be comprehensively mapped to upper ontologies (Cimiano et al., 2011). Since the original mapping of WordNet to SUMO was done from one linguistic base only¹⁰, the general research challenge, or problem, is as follows: does the assumed universality of the upper ontology hold for concepts realized in other languages, particularly in other language families (Pease et al., 2002)? Furthermore, does the language chosen for specifying and constructing the upper ontology affect the concepts that are chosen for inclusion in it?
9. Style: For detail on the spelling style used for -ise or -ize in this document, please refer to Section 1.5.
The main research question emanating from this problem is: are core concepts¹¹ from a proposed natural language family currently included in an existing, accepted upper ontology? Specifically, is every one of these core concepts equivalent to or subsumed by a concept in a defined upper ontology? These mappings (from a computational perspective) or alignments (from a linguistic perspective) run from fundamental, acknowledged core concepts in a natural language to concepts existing in upper ontologies.
The focus in this dissertation is on non-Indo-European language families. In order to answer this core research question, two further aspects are investigated:
• the state of the art of mappings from the concepts of other, specifically non-Indo-European, language families to upper ontology concepts, and
• mappings from the core concepts of an African language family, specifically the Bantu languages, to an upper ontology.
10. Note that the original WordNet to SUMO mapping was done from Princeton WordNet (Reed and Pease, 2015).
As already alluded to in Section 1.1, the inclusion of a natural language core concept in an upper ontology will be affirmed in this study in one of two ways:
1. The core concept is equivalent to an upper ontology concept, or
2. The core concept is subsumed by an upper ontology concept.
If a core concept is found not to occur in an upper ontology, then no equivalence can be established between the core concept and any concept in the upper ontology. Even where no equivalence can be established, however, a subsumption relation may still be established between the core concept and a broader concept in the upper ontology.
Answering the research question therefore requires identifying, inspecting and counting those natural language core concepts that are equivalent to concepts in the upper ontology, those that are subsumed by broader concepts in the upper ontology, and those that have no mapping possibility at all. An outcome in which this count shows few mapping possibilities or many subsumption relations would mean that there is little equivalence overall, and would provide a qualitatively negative answer to the research question. Conversely, a large proportion of equivalence relations would provide a qualitatively positive answer to the research question.
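The counting step described above can be illustrated with a minimal sketch. It assumes that each core concept has already been assigned, by inspection, exactly one of the three outcomes; the concept names, the `Mapping` type and the `tally` function are hypothetical and serve illustration only. No numerical threshold is applied, since the answer to the research question is treated here as a qualitative judgement.

```python
from collections import Counter
from enum import Enum


class Mapping(Enum):
    EQUIVALENT = "equivalent"  # core concept matches an upper ontology concept
    SUBSUMED = "subsumed"      # core concept falls under a broader concept
    UNMAPPED = "unmapped"      # no mapping possibility was found


def tally(mappings):
    """Count each mapping outcome and return (counts, proportions)."""
    counts = Counter(mappings.values())
    total = len(mappings)
    proportions = {
        kind: (counts[kind] / total if total else 0.0) for kind in Mapping
    }
    return counts, proportions


# Hypothetical core concepts and outcomes, not data from this study.
example = {
    "concept_a": Mapping.EQUIVALENT,
    "concept_b": Mapping.SUBSUMED,
    "concept_c": Mapping.SUBSUMED,
    "concept_d": Mapping.UNMAPPED,
}

counts, proportions = tally(example)
for kind in Mapping:
    print(f"{kind.value}: {counts[kind]} ({proportions[kind]:.0%})")
```

The proportions are reported rather than interpreted, so that the reader can weigh equivalence against subsumption and unmapped concepts.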
For the language family under investigation, that is, the Bantu languages, this is, as far as we know, the first study of its kind. This research is therefore novel and exploratory, and its results are largely qualitative.
Research sub-questions that follow from this main research question are:
• What is the state of the art of the natural language core concept definition in WordNets? This provides the linguistic background to the mapping process proposed.
• What is the state of the art of the upper ontology usage in the context of these natural language core concepts? This provides the computational background to the mapping process proposed.
• How do existing mappings of non-Indo-European language family core concepts to upper ontologies compare to that of Princeton WordNet? This provides the background to related work.
• What will a new structure of core concepts, from an African linguistic base, look like and how can it be compared to existing structures? Addressing this research sub-question is key in providing the practical results of a mapping process, which, once completed, contributes to answering the main research question. This sub-question is therefore intrinsically linked to the significance (or contribution) of this dissertation.
The accepted upper ontology used in this dissertation is the Suggested Upper Merged Ontology (SUMO), since this is the most common upper ontology to which WordNets are mapped. SUMO is also broadly representative (Mascardi et al., 2007) of other upper ontologies. Therefore similar results should apply to other upper ontologies:
SUMO and its domain ontologies ... form one of the largest formal public ontology (sic) in existence today. They are being used for research and applications in search, linguistics and reasoning (Mas-
Number  Main Research Question
1       Are the core concepts from a proposed natural language family currently included in an existing, accepted upper ontology?
1a      Is every one of these core concepts equivalent to or subsumed by a concept in a defined upper ontology?

Number  Research Sub-Questions
2       What is the state of the art of the natural language core concept definition in WordNets?
3       What is the state of the art of the upper ontology usage in the context of these natural language core concepts?
4       How do existing mappings of non-Indo-European language family core concepts to upper ontologies compare to that of Princeton WordNet?
5       What will a new structure of core concepts, from a novel African linguistic base, look like, and how can it be compared to existing structures?
Table 1.1: Research questions