CHAPTER 4 DESIGN FRAMEWORK AND ARCHITECTURE OF SWA-KMDLS
4.1 Ontological Learning Object Management
4.1.3 The Ontology Learning Module
The core of the SAOKBCS framework is the Ontology Learning (OL) module. Ontology learning can be defined simply as any (semi-)automatic support for Ontology development. In the context of the Semantic Web, Ontology learning is primarily concerned with knowledge acquisition from and for Web content, and is thus moving away from small and homogeneous data collections to tackle the massive data heterogeneity of the World Wide Web instead (Buitelaar et al. 2003). This research treats Ontology learning as (semi-)automatic support for Ontology development from LOs, namely text Learning Materials (LMs).
The OL module in the framework is intended to construct two kinds of Ontology from its input corpora, i.e. the LMs. The first is the Context Ontology, which models the context of an LO: for example, who the author is, which subject the LM is intended for, and what its format is. The second is the Content Ontology, which models the latent knowledge inside a particular LM. For example, for an LM that introduces programming, the content Ontology models the fundamental programming concepts and the relations between them. The following subsections discuss in more detail how the context and content Ontologies are constructed.
(i) Constructing Context Ontology from LOs
In constructing the Ontological knowledge base, the OL module has the subtask of constructing an Ontology that describes (models) the context of a particular uploaded LO. This Ontology is constructed automatically. The prototype of the LO’s context Ontology is depicted in Figure 4-3.
As shown in Figure 4-3, the context of an LO is identified by:
(i) Subject: the subject or course for which the LO is intended.
(ii) Contributor: the contributor or author of the LO.
(iii) Accessibility: whether the LO is Open or Restricted to authorized users only.
(iv) Format: the file format of the LO, such as txt, ppt, doc, htm/html, or pdf.
(v) Title, Description, Abstract, and relation with other LOs. The relation records which prior LOs are related to the particular LO.
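The context attributes listed above can be pictured as a simple record. The following is a minimal sketch in Python and not the prototype's implementation. The class name `LOContext`, the field names, the property names such as `hasSubject`, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Accessibility(Enum):
    OPEN = "Open"
    RESTRICTED = "Restricted"


# File formats named in the text above.
KNOWN_FORMATS = {"txt", "ppt", "doc", "htm", "html", "pdf"}


@dataclass
class LOContext:
    """Hypothetical record holding the context attributes of one LO."""
    title: str
    subject: str
    contributor: str
    accessibility: Accessibility
    file_format: str
    description: str = ""
    abstract: str = ""
    related_los: List[str] = field(default_factory=list)  # prior, related LOs

    def __post_init__(self) -> None:
        # Normalize ".PDF" or "PDF" to "pdf" and reject unknown formats.
        self.file_format = self.file_format.lower().lstrip(".")
        if self.file_format not in KNOWN_FORMATS:
            raise ValueError(f"unsupported LO format: {self.file_format!r}")

    def as_statements(self) -> List[Tuple[str, str, str]]:
        """Flatten the record into (LO, property, value) statements."""
        lo = self.title
        statements = [
            (lo, "hasSubject", self.subject),
            (lo, "hasContributor", self.contributor),
            (lo, "hasAccessibility", self.accessibility.value),
            (lo, "hasFormat", self.file_format),
        ]
        if self.description:
            statements.append((lo, "hasDescription", self.description))
        if self.abstract:
            statements.append((lo, "hasAbstract", self.abstract))
        statements += [(lo, "relatedTo", other) for other in self.related_los]
        return statements
```

Because every attribute comes from the upload metadata rather than from the text of the LO, a record of this kind can be filled in without any learning step, which is consistent with the context Ontology being constructed automatically.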
(ii) Constructing Content Ontology from LOs
The construction of the content Ontology from an LO comprises subtasks concerned with defining the terms, synonyms, concepts, concept hierarchy, relations, relation hierarchy, axiom schemata and general axioms that represent the knowledge in the LO content. These Ontology artifacts and their examples are illustrated as the “Ontology Learning Layer Cake” depicted in Figure 4-4, which is derived from (Buitelaar et al. 2003; Cimiano 2006).
The subtasks that the OL module performs to extract each Ontology artifact while developing the LO’s content Ontology are summarized in Table 4-1.
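[Figure 4-4 shows the layers below, from top to bottom, each with an example. Layer labels printed in the figure: Terms, Synonyms, Concepts, Concept Hierarchy, Relations, Relation Hierarchy, Axiom Schemata, General Axioms, Rule.]
General Axioms: ∀x, y (married(x, y) → love(x, y))
Axiom Schemata: disjoint(river, mountain)
Relation Hierarchy: capital_of ≤R located_in
Relations: cure(dom: DOCTOR, range: DISEASE)
Concept Hierarchy: is_a(DOCTOR, PERSON)
Concepts: DISEASE := <I, E, L>
Synonyms: {disease, illness}
Terms: disease, illness, hospital
Figure 4-4 Ontology Learning Layer Cake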
Table 4-1 Ontology Learning Task based on Ontology Artifact
Ontology Artifact | Ontology Learning (OL) Task
Terms | Terms are any single-word or multi-word compounds that are relevant and specific to the domain. The task of Ontology Learning (OL) is thus to extract a set of strings SC and SR representing the terms that will be used as signs for concepts and relations, respectively.
Synonym | Finding words which denote the same concept and which thus appear in the same set RefC(c) for a given concept c.
Concept | Finding a triple <i(c), [[c]], RefC(c)>, where i(c) is the intension of the concept, [[c]] its extension, and RefC(c) its lexical realization in a corpus.
Concept hierarchy | Finding the hierarchical relations between concepts.
Relation | Finding relation identifiers or labels r that relate, as binary relations, a domain dom(r) and a range range(r).
Axiom Schemata | Not learning the axiom schemata themselves, but learning which concepts, relations, or pairs of concepts the axioms in the system apply to, i.e. which pairs of concepts are disjoint, which relations are symmetric, the cardinality of a relation, etc.
General Axiom | Deriving more complex relationships and connections between concepts and relations.
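One way to read Table 4-1 is as a set of data structures that the OL tasks fill in step by step. The sketch below is an illustration in Python only. The class names, field names and the informal intension text are assumptions, and the sample values reuse the examples of Figure 4-4.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass
class Concept:
    """A concept c as the triple <i(c), [[c]], RefC(c)>."""
    name: str
    intension: str                                        # i(c)
    extension: Set[str] = field(default_factory=set)      # [[c]]
    lexical_refs: Set[str] = field(default_factory=set)   # RefC(c)


@dataclass(frozen=True)
class Relation:
    """A relation label r with its domain dom(r) and range range(r)."""
    label: str
    domain: str
    range: str


@dataclass
class ContentOntology:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    is_a: Set[Tuple[str, str]] = field(default_factory=set)  # (sub, super)
    relations: List[Relation] = field(default_factory=list)
    disjoint: Set[FrozenSet[str]] = field(default_factory=set)

    def ancestors(self, name: str) -> Set[str]:
        """All super-concepts reachable through the concept hierarchy."""
        found: Set[str] = set()
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.is_a:
                if sub == current and sup not in found:
                    found.add(sup)
                    frontier.append(sup)
        return found


def example_ontology() -> ContentOntology:
    """Build a tiny ontology from the examples shown in Figure 4-4."""
    onto = ContentOntology()
    onto.concepts["DISEASE"] = Concept(
        name="DISEASE",
        intension="an abnormal condition of an organism",  # illustrative text
        lexical_refs={"disease", "illness"},                # synonyms
    )
    onto.is_a.add(("DOCTOR", "PERSON"))
    onto.relations.append(Relation("cure", "DOCTOR", "DISEASE"))
    onto.disjoint.add(frozenset({"river", "mountain"}))
    return onto
```

In this reading, synonym discovery fills `lexical_refs`, concept hierarchy learning fills `is_a`, relation learning fills `relations`, and learning where an axiom schema applies fills sets such as `disjoint`.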
To carry out these tasks, the proposed framework adapted the Text2Onto architecture (Cimiano and Völker 2005) (see Figure 4-5). Text2Onto has three main features that distinguish it from other state-of-the-art Ontology learning frameworks such as TextToOnto (TextToOnto 2008), OntoGen (OntoGen 2006), OntoBuilder (Roitman and Gal 2006; OntoBuilder 2008), OntoLearn (Navigli and Velardi 2004) or OntoLT (Buitelaar and Sintek 2004; Buitelaar 2004; OntoLT 2008). First, it represents the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so-called Probabilistic Ontology Model (POM). This makes it independent of a concrete target language while still allowing the instantiated primitives to be translated into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto, and the fact that the system calculates a confidence value for each learned object allows sophisticated visualizations of the POM to be designed. Third, by incorporating strategies for data-driven change discovery, it avoids processing the whole corpus from scratch each time the corpus changes and instead selectively updates the POM according to the corpus changes. Thus, besides increasing efficiency, it allows the evolution of the Ontology with respect to changes in the underlying corpus to be traced.
The architecture of Text2Onto in Figure 4-5 can be summarized as follows:
(i) The text input to the architecture is called the Corpus, from which the Ontology is constructed (semi-)automatically.
(ii) Before being passed on for further processing, the corpus is pre-processed by the Natural Language Processing (NLP) module, which performs tokenization, sentence splitting, lemmatizing or stemming, and shallow parsing. This module is based on the GATE (Gate 2000) and JAPE (Cunningham et al. 2000) frameworks.
(iii) The algorithms are initialized by a controller, whose purpose is to trigger the linguistic preprocessing of the data (NLP), to execute the Ontology learning algorithms in the appropriate order, and to apply the algorithms' change requests to the POM (the algorithms are detailed in Table 4-3).
(iv) The execution of each algorithm consists of three phases: a notification phase, in which the algorithm learns about recent changes to the corpus; a computation phase, in which these changes are mapped to changes to the reference repository, which stores all kinds of knowledge about the relationship between the Ontology and the data (e.g. pointers to all occurrences of a concept); and a result generation phase, in which requests for POM changes are generated from the updated content of the reference repository.
(v) The center of the architecture is the Probabilistic Ontology Model (POM), which stores the results of the different Ontology learning algorithms. The POM is a collection of instantiated modeling primitives (see Table 4-2) that are independent of a concrete Ontology representation language. Text2Onto includes a Modeling Primitive Library (MPL) which defines these primitives in a declarative fashion.
(vi) So-called Ontology writers are responsible for translating the instantiated modeling primitives from the POM into a specific target knowledge representation language such as RDFS (RDFS 2004), OWL (OWL 2004), or F-Logic (Bruijn 2007).
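The interplay of the controller, the three algorithm phases and the POM can be sketched as follows. This is a simplified illustration in Python and not the Text2Onto API. All class and method names are assumptions, the `preprocess` function is only a stand-in for the GATE-based NLP step, and the toy algorithm uses relative term frequency as its confidence value.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Primitive = Tuple[str, ...]                    # e.g. ("concept", "loop")
ChangeRequest = Tuple[str, Primitive, float]   # (op, primitive, confidence)


def preprocess(text: str) -> List[str]:
    """Stand-in for the NLP step: lowercase word tokenization only."""
    return re.findall(r"[a-z]+", text.lower())


class POM:
    """Instantiated primitives, each with a confidence value in [0, 1]."""

    def __init__(self) -> None:
        self.confidence: Dict[Primitive, float] = {}

    def apply(self, request: ChangeRequest) -> None:
        op, primitive, conf = request
        if op == "add":
            self.confidence[primitive] = conf
        else:  # "remove"
            self.confidence.pop(primitive, None)


class ConceptByRelativeFrequency:
    """Toy concept-extraction algorithm following the three phases."""

    def __init__(self) -> None:
        self.pending: list = []
        self.doc_terms: Dict[str, Counter] = {}  # reference repository
        self.totals: Counter = Counter()
        self.vanished: set = set()

    def notify(self, changes: list) -> None:
        # Notification phase: learn about recent corpus changes.
        self.pending.extend(changes)

    def compute(self) -> None:
        # Computation phase: map corpus changes onto the repository,
        # touching only the documents that actually changed.
        for op, doc_id, tokens in self.pending:
            self.totals.subtract(self.doc_terms.pop(doc_id, Counter()))
            if op != "remove":
                self.doc_terms[doc_id] = Counter(tokens)
                self.totals.update(self.doc_terms[doc_id])
        self.pending = []
        self.vanished = {t for t, n in self.totals.items() if n <= 0}
        for term in self.vanished:
            del self.totals[term]

    def generate_requests(self) -> List[ChangeRequest]:
        # Result generation phase: derive POM change requests.
        total = sum(self.totals.values())
        requests = [("remove", ("concept", t), 0.0) for t in sorted(self.vanished)]
        requests += [("add", ("concept", t), n / total) for t, n in self.totals.items()]
        return requests


class Controller:
    def __init__(self, algorithms: list) -> None:
        self.algorithms = algorithms
        self.pom = POM()

    def process(self, corpus_changes: list) -> None:
        """corpus_changes: list of (op, doc_id, text), op is 'add' or 'remove'."""
        prepared = [(op, d, preprocess(text)) for op, d, text in corpus_changes]
        for algorithm in self.algorithms:  # executed in the configured order
            algorithm.notify(prepared)
            algorithm.compute()
            for request in algorithm.generate_requests():
                self.pom.apply(request)
```

When a document is removed or replaced, only its stored counts are subtracted from the reference repository, so the rest of the corpus is never re-read. This mirrors, in a very reduced form, the data-driven change discovery described for Text2Onto.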
Table 4-2 The Text2Onto Ontology Primitive Model & Gruber Frame Ontology
Text2Onto Primitive Model | Gruber Frame Ontology
Concepts | CLASS
Concept inheritance | SUBCLASS-OF
Concept instantiation | INSTANCE-OF
Properties/relations | RELATION
Domain and range restrictions | DOMAIN/RANGE
Mereological relations (part-of) | -
Equivalence | -
Table 4-3 Algorithms for Primitive Model Extraction
POM Primitive | Algorithm
Concepts | Relative Term Frequency (RTF), TFIDF (Term Frequency-Inverse Document Frequency), Entropy, and the C-value/NC-value method.
Subclass-of Relations | Algorithms for exploiting the hypernym structure of WordNet (Miller 2006), matching Hearst patterns (Hearst 1992), and linguistic heuristics.
Part-of Relations | JAPE expressions.
General Relations | Shallow parsing strategy to extract subcategorization frames enriched with information about the frequency of the terms appearing as arguments.
Instance-of Relations | Similarity-based approach that extracts context vectors for instances and concepts from the text collection and assigns each instance to the concept whose vector has the highest similarity to the instance's own vector.
Equivalence | Similarity algorithm between terms based on contextual features extracted from the corpus, whereby the context of a term varies from simple word windows to linguistic features extracted with a shallow parser. This corpus-based similarity is then taken as the probability of the equivalence of the concepts in question.
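As an illustration of the term relevance measures listed for concept extraction in Table 4-3, the sketch below computes RTF and a textbook variant of TFIDF over tokenized documents. It is written in Python under the assumption that documents are already tokenized. The function names are illustrative, and the exact weighting used inside Text2Onto may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def relative_term_frequency(doc: List[str]) -> Dict[str, float]:
    """RTF: occurrences of a term divided by the number of tokens in the doc."""
    counts = Counter(doc)
    total = len(doc)
    return {term: n / total for term, n in counts.items()}


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TFIDF per document: RTF(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        rtf = relative_term_frequency(doc)
        scores.append({t: f * math.log(n_docs / df[t]) for t, f in rtf.items()})
    return scores
```

A term that occurs in every document receives a TFIDF of zero, so terms that are frequent in one LM but rare across the corpus rank highest as concept candidates.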
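Matching Hearst patterns, listed in Table 4-3 for subclass-of relations, can be illustrated with the pattern "X such as A, B and C" from Hearst (1992). The sketch below is a rough approximation in Python. It treats every noun phrase as a single word, whereas a real implementation would rely on the output of the shallow parser, and the function name is an assumption.

```python
import re
from typing import List, Tuple

# "<hypernym> such as <item>, <item> and <item>", with one-word noun phrases.
# Longer separators come first so that ", and " is not read as ", " + "and".
SUCH_AS = re.compile(
    r"(\w+),? such as (\w+(?:(?:, and |, or |, | and | or )\w+)*)"
)
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def hearst_such_as(sentence: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) candidates found in one sentence."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs
```

Each returned pair is only a candidate subclass-of relation. Because the pattern is purely lexical, it will also fire on misleading sentences, which is one reason such evidence is combined with other sources and stored with a confidence value.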
The final output of the OL module is the LO’s content Ontology represented in OWL (Web Ontology Language).
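To make this last step concrete, the following sketch shows how an Ontology writer might serialize learned primitives as OWL in Turtle syntax. It is a minimal illustration in Python using only the standard library. The function names, the tuple layout of the primitives, the confidence threshold and the base IRI `http://example.org/lo-content#` are assumptions and are not part of Text2Onto.

```python
import re
from typing import Dict, Tuple

Primitive = Tuple[str, ...]


def local_name(term: str) -> str:
    """Turn a term into a safe local name, e.g. 'for loop' -> 'for_loop'."""
    return re.sub(r"\W+", "_", term.strip())


def write_owl_turtle(
    pom: Dict[Primitive, float],
    base: str = "http://example.org/lo-content#",  # hypothetical base IRI
    threshold: float = 0.5,                        # illustrative cut-off
) -> str:
    """Serialize primitives whose confidence reaches the threshold."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for primitive, confidence in sorted(pom.items()):
        if confidence < threshold:
            continue
        kind, *args = primitive
        names = [local_name(a) for a in args]
        if kind == "concept":
            lines.append(f":{names[0]} a owl:Class .")
        elif kind == "subclass-of":
            lines.append(f":{names[0]} rdfs:subClassOf :{names[1]} .")
        elif kind == "instance-of":
            lines.append(f":{names[0]} a :{names[1]} .")
        elif kind == "relation":
            label, domain, rng = names
            lines.append(
                f":{label} a owl:ObjectProperty ; "
                f"rdfs:domain :{domain} ; rdfs:range :{rng} ."
            )
    return "\n".join(lines)
```

Keeping the writer separate from the learned primitives is what allows the same model to be exported to other formalisms, since only this translation step depends on the target language.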