2.3 Related Work
2.3.4 Class Expression Learner for Ontology Engineering (CELOE)
CELOE is a top-down OWL class expression learning algorithm in the DL-Learner framework and is the most recently developed OWL class expression learning algorithm [82]. It uses a downward refinement operator that supports the ALC description logic language to specialise the descriptions in the search space. The implementation of this refinement operator was extended to support more expressive constructs such as datatypes (D) and number restrictions (N).
As a typical top-down description logic learning approach, this algorithm starts from a general concept, the TOP concept by default, and uses the refinement operator to generate descriptions in the search space until an accurate description is found. The selection of descriptions in the search space for refinement (expansion) is based on the score of a description, which is determined mainly by the description's accuracy. The score of a description $C$ is defined as follows:
\[ \mathrm{score}(C) = \mathrm{accuracy}(C) + 0.2 \times \mathrm{accuracy\_gain}(C) - 0.05 \times n \]
where $\mathrm{accuracy}(C)$ is the accuracy of description $C$ (see Definition 3.2); $\mathrm{accuracy\_gain}(C)$ is the difference between the accuracy of $C$ and that of its parent; and $n$ is the horizontal expansion of $C$.
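The scoring formula above can be illustrated with a minimal Python sketch. The function name, the parameter names and the example values are assumptions made for illustration only; they are not taken from the DL-Learner implementation. How the root description, which has no parent, is scored is not specified in the text, so the caller must supply a parent accuracy.

```python
def score(accuracy: float, parent_accuracy: float, horizontal_expansion: int,
          gain_weight: float = 0.2, expansion_penalty: float = 0.05) -> float:
    """Score of a description, following the formula in the text.

    accuracy             -- accuracy(C) of the description being scored
    parent_accuracy      -- accuracy of the parent of C in the search tree
    horizontal_expansion -- n, the horizontal expansion of C
    """
    # accuracy_gain(C) is the difference between the accuracy of C and its parent.
    accuracy_gain = accuracy - parent_accuracy
    return accuracy + gain_weight * accuracy_gain - expansion_penalty * horizontal_expansion


# Hypothetical values: accuracy 0.9, parent accuracy 0.8, horizontal expansion 4.
example = score(0.9, 0.8, 4)  # approximately 0.72
```

The penalty term means that, for equal accuracy, a description that is longer or has already been refined more often is selected later than a shorter, less refined one.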
The horizontal expansion of a description is the sum of its length and the number of times it has been refined. A description in the search space may be refined many times. The refinement operator is only allowed to produce descriptions whose length is less than or equal to the horizontal expansion of the refined description. This mechanism is used to cope with the fact that the refinement operator is infinite. Because the algorithm searches for a single description that covers all positive examples, overly specific descriptions are ignored, i.e. they are not added to the search space for further refinement (expansion).
This algorithm was implemented and distributed with DL-Learner, an open source machine learning framework. It has been thoroughly evaluated and compared with popular description learning algorithms. The evaluation shows that it is a promising OWL class expression learning algorithm that produces very concise (short and readable) concepts [57]. However, as this algorithm uses only the top-down learning approach, it cannot deal well with complex learning problems that require long descriptions to describe the positive examples.
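The horizontal expansion mechanism described above can be sketched as follows. This is an illustrative sketch only, not the DL-Learner code: the class `SearchNode`, the function `refine_once` and the list of candidate refinements are hypothetical, and applying the length bound before the refinement counter is incremented is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchNode:
    """A description in the search space (hypothetical structure)."""
    description: str
    length: int
    refinement_count: int = 0

    @property
    def horizontal_expansion(self) -> int:
        # Sum of the description length and the number of times it was refined.
        return self.length + self.refinement_count


def refine_once(node: SearchNode,
                candidates: List[Tuple[str, int]]) -> List[SearchNode]:
    """Keep only candidate refinements within the current length bound.

    `candidates` stands in for the output of a refinement operator, given as
    (description, length) pairs; a real operator would generate them lazily.
    """
    bound = node.horizontal_expansion  # assumption: bound taken before incrementing
    kept = [SearchNode(text, length) for text, length in candidates if length <= bound]
    node.refinement_count += 1         # each refinement raises the bound by one
    return kept
```

Refining the same node repeatedly therefore admits longer and longer refinements, so only finitely many descriptions are produced in any single refinement step.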
Evaluation Methodology
This chapter describes the methodology used to evaluate the algorithms proposed in this study. We first describe the evaluation metrics. We then give a detailed account of our experimental framework, including a cross-validation procedure and a statistical significance test method. The control of algorithm termination is also discussed. The chapter ends with an introduction to the comparison algorithms and evaluation datasets.
3.1 Introduction
As described in the introductory chapter, this thesis proposes four new approaches to description logic learning. These approaches aim to improve the learning speed, the capability to deal with complex learning problems, and the flexibility to trade off predictive correctness against completeness.
In this chapter, we describe the evaluation methodology for our proposed algorithms. A thorough evaluation involves running the algorithms on selected datasets and gathering the metrics of interest that reflect what the proposed algorithms achieve. The experimental results are then compared with those of existing algorithms. The evaluation methodology therefore comprises the selection of: i) evaluation metrics (described in Section 3.2), ii) a method for measuring the evaluation metrics and comparing them with existing algorithms (described in Section 3.3.1), iii) some existing algorithms for comparison with our algorithms (described in Section 3.3.2), and iv) a set of evaluation datasets (described in Section 3.3.3).
3.2 Evaluation Metrics
The selected evaluation metrics must reflect the essential features of the algorithms being evaluated. In machine learning, predictive accuracy is the most important metric: it represents the predictive power of the learnt concepts. However, a more thorough assessment of a learning algorithm is useful. Based on the aims of this thesis, we are also interested in learning time, search space size and definition length. The learning time represents the speed of an algorithm, whereas the search space size indicates how effectively memory is used. The last metric, definition length, provides a measure of the readability of the definition (in general, short definitions are more readable than long definitions). The computation of these metrics is defined below.
3.2.1 Accuracy
Accuracy is a combination of completeness and correctness. Definition 2.17 defined complete, correct and accurate concepts. However, that definition assesses these properties qualitatively, i.e. whether a concept is complete or incomplete, correct or incorrect, and accurate or inaccurate. Here we define these metrics in a different context: for measuring the amount of completeness, correctness and accuracy.
Before introducing the calculation of these metrics, we restate the definition of instance retrieval (see Definition 2.15) in the form of a formula to simplify the use of this task.
Definition 3.1 (Cover). Let $P = \langle K, (E^+, E^-) \rangle$ be a learning problem as defined in Definition 2.16, $E = (E^+, E^-)$ and $C$ be a concept. Then, $\mathrm{cover}(K, C, E)$ is a function that computes the set of examples covered by $C$ with respect to $K$ and is defined as:
\[ \mathrm{cover}(K, C, E) = \{ e \in E \mid e \text{ is covered by } C \text{ with respect to } K \} \]
Definition 3.2 (Completeness, correctness and accuracy). Let $P = \langle K, (E^+, E^-) \rangle$ be a learning problem and $C$ be a concept. Then,
• completeness is the ratio of the number of positive examples covered by $C$ to the total number of positive examples:
\[ \mathrm{completeness}(C, P) = \frac{|\mathrm{cover}(K, C, E^+)|}{|E^+|} \]
• correctness is the ratio of the number of negative examples not covered by $C$ to the total number of negative examples:
\[ \mathrm{correctness}(C, P) = 1 - \frac{|\mathrm{cover}(K, C, E^-)|}{|E^-|} = \frac{|E^- \setminus \mathrm{cover}(K, C, E^-)|}{|E^-|} \]
• accuracy is the ratio of the sum of the number of positive examples covered by $C$ and the number of negative examples not covered by $C$ to the total number of examples:
\[ \mathrm{accuracy}(C, P) = \frac{|\mathrm{cover}(K, C, E^+)| + |E^- \setminus \mathrm{cover}(K, C, E^-)|}{|E^+ \cup E^-|} \]
Accuracy with respect to training data is called training accuracy, and accuracy with respect to test data is called predictive accuracy.
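The metrics of Definitions 3.1 and 3.2 can be sketched in Python as follows. The predicate `is_covered` is a hypothetical stand-in for the instance check that a reasoner performs for a concept with respect to a knowledge base; the example names and the toy data are invented for illustration. The sketch assumes that the sets of positive and negative examples are non-empty.

```python
from typing import Callable, Set

Example = str


def cover(is_covered: Callable[[Example], bool], examples: Set[Example]) -> Set[Example]:
    """Examples covered by the concept (Definition 3.1).

    `is_covered` plays the role of the pair (K, C): it answers whether an
    example is an instance of concept C with respect to knowledge base K.
    """
    return {e for e in examples if is_covered(e)}


def completeness(is_covered, positives: Set[Example]) -> float:
    return len(cover(is_covered, positives)) / len(positives)


def correctness(is_covered, negatives: Set[Example]) -> float:
    return len(negatives - cover(is_covered, negatives)) / len(negatives)


def accuracy(is_covered, positives: Set[Example], negatives: Set[Example]) -> float:
    covered_pos = len(cover(is_covered, positives))
    uncovered_neg = len(negatives - cover(is_covered, negatives))
    return (covered_pos + uncovered_neg) / len(positives | negatives)


# Toy data: the concept covers three of four positives and one of two negatives.
positives = {"a", "b", "c", "d"}
negatives = {"x", "y"}
instances_of_concept = {"a", "b", "c", "x"}
check = lambda e: e in instances_of_concept
```

With this toy data, completeness is 3/4, correctness is 1/2 and accuracy is (3 + 1)/6.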
3.2.2 Learning time
The learning time of a learning algorithm is counted from when it starts to search for a definition until the definition is found or the timeout is reached. The time for loading the knowledge base into the reasoner is not counted. There are two basic methods to measure learning time: using wall-clock time, the actual time that elapses from the start to the end of a learning task; and using CPU time, which measures only the time that the CPU spends working on the learning task. Technically, computing CPU time is more complicated than computing wall-clock time. However, if the evaluation system has a constant load, wall-clock time is approximately equal to CPU time. Therefore, in our experiments, we compute learning time using wall-clock time. To ensure that the system has a constant load, we can manually observe the system load while the learners are running.
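The difference between the two measurement methods can be illustrated with a short Python sketch. Python and the helper `timed` are used here only for illustration and are not part of the experimental framework; `learn` stands for any callable that runs a learning task after the knowledge base has already been loaded.

```python
import time
from typing import Any, Callable, Tuple


def timed(learn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, float]:
    """Run `learn` and return (result, wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()  # elapsed real time
    cpu_start = time.process_time()   # CPU time consumed by this process only
    result = learn(*args, **kwargs)
    wall_elapsed = time.perf_counter() - wall_start
    cpu_elapsed = time.process_time() - cpu_start
    return result, wall_elapsed, cpu_elapsed
```

On a system with a constant load and a CPU-bound, single-threaded task, the two values are expected to be close; a large gap suggests that other processes or waiting time contributed to the wall-clock measurement.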
3.2.3 Definition length
The calculation of this metric was defined in Definitions 2.21 and 2.22. It is the total number of symbols that appear within the definition, excluding punctuation such as “(”, “)”, “.” and “,”. For example, the length of the following definition is 5:
Male ⊓ ∃hasChild.Person
(or in Manchester OWL syntax: Male AND hasChild SOME Person).
In our implementation, the normalisation procedure in the DL-Learner framework is used to normalise the learnt definition.
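The symbol count described above can be sketched in Python. This is not the procedure of Definitions 2.21 and 2.22, which are not reproduced here, and it does not perform the DL-Learner normalisation; it assumes that the definition is written in description logic syntax and that the only symbols are concept names, role names and the ALC constructors listed in the pattern. Number and datatype restrictions are not handled.

```python
import re

# Assumed symbol inventory: names (concepts and roles) and ALC constructors.
# Punctuation such as "(", ")", "." and "," matches neither alternative.
_SYMBOL = re.compile(r"[A-Za-z_]\w*|[⊓⊔¬∃∀⊤⊥]")


def definition_length(definition: str) -> int:
    """Count the symbols in a definition, ignoring punctuation and spaces."""
    return len(_SYMBOL.findall(definition))


# The example from the text: Male, ⊓, ∃, hasChild and Person give a length of 5.
example_length = definition_length("Male ⊓ ∃hasChild.Person")
```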