Class Expression Learner for Ontology Engineering (CELOE)

2.3 Related Work

2.3.4 Class Expression Learner for Ontology Engineering (CELOE)

CELOE is a top-down OWL class expression learning algorithm in the DL-Learner framework and is the most recently developed OWL class expression learning algorithm [82]. It uses a downward refinement operator that supports the ALC description logic language to specialise the descriptions in the search space. The implementation of this refinement operator was extended to support more expressive constructs, such as datatypes (D) and number restrictions (N).

As a typical top-down description logic learning approach, this algorithm starts from a general concept, the TOP concept by default, and uses the refinement operator to generate descriptions in the search space until an accurate description is found. The description to refine (expand) next is selected by its score, which is based mainly on the description's accuracy. The score of a description C is defined as follows:

$$\mathrm{score}(C) = \mathrm{accuracy}(C) + 0.2 \times \mathrm{accuracy\_gain}(C) - 0.05 \times n$$

where accuracy(C) is the accuracy of description C (see Definition 3.2); accuracy_gain(C) is the difference between the accuracy of C and that of its parent; and n is the horizontal expansion of C.
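As a rough illustration of the score formula, the following Python sketch computes the score from three numbers. The function name and parameters are hypothetical and are not taken from DL-Learner; the sketch also assumes that the parent's accuracy is always available, which leaves open how the root description is scored.

```python
def celoe_score(accuracy: float, parent_accuracy: float,
                horizontal_expansion: int) -> float:
    """Score of a description, following the formula in the text.

    accuracy             -- accuracy(C)
    parent_accuracy      -- accuracy of the parent of C (assumed to be known)
    horizontal_expansion -- n, the horizontal expansion of C
    """
    accuracy_gain = accuracy - parent_accuracy
    return accuracy + 0.2 * accuracy_gain - 0.05 * horizontal_expansion


# Two descriptions with the same accuracy and gain: the one with the
# smaller horizontal expansion receives the higher score.
short = celoe_score(accuracy=0.9, parent_accuracy=0.8, horizontal_expansion=4)
long_ = celoe_score(accuracy=0.9, parent_accuracy=0.8, horizontal_expansion=8)
print(round(short, 2), round(long_, 2))
```

The penalty term means that, other things being equal, descriptions that are long or that have already been refined many times are selected later.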

The horizontal expansion of a description is the sum of its length and the number of times it has been refined. A description in the search space may be refined many times. The refinement operator is only allowed to produce descriptions whose length is less than or equal to the horizontal expansion of the refined description. This mechanism deals with the infiniteness of the refinement operator. As this algorithm looks for a single description that covers all positive examples, overly specific descriptions are ignored, i.e. they are not added to the search space for further refinement (expansion).

This algorithm was implemented and distributed with DL-Learner, an open source machine learning framework. It has been evaluated and compared with popular description learning algorithms. The evaluation shows that it is a promising OWL class expression learning algorithm that produces very concise (short and readable) concepts [57]. However, as this algorithm uses only the top-down learning approach, it cannot deal well with complex learning problems that require long descriptions to describe the positive examples.
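The horizontal expansion bookkeeping described in this section can be sketched in Python as follows. This is a toy model under stated assumptions, not the DL-Learner implementation: `SearchNode` and `refine_once` are hypothetical names, the refinement operator is replaced by a fixed list of candidate descriptions, and the length bound is assumed to be read before the refinement counter is incremented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SearchNode:
    description: str           # class expression, kept as plain text here
    length: int                # length of the description
    refinement_count: int = 0  # how many times this node has been refined

    @property
    def horizontal_expansion(self) -> int:
        # Sum of the length and the number of refinements, as in the text.
        return self.length + self.refinement_count


def refine_once(node: SearchNode,
                candidates: Iterable[Tuple[str, int]]) -> List[SearchNode]:
    """Admit only candidates no longer than the node's horizontal expansion.

    `candidates` stands in for the output of a refinement operator and is
    given as (description, length) pairs. Assumption: the bound is taken
    before the refinement counter is incremented.
    """
    bound = node.horizontal_expansion
    admitted = [SearchNode(d, n) for d, n in candidates if n <= bound]
    node.refinement_count += 1
    return admitted


root = SearchNode("Person", length=1)
proposals = [("Male", 1), ("Person ⊓ Male", 3)]
for step in range(3):
    children = refine_once(root, proposals)
    print(step, [c.description for c in children], root.horizontal_expansion)
```

Each refinement of the same node raises its horizontal expansion by one, so longer refinements are admitted only gradually. This keeps every single refinement step finite even though the operator as a whole can generate infinitely many descriptions.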

Evaluation Methodology

This chapter describes the evaluation methodology for the algorithms proposed in this study. We first describe the evaluation metrics. Next, we give a detailed account of our experimental framework, including a cross-validation procedure and a statistical significance test method. The control of algorithm termination is also discussed. The chapter ends with an introduction to the comparison algorithms and evaluation datasets.

3.1 Introduction

As described in the introductory chapter, this thesis proposes four new approaches to description logic learning. The aim of these approaches is to improve the learning speed, the capability to deal with complex learning problems, and the flexibility to trade off predictive correctness and completeness.

In this chapter, we describe the evaluation methodology for our proposed algorithms. A thorough evaluation includes running the algorithms on selected datasets and gathering the metrics of interest that reflect the achievements of the proposed algorithms. The experimental results are then compared with those of existing algorithms to assess these achievements. Therefore, the evaluation methodology includes the selection of: i) evaluation metrics (described in Section 3.2), ii) a method for measuring the evaluation metrics and comparing with existing algorithms (described in Section 3.3.1), iii) some existing algorithms for comparison with our algorithms (described in Section 3.3.2), and iv) a set of evaluation datasets (described in Section 3.3.3).

3.2 Evaluation Metrics

The selected evaluation metrics must reflect the essential features of the evaluated algorithms. In machine learning, predictive accuracy is the most important metric. It represents the predictive power of the learnt concepts. However, it is useful to have a more thorough assessment of a learning algorithm. Based on the aims of this thesis, we are also interested in learning time, search space size and definition length. The learning time represents the speed of an algorithm, whereas the search space size indicates how effectively memory is used. The last metric, definition length, provides a measure of the readability of the definition (in general, short definitions are more readable than long definitions). The computation of these metrics is defined below.

3.2.1 Accuracy

Accuracy is a combination of completeness and correctness. Definition 2.17 defined complete, correct and accurate concepts. However, that definition assesses these properties qualitatively, i.e. whether a concept is complete or incomplete, correct or incorrect, and accurate or inaccurate. Here we define these metrics in a different context: measuring the amount of completeness, correctness and accuracy.

Before introducing the calculation of these metrics, we restate the definition of instance retrieval (see Definition 2.15) in the form of a formula to simplify the use of this task.

Definition 3.1 (Cover). Let P = ⟨K, (E+, E−)⟩ be a learning problem as defined in Definition 2.16, E = (E+, E−) and C be a concept. Then, cover(K, C, E) is a function that computes the set of examples covered by C with respect to K and is defined as:

$$\mathrm{cover}(K, C, E) = \{\, e \in E \mid e \text{ is covered by } C \text{ with respect to } K \,\}$$

Definition 3.2 (Completeness, correctness and accuracy). Let P = ⟨K, (E+, E−)⟩ be a learning problem and C be a concept. Then,

• completeness is the ratio of the number of positive examples covered by C to the total number of positive examples:

$$\mathrm{completeness}(C, P) = \frac{|\mathrm{cover}(K, C, E^+)|}{|E^+|}$$

• correctness is the ratio of the number of negative examples not covered by C to the total number of negative examples:

$$\mathrm{correctness}(C, P) = 1 - \frac{|\mathrm{cover}(K, C, E^-)|}{|E^-|} = \frac{|E^- \setminus \mathrm{cover}(K, C, E^-)|}{|E^-|}$$

• accuracy is the ratio of the number of positive examples covered by C plus the number of negative examples not covered by C to the total number of examples:

$$\mathrm{accuracy}(C, P) = \frac{|\mathrm{cover}(K, C, E^+)| + |E^- \setminus \mathrm{cover}(K, C, E^-)|}{|E^+ \cup E^-|}$$

Accuracy with respect to training data is called training accuracy and accuracy with respect to test data is called predictive accuracy.
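The three metrics can be illustrated with a short Python sketch. It assumes that coverage is decided by a caller-supplied predicate `is_covered`, a hypothetical stand-in for the reasoner's instance check of a concept C with respect to K, and that both example sets are non-empty. All function names are illustrative.

```python
from typing import Callable, Set

Example = str


def cover(is_covered: Callable[[Example], bool],
          examples: Set[Example]) -> Set[Example]:
    """Examples covered by the concept (is_covered stands in for the reasoner)."""
    return {e for e in examples if is_covered(e)}


def completeness(is_covered, positives: Set[Example]) -> float:
    return len(cover(is_covered, positives)) / len(positives)


def correctness(is_covered, negatives: Set[Example]) -> float:
    return 1 - len(cover(is_covered, negatives)) / len(negatives)


def accuracy(is_covered, positives: Set[Example],
             negatives: Set[Example]) -> float:
    covered_pos = len(cover(is_covered, positives))
    uncovered_neg = len(negatives - cover(is_covered, negatives))
    return (covered_pos + uncovered_neg) / len(positives | negatives)


# Toy data: the concept covers three of four positives and one of two negatives.
positives = {"a", "b", "c", "d"}
negatives = {"x", "y"}
covered = {"a", "b", "c", "x"}


def is_covered(e: Example) -> bool:
    return e in covered


print(completeness(is_covered, positives))         # 0.75
print(correctness(is_covered, negatives))          # 0.5
print(accuracy(is_covered, positives, negatives))  # (3 + 1) / 6
```

Evaluating the same functions on the training examples gives training accuracy, and evaluating them on held-out test examples gives predictive accuracy.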

3.2.2 Learning time

The learning time of a learning algorithm is counted from when it starts to search for a definition until the definition is found or the timeout is reached. The time for loading the knowledge base into the reasoner is not counted. There are two basic methods to measure learning time: wall-clock time, the actual time that elapses from the start to the end of a learning task; and CPU time, which measures only the time that the CPU spends working on the learning task. Technically, computing CPU time is more complicated than computing wall-clock time. However, if the evaluation system has a constant load, wall-clock time is approximately equal to CPU time. Therefore, in our experiments, we compute learning time using wall-clock time. To ensure that the system has a constant load, we can manually observe the system load while the learners are running.
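The difference between the two measurements can be shown with a small Python sketch. It only illustrates the two clocks and is not the measurement code used in the experiments; the helper name `timed` and the example task are hypothetical. As in the text, any setup work (such as loading a knowledge base) would be done before the timers start.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(task: Callable[[], T]) -> Tuple[T, float, float]:
    """Run task() and return (result, wall_clock_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock: real elapsed time
    cpu_start = time.process_time()    # CPU time used by this process only
    result = task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def example_task() -> int:
    # Placeholder for a learning task: some CPU-bound work.
    return sum(i * i for i in range(200_000))


result, wall, cpu = timed(example_task)
print(f"wall-clock: {wall:.4f} s, CPU: {cpu:.4f} s")
```

For a CPU-bound task on an otherwise idle machine, the two values are typically close. They diverge when the process waits, for example on I/O or when other processes compete for the CPU.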

3.2.3 Deﬁnition length

The calculation of this metric was defined in Definitions 2.21 and 2.22. It is the total number of symbols that appear in the definition, excluding punctuation such as “(”, “)”, “.” and “,”. For example, the length of the following definition is 5:

Male ⊓ ∃hasChild.Person

(or in Manchester OWL syntax: Male AND hasChild SOME Person).
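A simplified way to count symbols while skipping punctuation is sketched below in Python. Definitions 2.21 and 2.22 are not reproduced in this section, so the counting rule here is an assumption: each constructor symbol and each name counts as one symbol, and numbers (as used in number restrictions) are not handled. The function name and the regular expression are illustrative.

```python
import re

# Assumed symbol inventory: description logic constructors, ⊤ and ⊥, and
# names made of letters, digits and underscores. Manchester keywords such
# as AND and SOME are matched by the name pattern and count as one symbol.
_SYMBOL = re.compile(r"[⊓⊔¬∃∀⊤⊥]|[A-Za-z_]\w*")


def definition_length(definition: str) -> int:
    """Number of symbols in a definition, ignoring '(', ')', '.' and ','."""
    return len(_SYMBOL.findall(definition))


print(definition_length("Male ⊓ ∃hasChild.Person"))        # 5
print(definition_length("Male AND hasChild SOME Person"))  # 5
```

Both notations of the example yield the same length because every constructor in one syntax corresponds to exactly one keyword in the other.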

In our implementation, the normalisation procedure in the DL-Learner framework is used to normalise the learnt definition.

In document Symmetric parallel class expression learning : School of Engineering and Advanced Technology, Massey University, New Zealand : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science (Page 53-59)