In this thesis, we focus on methods for learning new OWL class expressions which describe example data in some way. This is a general technique which can be applied to describe patterns in sets of example data for the purposes of machine learning and data mining as described earlier (§2.2). For example, a set of learned OWL classes may act as a predictive model over unseen example instances, or identify interesting clusters or subgroups of example instances.
Various methods already exist for learning OWL class expressions from example data based around inducing expressions in Description Logics (DLs), which underpin the formal semantics of OWL. These methods are closely related to those developed for addressing a similar problem in the different yet related logical formalism of Logic Programs (LP) within the field of Inductive Logic Programming (ILP). The following section (§2.3.1) compares learning in DLs to ILP and highlights various DL learning techniques which were largely motivated by their applicability in ILP.
2.3.1 Comparison with Inductive Logic Programming
Although learning in DLs is a new area of research, learning in logic-based formalisms in general is not. The field of Inductive Logic Programming (ILP) is a well-researched area of logic-based relational learning which employs Logic Programs (LP) as the formalism for capturing data, background knowledge and hypotheses. There are several key areas in which ILP differs from learning in DLs, including:
• Standards and uptake. The W3C-recommended RDF, RDFS, OWL and XML Schema are widely used web standards which have enjoyed significant uptake for describing data and knowledge from many domains, particularly in the life sciences. Existing data and ontologies published in these formalisms may be leveraged directly to enrich and structure one’s own data for the purposes of learning in a DL. In contrast, background knowledge and data expressed as logic programs, as used in ILP, are not as widely accessible for these purposes.
• Expressivity. ILP algorithms cannot be applied directly to learning in DLs because of the mismatch in the syntax and semantics of the logics [14]. ILP systems typically employ a Horn or definite clause representation for hypotheses and background knowledge, based on Logic Programs (LP). DLs differ in that they permit complex concepts including positive disjunction, full negation and qualified cardinality restrictions which cannot naturally be expressed in LP, but which may be used in a DL concept-based hypothesis language. Conversely, LP systems can express multiple Horn-clause definitions for predicates in learning, which enables them to capture more complex background knowledge than a DL knowledge base. However, depending on the scale of the data, the results of such complex processing can nevertheless be captured in a DL knowledge base.
• Generality. As semantic entailment of clauses in ILP is undecidable, the syntactic θ-subsumption notion of generality over clauses is often used as a decidable substitute for comparing the generality or specificity of clausal hypotheses. In many DLs, a natural notion of generality is that of concept subsumption based on model inclusion, for which decidable algorithms are known¹. Concept subsumption in DLs is a semantic notion of generality which, when constrained by terminological background axioms (TBox), naturally constrains the space of permissible hypotheses in meaningful ways. This may be contrasted with syntactic θ-subsumption in ILP where, unless mode declarations [68] are used to tightly control the instantiation of variables, many irrelevant hypotheses may be permissible, which can unnecessarily inflate the search space.
¹ In fact, most DLs are specifically designed to ensure that satisfiability is decidable. This is important as the commonly used deductive inference tasks such as concept subsumption and instance checking are reducible to satisfiability checking.
• Complexity. Datalog clause coverage and subsumption (determined with θ-subsumption) in ILP are NP-complete problems [37]. In contrast, certain DLs such as EL++ permit PTime reasoning procedures for the analogous tasks of instance checking and concept subsumption [4]. Although such tasks have worse computational complexity in more expressive DLs (e.g. ALC, for which such tasks are PSpace-complete), highly optimised reasoning strategies exist which make them tractable in practice [35]. A direct comparison of the complexity of these tasks is not straightforward, as concept subsumption in a DL knowledge base is usually performed with respect to the entire TBox², whereas Datalog clause subsumption in ILP can be performed in a pairwise manner without respect to background clauses.
• Language bias. For particular DLs, some language bias is captured directly within the language and may be imposed by a TBox. For instance, the notion of type bias in ILP [68] may be captured with domain and range constraints on roles, which are expressible directly in the language of DLs and restrict the applicability of roles to certain classes. Additionally, mode declarations in ILP are irrelevant when learning in DLs as DLs are variable-free.
• Closed vs. open world assumption. Typically, ILP systems will assume a closed world (CWA), whereby all facts or data not currently known to the system are assumed to be false. In contrast, DLs in the context of OWL typically make the assumption of an open world (OWA), whereby no logical conclusions may be drawn from facts or data which are not currently known. With OWL, which is underpinned by DLs, an open world suits the nature of the intended application, which is to describe data on the web, not all of which can feasibly be captured in any one system for analysis such as logical reasoning. However, the chosen assumption has direct implications for the method and computational complexity of logical reasoning and related tasks such as retrieving the set of data described by a logical expression. In ILP, such tasks are usually tractable in a closed world as they involve algorithms of low computational complexity. In DLs which assume an open world, especially those of high expressivity, such reasoning tasks can have extremely high computational complexity, which poses practical limitations on the size of any knowledge base and data set.
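The contrast drawn in the last item above can be illustrated with a toy retrieval task. The following is a minimal sketch only: the individuals, facts, the disjointness axiom and both function names are hypothetical, and a real DL reasoner decides entailment over all models rather than by simple lookup as done here.

```python
# Toy illustration only: contrasts closed-world and open-world answers to
# "which individuals are NOT instances of the class Enzyme?".
# All names and data below are hypothetical.

individuals = {"a", "b", "c"}

# Asserted class memberships (the only facts known to the system).
facts = {("Enzyme", "a"), ("Receptor", "b")}

# Background axiom: Enzyme and Receptor are disjoint classes.
declared_disjoint = {frozenset({"Enzyme", "Receptor"})}


def not_instances_cwa(cls):
    """Closed world: anything not known to be in cls is assumed not to be in cls."""
    return {i for i in individuals if (cls, i) not in facts}


def not_instances_owa(cls):
    """Open world: i is a non-instance only if that follows from what is known.

    In this sketch the only source of negative information is disjointness.
    """
    result = set()
    for other, i in facts:
        if frozenset({cls, other}) in declared_disjoint:
            result.add(i)
    return result


cwa = not_instances_cwa("Enzyme")  # b and c
owa = not_instances_owa("Enzyme")  # only b; the status of c is unknown
```

Under the closed world, c is retrieved simply because nothing is known about it; under the open world, only b is retrieved, because its non-membership follows from the disjointness axiom.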
Despite these differences, several fundamental techniques developed in ILP research are relevant to learning in DLs. In sections 2.3.2 and 2.3.3, we describe two of the most influential techniques which have motivated our work.
² However, note that incremental classification is a technique which can be used to test for concept
2.3.2 Non-standard inferences
Deductive inference problems in DLs, including concept subsumption and instance checking, are well studied. More recent research into so-called non-standard inferences in DLs includes techniques for generating concept expressions from instance data via the most specific concept (msc) procedure (similar to constructing the so-called bottom clause with saturation in ILP) and the least common subsumer (lcs) for concept expressions without disjunction (analogous to the least general generalisation (lgg) of clauses in ILP) [5]. The LCSLearn algorithm for the DL C-Classic [18] employs the msc and lcs to induce concepts from instance data in the way the ILP system GOLEM [67] learns clauses in a bottom-up manner. SONIC is a recent implementation of algorithms for computing the msc and lcs in the newer EL++ language [106].
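As an illustration of the lcs, the sketch below computes it for plain EL concepts (conjunction and existential restriction only, with no TBox) using the product construction on concept descriptions. This is an assumption made for brevity: it is not the procedure used by LCSLearn or SONIC, it does not handle C-Classic or the additional constructors of EL++, and the representation and the names `concept`, `TOP` and `lcs` are hypothetical.

```python
from itertools import product

# An EL concept is modelled as a pair (names, restrictions):
#   names        - frozenset of atomic class names in the conjunction
#   restrictions - frozenset of (role, filler) pairs, one per "exists role.filler"
# The pair of two empty frozensets stands for the top concept.


def concept(names=(), restrictions=()):
    """Build a hashable EL concept from atomic names and (role, filler) pairs."""
    return (frozenset(names), frozenset(restrictions))


TOP = concept()


def lcs(c, d):
    """Least common subsumer of two EL concepts via the product construction.

    The result may contain redundant restrictions; it is not minimised.
    """
    c_names, c_restr = c
    d_names, d_restr = d
    # Keep only the atomic names shared by both concepts.
    common_names = c_names & d_names
    # Pair up restrictions on the same role and generalise their fillers.
    common_restr = {
        (r1, lcs(f1, f2))
        for (r1, f1), (r2, f2) in product(c_restr, d_restr)
        if r1 == r2
    }
    return (common_names, frozenset(common_restr))


# C = Protein AND exists hasPart.(Domain AND Kinase)
c = concept({"Protein"}, {("hasPart", concept({"Domain", "Kinase"}))})
# D = Protein AND Enzyme AND exists hasPart.(Domain AND Binding)
d = concept({"Protein", "Enzyme"}, {("hasPart", concept({"Domain", "Binding"}))})

generalisation = lcs(c, d)  # Protein AND exists hasPart.Domain
```

The two example concepts share only the name Protein and a hasPart restriction whose fillers share only Domain, so the result is Protein ⊓ ∃hasPart.Domain.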
2.3.3 Refinement operators
Motivated by techniques in ILP, refinement operators for generalising or specialising hypotheses to traverse the hypothesis search space have been researched for a number of DLs including ALER [7], EL [57] and even highly expressive DLs such as SROIQ(D), which underpins OWL 2 DL [58]. DL learners which implement a top-down search with refinement operators include DL-Learner [56] and DL-FOIL [30], the latter of which implements a covering (separate-and-conquer) approach for the DL ALC based on the well-known FOIL algorithm in ILP [78]. YinYang [43, 29] is another DL learner which combines top-down and bottom-up refinement search, also in ALC. Each of these implementations is designed to induce a single concept expression which classifies instance data with high accuracy. Fr-ONT [55] uses top-down refinement in the DL EL to discover frequent patterns in a DL knowledge base, akin to the Warmr [48] algorithm for data mining in ILP. Recently, DL-Learner was extended [73] to learn in the probabilistic DL known as crALC [19].
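To make the idea of refinement-based top-down search concrete, the following is a minimal sketch under strong simplifying assumptions: hypotheses are conjunctions of atomic class names only, the class hierarchy and individuals are invented, coverage is checked by lookup rather than by a DL reasoner, and the search is plain greedy hill-climbing. It is not the operator or search strategy of DL-Learner, DL-FOIL, YinYang or OWL-Miner, and all identifiers (`SUBCLASSES`, `TYPES`, `refine`, `learn` and so on) are hypothetical.

```python
# Toy class hierarchy: class -> direct subclasses (hypothetical).
SUBCLASSES = {
    "Thing": ["Animal", "Plant"],
    "Animal": ["Bird", "Mammal"],
    "Plant": [],
    "Bird": [],
    "Mammal": [],
}

# Asserted most specific types of each individual (hypothetical data).
TYPES = {
    "tweety": {"Bird"},
    "rex": {"Mammal"},
    "fern": {"Plant"},
}


def superclasses(cls):
    """Return cls together with all of its ancestors in the toy hierarchy."""
    result = {cls}
    for parent, children in SUBCLASSES.items():
        if cls in children:
            result |= superclasses(parent)
    return result


def covers(hypothesis, individual):
    """A conjunction covers an individual if every conjunct is an inferred type."""
    inferred = set()
    for t in TYPES[individual]:
        inferred |= superclasses(t)
    return hypothesis <= inferred


def refine(hypothesis):
    """Downward refinement: replace one conjunct by one of its direct subclasses."""
    for cls in hypothesis:
        for sub in SUBCLASSES[cls]:
            yield (hypothesis - {cls}) | {sub}


def accuracy(hypothesis, pos, neg):
    """Fraction of examples classified correctly by the hypothesis."""
    correct = sum(covers(hypothesis, i) for i in pos)
    correct += sum(not covers(hypothesis, i) for i in neg)
    return correct / (len(pos) + len(neg))


def learn(pos, neg):
    """Greedy top-down search from Thing, following the best-scoring refinement."""
    current = frozenset({"Thing"})
    while True:
        candidates = list(refine(current))
        if not candidates:
            return current
        best = max(candidates, key=lambda h: accuracy(h, pos, neg))
        # Stop when no refinement improves on the current hypothesis.
        if accuracy(best, pos, neg) <= accuracy(current, pos, neg):
            return current
        current = best
```

With the positive example tweety and the negative examples rex and fern, `learn(["tweety"], ["rex", "fern"])` descends from Thing to Animal to Bird and stops there, since Bird has no further refinements in the toy hierarchy.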
Systems which perform concept induction exclusively in the formalism of DLs using refinement-based search have been described recently. Most notably, DL-Learner [56] is a system for DL concept learning over highly expressive DLs which supports supervised classification and unsupervised learning. Indeed, many of our methods and our implementation, the OWL-Miner system, were designed around improvements to the methods employed by DL-Learner. OWL-Miner differs from DL-Learner in its ability to support a variety of data mining and machine learning tasks including subgroup discovery. Also, while the search procedure that DL-Learner employs is based on refinement, it is not guided by the distribution of data in the knowledge base in the way that the specialisation operator we have developed for OWL-Miner is. Other work on DL concept induction for supervised classification includes the YinYang [43] and DL-FOIL [30] systems, which employ the less expressive DL ALC as their hypothesis language. Fr-ONT is another implementation of a concept learner in the DL known as EL++, roughly corresponding to the OWL-EL profile, and is designed to compute frequent queries in a data mining setting [55]. In contrast, OWL-Miner uses a more expressive hypothesis language, permits highly expressive background knowledge, and is capable of other learning tasks such as subgroup discovery.