2.3 Inductive Logic Programming

2.3.3 Prominent Systems for ILP

In this part of the thesis, we give short overviews of three ILP systems, viz. Foil, Progol, and Tilde, which are among the most frequently used of the many ILP systems introduced within the last 15 years. All three search first-order hypothesis spaces and produce corresponding models. They are also all top-down learning systems, i.e. they start with a most general hypothesis covering all examples and then specialize it with the help of a refinement operator [120], which builds a new clause D from a given clause C such that Cθ ⊆ D for some substitution θ.
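The condition Cθ ⊆ D can be made concrete with a small sketch. It is an illustration only, not code from any of the three systems, and it rests on assumptions of our own: a clause is a list of literals, a literal is a tuple (sign, predicate, arguments), and any argument starting with an upper-case letter is a variable. Terms of D are treated as fixed, so only variables of C are bound.

```python
def is_var(term):
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()

def match_literal(lit_c, lit_d, theta):
    """Try to extend theta so that lit_c under theta equals lit_d; return None on failure."""
    (sign_c, pred_c, args_c), (sign_d, pred_d, args_d) = lit_c, lit_d
    if (sign_c, pred_c) != (sign_d, pred_d) or len(args_c) != len(args_d):
        return None
    theta = dict(theta)  # copy, so that failed branches do not leak bindings
    for a, b in zip(args_c, args_d):
        if is_var(a):
            if theta.setdefault(a, b) != b:  # variable already bound to another term
                return None
        elif a != b:                          # constants must match exactly
            return None
    return theta

def theta_subsumes(c, d, theta=None):
    """Return True if some substitution theta maps every literal of c onto a literal of d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for lit_d in d:
        extended = match_literal(first, lit_d, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False

# daughter(X, Y) :- parent(Y, X).
general = [("+", "daughter", ("X", "Y")), ("-", "parent", ("Y", "X"))]
# daughter(X, Y) :- parent(Y, X), female(X).
special = general + [("-", "female", ("X",))]
print(theta_subsumes(general, special), theta_subsumes(special, general))
```

Adding a literal to the body, as in the second clause, is one typical specialization step of a refinement operator: the more general clause subsumes the refined one, but not vice versa.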

However, there are also essential differences between the systems, which make the consideration of all three seem worthwhile. For instance, Foil uses a covering approach to rule learning, Progol applies an especially guided A*-like search, and Tilde upgrades decision tree learning methods to the case of first-order logic. All three systems are used for our empirical work as presented later in this thesis.

Foil

Foil was first presented by Quinlan in 1990 [103], with further advances following in 1993 and 1995 [105, 106]. It combines ideas from ILP with approaches from propositional machine learning.

From ILP, it inherits the use of clauses, whose expressive power extends to learning recursive hypotheses. Positive and negative examples E represent the target relation. Background knowledge B consists of some other relations. E and B have the form of tuples of constants and, together with schema information, constitute the input for Foil.

From propositional machine learning, the system uses typical approaches for constructing hypotheses built of rules and approaches for the evaluation of parts of hypotheses.

Basically, Foil consists of two main loops, an outer loop and an inner loop, as is typical for the covering algorithm for rule learning [82]. The outer loop runs while there are still positive examples left in the training set, which is initially E. In each iteration, an inner loop is started to build a clause that characterizes a part of the target relation. Starting from a clause with an empty body, literals are added to the body to avoid the coverage of negative examples. Literals are evaluated using criteria that are again based on information theory, cf. Section 2.1. Once such a clause is found, all positive examples that are covered by the clause are removed from the training set.
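The two loops can be sketched as follows. This is a deliberately simplified illustration under our own assumptions, not Foil itself: candidate literals are plain Boolean tests on examples rather than first-order literals that may introduce new variables, the function names are hypothetical, and the gain shown is the commonly cited Foil-style gain for the case without new variables.

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """Foil-style gain of a literal: p0/n0 positives/negatives covered before, p1/n1 after."""
    if p1 == 0:
        return float("-inf")
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def learn_rules(pos, neg, literals):
    """Sequential covering; literals maps a literal name to a Boolean test on an example."""
    rules, remaining = [], list(pos)
    while remaining:                          # outer loop: positives still uncovered
        body, p, n = [], list(remaining), list(neg)
        while n:                              # inner loop: specialize while negatives are covered
            best_name, best_gain = None, 0.0
            for name, test in literals.items():
                if name in body:
                    continue
                p1 = sum(1 for e in p if test(e))
                n1 = sum(1 for e in n if test(e))
                gain = foil_gain(len(p), len(n), p1, n1)
                if gain > best_gain:
                    best_name, best_gain = name, gain
            if best_name is None:             # no literal improves the clause
                break
            test = literals[best_name]
            body.append(best_name)
            p = [e for e in p if test(e)]
            n = [e for e in n if test(e)]
        if n:                                 # no consistent clause found: give up
            break
        rules.append(body)
        remaining = [e for e in remaining if e not in p]  # remove covered positives
    return rules

pos = [{"x": 1, "y": 0}, {"x": 0, "y": 1}]
neg = [{"x": 0, "y": 0}]
literals = {"x_is_1": lambda e: e["x"] == 1, "y_is_1": lambda e: e["y"] == 1}
print(learn_rules(pos, neg, literals))
```

In this toy run, no single test covers both positive examples without also covering the negative one, so the outer loop is executed twice and yields one single-literal clause per positive example.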

We do not go into further detail here, but just mention that Foil also uses strategies for overcoming myopia (a single literal may be of little value when considered for introduction on its own, but of high value in combination with others) and for avoiding problems with infinite recursion. Moreover, pruning strategies are applied and inexact definitions are allowed.

Quinlan states at the end of his early Foil paper [103] that the system will be adequate for handling learning tasks of practical significance in the context of relational databases, partly because of the correspondence between Foil’s input format and the format of relational databases. Our experimental results provide support for Quinlan’s prediction.

Progol

Progol was presented by Muggleton in 1995 [88] as a system that implements inverse entailment. Muggleton and Firth also provided a good tutorial introduction to the system [90].

The input for Progol consists of examples and background knowledge, where especially the latter may include non-ground and structured rules. Furthermore, mode declarations have to be provided by the user, declaring, among other things, the target predicate, the types of arguments, and the places for old or new variables or constants. For each example, Progol constructs a most specific clause within the mode language that is implied by the mode declarations.

For our purposes, i. e. in our experiments with all relations represented by ground non-structured facts, a most specific clause has the target predicate literal corresponding to the learning example in focus as head, and a conjunction of all facts to be found in the background knowledge which are related to the learning example as body.
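For the ground, non-structured case just described, the construction can be sketched as below. This is a minimal illustration, not Progol's actual procedure: it ignores mode declarations and variabilization, and it assumes (our choice) that a fact is "related" to the example if it is linked to it through a chain of shared constants of at most `max_depth` steps. The function names and the `max_depth` parameter are hypothetical.

```python
def related_facts(example, background, max_depth=2):
    """Collect background facts linked to the example through shared constants."""
    _, args = example
    known = set(args)            # constants reached so far
    body = []
    for _ in range(max_depth):
        new = [f for f in background if f not in body and known & set(f[1])]
        if not new:
            break
        body.extend(new)
        for _, fact_args in new:
            known.update(fact_args)
    return body

def most_specific_clause(example, background, max_depth=2):
    """Head: the example's target literal; body: all related ground facts."""
    return example, related_facts(example, background, max_depth)

background = [
    ("parent", ("ann", "mary")),
    ("female", ("mary",)),
    ("female", ("ann",)),
    ("parent", ("tom", "eve")),   # shares no constant with the example below
]
head, body = most_specific_clause(("daughter", ("mary", "ann")), background)
print(head, ":-", body)
```

The fact parent(tom, eve) is left out of the body because it shares no constant, directly or indirectly, with the example daughter(mary, ann).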

These most specific clauses are then used to guide an A*-like search [94] through the clauses which subsume the most specific clauses.

Tilde

Tilde was presented by Blockeel and De Raedt in 1998 [14] and has been further developed since then. It is now a part of the ACE system, cf. Appendix A.

Tilde is an upgrade of Quinlan’s C4.5 [104] and reuses many of the methods of propositional decision tree learning as sketched above, cf. Section 2.1. It uses the same heuristics as C4.5, among others gain ratio for deciding which questions to ask in nodes. Gain ratio is derived from information gain but does not share its unjustified preference for attributes with many distinct values. Like C4.5, Tilde also applies pruning mechanisms.
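The effect of gain ratio can be illustrated with a small computation in the propositional setting. The sketch below is our own and is not taken from C4.5 or Tilde; it uses the standard definition, information gain divided by the split information (the entropy of the attribute's value distribution), and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())

def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on the attribute values."""
    total = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain normalized by split information."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

labels = ["+", "+", "-", "-"]
row_id = [1, 2, 3, 4]             # identifier-like attribute: one value per example
colour = ["a", "a", "b", "b"]     # attribute with two values
print(information_gain(row_id, labels), information_gain(colour, labels))
print(gain_ratio(row_id, labels), gain_ratio(colour, labels))
```

Both attributes separate the classes perfectly and therefore have the same information gain, but the identifier-like attribute pays for its four distinct values with a larger split information and thus receives the lower gain ratio.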

Differences from the propositional case are that nodes contain a conjunction of literals and that different nodes may share variables, with certain restrictions. The set of tests at a node is computed with the help of a refinement operator under θ-subsumption. This operator is specified by the user with the help of mode declarations similar to those used by Progol.

The system includes special features for lookahead to overcome myopia of search, for discretization, and many more, e. g. for dealing with large data sets.

The authors [14] further state that first-order decision trees show higher expressive power than flat normal logic programs as induced by many other ILP systems such as Foil and Progol.

Summary

ILP systems show remarkable abilities, e. g. for learning recursive theories from few and complex examples. However, they tend to be inefficient when learning from larger sets of data such as those found in real-life business databases. Further, considerable effort may be necessary to run them appropriately, for instance for producing declaration files or for setting intricate parameters. We return to these issues in Chapter 4.

In document On propositionalization for knowledge discovery in relational databases (Page 38-40)