
Chapter 2: Literature Review

2.4 Overview of Inductive Logic Programming (ILP)

2.4.1 Inductive Logic Programming

Inductive logic programming (ILP) (Muggleton and De Raedt, 1994; Muggleton, 1999; De Raedt, 2008) is a subfield of machine learning that combines machine learning with logic programming to induce hypotheses (i.e. theories) from background knowledge B and examples E. ILP theory describes the inductive inference (the inverse of deduction) of logic programs from background knowledge and examples. The examples are divided into a subset of positive examples P and a subset of negative examples N. A hypothesis H derived by an ILP system should satisfy the following conditions (Muggleton and De Raedt, 1994; Muggleton, 1999):

• Completeness, such that H is considered complete if it covers all positive examples:

∀p ∈ P : H ∪ B ⊨ p

where the logical symbol ∀ is the universal quantifier (“for all”), p is a positive example, the symbol ∈ denotes membership in a set, the symbol ∪ denotes set union and the symbol ⊨ denotes logical entailment.

• Consistency, such that H is considered consistent if it does not cover any of the negative examples:

∀n ∈ N : H ∪ B ⊭ n

where n is a negative example and the logical symbol ⊭ denotes non-entailment (the negation of ⊨).

In other words, the hypothesis derived by ILP, together with the background knowledge, should entail all of the positive examples and none of the negative examples. It should also meet the language constraints.
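The two conditions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the facts, the example sets and the hand-written `hypothesis` function (which stands for the rule "X is the mother of Y if X is a parent of Y and X is female") are hypothetical, and coverage is checked by a simple lookup in the ground facts rather than by a real theorem prover.

```python
# Background knowledge B as ground facts (hypothetical data).
B = {
    ("parent", "ann", "james"),
    ("parent", "tom", "james"),
    ("female", "ann"),
    ("male", "tom"),
}
P = {("mother", "ann", "james")}   # positive examples
N = {("mother", "tom", "james")}   # negative examples


def hypothesis(example, background):
    """Illustrative H: X is the mother of Y if parent(X, Y) and female(X)."""
    predicate, x, y = example
    return (
        predicate == "mother"
        and ("parent", x, y) in background
        and ("female", x) in background
    )


def is_complete(h, background, positives):
    """H is complete if it covers every positive example."""
    return all(h(p, background) for p in positives)


def is_consistent(h, background, negatives):
    """H is consistent if it covers no negative example."""
    return not any(h(n, background) for n in negatives)


print(is_complete(hypothesis, B, P), is_consistent(hypothesis, B, N))
```

A hypothesis that dropped the `female` condition would remain complete on this data but would no longer be consistent, since it would also cover the negative example.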

The following brief introduction to logic programming is based on the description by De Raedt (2008). Logic programming is commonly carried out in the PROLOG programming language (Bratko, 1990), in which relations between entities are described using clauses. In PROLOG, a term is a constant (an atom or a number), a variable or a compound term. A compound term consists of an atom called the functor, followed by a number of terms called its arguments. The syntax of a compound term is f(t1,…,tn), where f is the functor symbol that names the relation and t1,…,tn are the argument terms of f. For example, the compound term “specific(A,B)” is a predicate (relation) that consists of the functor “specific” and the two variables (terms) “A” and “B”. The number of arguments is called the arity, e.g. specific/2, where “/2” means that the predicate “specific” takes two arguments.
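As a rough illustration of the functor/arity terminology, the sketch below models terms in Python. The class names `Var` and `Compound` and the `indicator` property are hypothetical helpers introduced here; they are not part of PROLOG or of any ILP system.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A logic variable such as A or B."""
    name: str


@dataclass(frozen=True)
class Compound:
    """A compound term f(t1, ..., tn): a functor plus its argument terms."""
    functor: str
    args: Tuple["Term", ...]

    @property
    def arity(self) -> int:
        # The arity is simply the number of arguments.
        return len(self.args)

    @property
    def indicator(self) -> str:
        # Conventional functor/arity notation, e.g. specific/2.
        return f"{self.functor}/{self.arity}"


# Constants are modelled as plain Python strings and numbers (an assumption).
Term = Union[str, int, float, Var, Compound]

specific = Compound("specific", (Var("A"), Var("B")))
print(specific.indicator)
```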

Another key concept in logic programming is the Horn clause (Kowalski, 1974). A Horn clause is a disjunction of literals with at most one positive literal. In PROLOG, Horn clauses are written in the form of a rule (e.g. ‘h ← b1,…, bm’), where h is the head of the rule and b1,…, bm is the body of the rule. The symbol ‘,’ indicates a conjunction, while ‘←’ indicates an implication. This means that the head h is implied (←) by the body, which consists of a single atom b or of the conjunction of several atoms ‘b1, b2,…, bm’. The head of a rule is satisfied whenever the conjunction of its body atoms is satisfied.
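The way a rule's body determines its head can be sketched in Python as follows. This is a naive illustration under stated assumptions, not how a PROLOG engine works: atoms are tuples, strings starting with an upper-case letter are treated as variables, the body atoms are looked up only among ground facts, and the `grandparent` rule and the facts are hypothetical.

```python
from itertools import product


def is_var(term):
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def substitute(atom, theta):
    """Apply a substitution (dict from variables to constants) to an atom."""
    predicate, *args = atom
    return (predicate, *[theta.get(a, a) for a in args])


def derive(head, body, facts):
    """Return every ground instance of the head whose whole body holds in facts."""
    variables = sorted({a for atom in [head, *body] for a in atom[1:] if is_var(a)})
    constants = sorted({a for fact in facts for a in fact[1:]})
    derived = set()
    # Try every assignment of constants to variables (exponential, for illustration only).
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(substitute(b, theta) in facts for b in body):
            derived.add(substitute(head, theta))
    return derived


# grandparent(X, Z) <- parent(X, Y), parent(Y, Z).
head = ("grandparent", "X", "Z")
body = [("parent", "X", "Y"), ("parent", "Y", "Z")]
facts = {("parent", "ann", "james"), ("parent", "james", "dave")}

print(derive(head, body, facts))
```

With these facts the only assignment that satisfies both body atoms is X = ann, Y = james, Z = dave, so the head is derived for that instance alone.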

ILP differs from other machine learning methods in the expressiveness of its representation language: it can learn theories from a set of relevant facts from any domain, provided that they are represented in a form that the ILP learner can understand. The most important characteristic of ILP is its ability to generate meaningful and well-formed hypotheses. This ability depends on supplying the inductive learner with declarative background knowledge and on getting that background knowledge right. In general, a set of positive and negative examples is provided as ground facts to the ILP learner, together with background knowledge, for generating theories. The background knowledge consists of Horn clauses which define the predicates that may later appear in the induced theory. In particular, the background knowledge predicates (body predicates) supply the body literals of the constructed clauses. The target predicates (head predicates) should not appear in the body of the constructed clauses. A first-order representation is defined over a set of constant symbols, predicate symbols and functor symbols. ILP can only learn from first-order Horn clauses (Muggleton and De Raedt, 1994; Muggleton, 1999; De Raedt, 2008).

An ILP learner works by searching the space of possible hypotheses for those which satisfy some quality criteria. In particular, a hypothesis should satisfy syntactic restrictions (the form of the constructed clause) and semantic restrictions (the types of the variables), which together are called the language bias. The language bias limits the search space by reducing the number of potential solutions, which helps to prevent overfitting and to ensure that well-formed hypotheses are learned. The ILP learner begins by constructing a clause based on the provided examples and background knowledge. Then, it searches the space for the hypotheses which cover the most positive examples. To structure the search space, typical ILP systems utilise θ-subsumption as the basis of a generalisation or specialisation operator (i.e. a refinement operator), which imposes a partial ordering on clauses and helps to determine which examples a clause covers (Muggleton and De Raedt, 1994; Muggleton, 1999; De Raedt, 2008). A clause C θ-subsumes a clause G if and only if there is a substitution θ such that Cθ ⊆ G, where ⊆ indicates a subset. The substitution θ is a function that maps variables to terms. For illustration, consider the following example of the clauses C and G:

C: mother(X, Y) ← parent(X, Y), female(X).

G: mother(ann, james) ← parent(ann, james), parent(ann, dave), female(ann), male(james).

In the above example, applying the substitution θ = {X/ann, Y/james} to C yields the literals mother(ann, james), parent(ann, james) and female(ann), all of which are included in the set of literals of G. Hence C θ-subsumes G. θ-subsumption therefore captures a notion of generality, where C is called a generalisation of G and G is a specialisation of C under θ-subsumption (Plotkin, 1970).
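The θ-subsumption test for the clauses C and G can be sketched in Python as a small backtracking search. The encoding is an assumption made for this illustration: a clause is a list of literals, each literal is a triple (sign, predicate, arguments) with "+" for the head and "-" for body literals, and strings starting with an upper-case letter are treated as variables. The function names are hypothetical, and real ILP systems use considerably more efficient subsumption procedures.

```python
def is_var(term):
    """Assumed convention: variables start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def match_literal(lit_c, lit_g, theta):
    """Extend theta so that lit_c under theta equals lit_g; None if impossible."""
    if lit_c[:2] != lit_g[:2] or len(lit_c[2]) != len(lit_g[2]):
        return None  # different sign, predicate or arity
    theta = dict(theta)  # copy, so failed branches leave the caller's theta intact
    for a, b in zip(lit_c[2], lit_g[2]):
        if is_var(a):
            if theta.setdefault(a, b) != b:
                return None  # variable already bound to a different term
        elif a != b:
            return None  # constants must be identical
    return theta


def theta_subsumes(c, g, theta=None):
    """Return a substitution theta with c.theta a subset of g, or None if there is none."""
    theta = {} if theta is None else theta
    if not c:
        return theta
    first, rest = c[0], c[1:]
    for lit in g:  # try to map the first literal of c onto each literal of g
        extended = match_literal(first, lit, theta)
        if extended is not None:
            result = theta_subsumes(rest, g, extended)
            if result is not None:
                return result
    return None


C = [("+", "mother", ("X", "Y")),
     ("-", "parent", ("X", "Y")),
     ("-", "female", ("X",))]
G = [("+", "mother", ("ann", "james")),
     ("-", "parent", ("ann", "james")),
     ("-", "parent", ("ann", "dave")),
     ("-", "female", ("ann",)),
     ("-", "male", ("james",))]

print(theta_subsumes(C, G))  # a substitution: C theta-subsumes G
print(theta_subsumes(G, C))  # None: G does not theta-subsume C
```

Because an empty substitution is a valid result, callers should compare the return value with `None` rather than rely on its truthiness.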

In general, there are two main ILP strategies for searching and ordering the hypothesis space: top-down and bottom-up. The top-down strategy searches the hypothesis space from general to specific, while the bottom-up strategy searches it from specific to general. Much of the research has applied top-down approaches to induce useful hypotheses, since these approaches use short clauses in the search, which helps to reduce the size of the search space. In contrast, bottom-up approaches start with long clauses, which increases the size of the search space as well as the cost of subsumption tests. In addition, the search performed by bottom-up approaches may suffer from overfitting when using small datasets with large examples (Arias and Khardon, 2004; Flach, 1998). Common top-down ILP methods for learning rules and inducing hypotheses include ALEPH (Srinivasan, 1999), FOIL (Quinlan, 1990) and PROGOL (Muggleton, 1995). On the other hand, GOLEM (Muggleton and Feng, 1992) is one of the well-known ILP systems based on the bottom-up search strategy. The following subsections give an overview of the top-down ILP methods ALEPH, PROGOL and FOIL.