3.2 The Inductive Logic Programming Framework

Inductive Logic Programming (ILP) lies at the intersection of Machine Learning and Logic Programming. From Logic Programming, ILP inherited its knowledge representation framework, namely the use of first-order logic clauses to represent data. From Machine Learning, it inherited the learning mechanism, i.e. the derivation of rules from positive and negative examples.

3.2.1 General Setting

The objective of ILP is to construct first-order logic clausal theories, called hypotheses, which are derived by reasoning upon a set of negative and positive examples, plus some additional background knowledge about them. The process is typically carried out using a search in a space of possible hypotheses. More precisely, the task of ILP is defined as:

• Given $E = E^{+} \cup E^{-}$, a set of training examples represented as ground facts and divided into positive examples $E^{+}$ and negative examples $E^{-}$;

• Given B, some background knowledge about the examples $e \in E$;

• Find a hypothesis H, so that H is complete and consistent with respect to the background knowledge B and the examples in E.

The sets E, B and H are logic programs, i.e. they are sets of clauses with an atom as head and a set of atoms as body, of the form $h \leftarrow b_1, b_2, \ldots, b_i$. Also, the two sets of positive and negative examples usually contain only ground facts (variable-free clauses with empty bodies).
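One possible way to encode this representation is sketched below in Python. The class names `Var`, `Atom` and `Clause` are illustrative assumptions, not part of any particular ILP system, and constants are simply modelled as strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A logical variable such as X or Y (hypothetical helper type)."""
    name: str


# A term is either a constant (plain string) or a variable.
Term = Union[str, Var]


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[Term, ...]

    def is_ground(self) -> bool:
        # An atom is ground when none of its arguments is a variable.
        return not any(isinstance(a, Var) for a in self.args)


@dataclass(frozen=True)
class Clause:
    head: Atom
    body: FrozenSet[Atom] = frozenset()  # empty body means the clause is a fact

    def is_fact(self) -> bool:
        return not self.body

    def is_ground_fact(self) -> bool:
        return self.is_fact() and self.head.is_ground()


# A ground fact, as used for examples and background knowledge.
fact = Clause(Atom("seasonFinale", ("GoT-s03e10",)))

# A rule with a non-empty body, as used for hypotheses.
rule = Clause(
    Atom("isPopular", (Var("X"),)),
    frozenset({Atom("happenedOn", (Var("X"), Var("Y")))}),
)
```

Frozen dataclasses are used so that atoms and clauses are hashable and can be stored in sets, which matches the view of a logic program as a set of clauses.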

To check the completeness and consistency requirements, ILP uses a coverage function, which returns true if the example e is satisfied by H with respect to B. We note:

$\mathit{covers}(H, B, e) = \mathit{true} \iff H \cup B \models e$

meaning that e is a logical consequence of $H \cup B$. Consequently, we say that:

• Completeness with respect to B is guaranteed when a hypothesis H covers all the positive examples $E^{+} \subseteq E$, so that $\mathit{covers}(H, B, e) = \mathit{true}, \ \forall e \in E^{+}$;

• Consistency with respect to B is guaranteed when a hypothesis H covers none of the negative examples, so that $\neg \mathit{covers}(H, B, e), \ \forall e \in E^{-}$.
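The two criteria can be sketched in Python as follows. This is a minimal illustration under a strong simplification: the entailment test is delegated to a user-supplied `covers` function (a real ILP system would call a Prolog engine or theorem prover here), and the names `is_complete`, `is_consistent` and `toy_covers` are hypothetical.

```python
from typing import Any, Callable, Iterable

# covers(h, b, e) stands in for the entailment test "H together with B entails e".
Covers = Callable[[Any, Any, Any], bool]


def is_complete(covers: Covers, h: Any, b: Any, positives: Iterable[Any]) -> bool:
    """H is complete if it covers every positive example."""
    return all(covers(h, b, e) for e in positives)


def is_consistent(covers: Covers, h: Any, b: Any, negatives: Iterable[Any]) -> bool:
    """H is consistent if it covers no negative example."""
    return not any(covers(h, b, e) for e in negatives)


# Toy setting (assumption): B stores known facts about integers and
# H is a Python predicate evaluated against B.
def toy_covers(h, b, e):
    return h(b, e)


background = {"even": {2, 4, 6}}
hypothesis = lambda b, e: e in b["even"]

complete = is_complete(toy_covers, hypothesis, background, [2, 4])
consistent = is_consistent(toy_covers, hypothesis, background, [3, 5])
```

Keeping `covers` as a parameter separates the two criteria from the way entailment is actually decided.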

These criteria then require that H and $E^{+}$ agree on the examples that H covers: in other words, a hypothesis acts as a classifier of examples that are tested against the oracle $E^{+}$ (see Figure 3.1). Criteria to check the validity of a classifier H are generally classification accuracy, transparency, statistical significance and information content [Lavrac and Dzeroski, 1994].

Of course, not all learning tasks can produce hypotheses that are both complete and consistent: ILP systems therefore need to include a noise-handling mechanism that prevents overfitting by dealing with imperfect data, such as noisy training examples or missing, sparse or inexact values in the background knowledge.

(a) H: complete, consistent. (b) H: complete, inconsistent.

Figure 3.1 Accuracy of a hypothesis based on the completeness and consistency criteria.

3.2.2 Generic Technique

In Inductive Logic Programming, as well as in Inductive Learning, induction (or generalisation) is seen as a search problem, where a hypothesis has to be found in a partially ordered space of hypotheses [Mitchell, 1982]. This process requires three steps:

(i) Structuring. First, an ILP algorithm constructs an ordered lattice of all the possible hypotheses, ordered from the most general (those that cover the most training examples) to the most specific (those that cover only one training example).

(ii) Searching. The ordered space is then searched using refinement operators, which are functions computing the generalisation or specialisation of a clause, according to whether the search is performed in a bottom-up or top-down way, respectively.

(iii) Bounding. Finally, in order to reduce the computational complexity, some bias (e.g. a heuristic that directs the search, or a restriction on the language in which hypotheses are expressed) is defined to constrain and reduce the search in the space.
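As a rough illustration of the refinement operators mentioned in step (ii), the sketch below represents a clause as a pair of a head and a set of body literals, both plain strings. It only adds or drops body literals; real refinement operators typically also apply substitutions to variables, which is omitted here. The function names `specialise` and `generalise` are assumptions made for this example.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple

# A clause is modelled as (head, body), with literals kept as plain strings.
Clause = Tuple[str, FrozenSet[str]]


def specialise(clause: Clause, candidate_literals: Iterable[str]) -> Iterator[Clause]:
    """Top-down step: adding a body literal makes the clause more specific."""
    head, body = clause
    for literal in candidate_literals:
        if literal not in body:
            yield (head, body | {literal})


def generalise(clause: Clause) -> Iterator[Clause]:
    """Bottom-up step: dropping a body literal makes the clause more general."""
    head, body = clause
    for literal in body:
        yield (head, body - {literal})


clause = ("isPopular(X)", frozenset({"happenedOn(X,Y)"}))
more_specific = set(specialise(clause, ["happenedOn(X,Y)", "seasonFinale(Y)"]))
more_general = set(generalise(clause))
```

Each call yields the immediate neighbours of a clause in the ordered space, which is what a search procedure would then explore.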

The generic ILP algorithm works as follows: candidate hypotheses are kept in a queue; hypotheses are repeatedly removed from the queue and expanded using the refinement operators; the expanded hypotheses that are valid according to the declared bias are then added to the queue. This process continues until a stop criterion is satisfied.
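The loop described above might be sketched in Python as below. This is a simplified reading and not a specific ILP system: the queue is first-in-first-out, the stop criterion is tested on each hypothesis as it leaves the queue, and the parameter names (`refine`, `satisfies_bias`, `stop`, `max_expansions`) are assumptions introduced for illustration.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional


def generic_ilp_search(
    initial: Iterable[Hashable],
    refine: Callable[[Hashable], Iterable[Hashable]],
    satisfies_bias: Callable[[Hashable], bool],
    stop: Callable[[Hashable], bool],
    max_expansions: int = 10_000,
) -> Optional[Hashable]:
    """Queue-based search over hypotheses; returns the first one meeting `stop`."""
    queue = deque(initial)
    seen = set(queue)  # avoid re-queuing hypotheses already generated
    for _ in range(max_expansions):
        if not queue:
            return None
        hypothesis = queue.popleft()           # remove a hypothesis from the queue
        if stop(hypothesis):                   # stop criterion satisfied
            return hypothesis
        for candidate in refine(hypothesis):   # expand with the refinement operator
            if candidate not in seen and satisfies_bias(candidate):
                seen.add(candidate)
                queue.append(candidate)
    return None


# Toy run (assumption): a hypothesis is just a set of body literals.
LITERALS = ("a", "b", "c")


def add_literal(body):
    return (body | {lit} for lit in LITERALS if lit not in body)


result = generic_ilp_search(
    initial=[frozenset()],
    refine=add_literal,
    satisfies_bias=lambda body: len(body) <= 2,       # language bias: at most 2 literals
    stop=lambda body: body == frozenset({"a", "c"}),  # stand-in for a real acceptance test
)
```

In practice the stop criterion would usually be expressed through the coverage of the examples, for instance completeness and consistency, instead of a fixed target.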

3.2.3 A Practical Example

Reusing the example introduced earlier in this thesis, let us imagine that we want to automatically learn why on some dates (considered as the positive examples) people look for “A Song of Ice and Fire”, while on some other dates (the negative examples) people do not. We denote the concept to be learnt as isPopular(X), with X being a date. Suppose we have some background knowledge about this problem, e.g. which TV series episodes were aired on those dates, as in Table 3.1.

Table 3.1 Background knowledge B. GoT and HIMYM are respectively the “Game of Thrones” and “How I Met Your Mother” TV series.

B
happenedOn(‘2013-06-09’,‘GoT-s03e10’)
happenedOn(‘2014-06-15’,‘GoT-s04e10’)
happenedOn(‘2014-06-15’,‘HIMYM-s08e23’)
happenedOn(‘2014-03-31’,‘HIMYM-s09e24’)
TVseries(‘GoT-s03e10’,‘GoT’)
TVseries(‘GoT-s04e10’,‘GoT’)
TVseries(‘HIMYM-s08e23’,‘HIMYM’)
TVseries(‘HIMYM-s09e24’,‘HIMYM’)
seasonFinale(‘GoT-s03e10’)
seasonFinale(‘GoT-s04e10’)
seasonFinale(‘HIMYM-s08e23’)
seasonFinale(‘HIMYM-s09e24’)

Suppose also that we are given some positive and negative examples, i.e. dates on which “A Song of Ice and Fire” was popular or not, as in Table 3.2.

Table 3.2 Examples E for isPopular(X).

$E^{+}$ | $E^{-}$
isPopular(‘2013-06-09’) | isPopular(‘2013-05-06’)
isPopular(‘2014-06-15’) | isPopular(‘2014-03-31’)

Given B, we can induce that “A Song of Ice and Fire” is popular on those dates on which a season finale of the TV series “Game of Thrones” was aired. Therefore:

$\mathit{isPopular}(X) \leftarrow \mathit{happenedOn}(X, Y) \wedge \mathit{TVseries}(Y, \text{‘GoT’}) \wedge \mathit{seasonFinale}(Y)$

Note that H is complete with respect to B, because every date in $E^{+}$ is one on which a season finale of the TV series “Game of Thrones” was aired; and it is also consistent, because none of the negative examples is covered by H.
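This check can be reproduced with a small Python sketch that encodes Tables 3.1 and 3.2 as sets of tuples and evaluates the induced rule directly. The encoding and the function names are assumptions made for illustration; the second rule, `too_general`, is not part of the original example and is added only to show what an inconsistent hypothesis looks like on the same data.

```python
# Background knowledge B (Table 3.1), encoded as Python sets.
happened_on = {
    ("2013-06-09", "GoT-s03e10"),
    ("2014-06-15", "GoT-s04e10"),
    ("2014-06-15", "HIMYM-s08e23"),
    ("2014-03-31", "HIMYM-s09e24"),
}
tv_series = {
    ("GoT-s03e10", "GoT"),
    ("GoT-s04e10", "GoT"),
    ("HIMYM-s08e23", "HIMYM"),
    ("HIMYM-s09e24", "HIMYM"),
}
season_finale = {"GoT-s03e10", "GoT-s04e10", "HIMYM-s08e23", "HIMYM-s09e24"}

# Examples E (Table 3.2): dates for which isPopular holds or does not hold.
positives = ["2013-06-09", "2014-06-15"]
negatives = ["2013-05-06", "2014-03-31"]


def is_popular(x: str) -> bool:
    """isPopular(X) <- happenedOn(X,Y), TVseries(Y,'GoT'), seasonFinale(Y)."""
    return any(
        date == x and (y, "GoT") in tv_series and y in season_finale
        for date, y in happened_on
    )


def too_general(x: str) -> bool:
    """Illustrative variant without the TVseries literal: any season finale."""
    return any(date == x and y in season_finale for date, y in happened_on)


complete = all(is_popular(x) for x in positives)
consistent = not any(is_popular(x) for x in negatives)
too_general_consistent = not any(too_general(x) for x in negatives)
```

On this data the induced rule covers both positive dates and neither negative date, whereas the variant without the TVseries literal also covers ‘2014-03-31’ (the HIMYM season finale) and is therefore inconsistent.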

In document Explaining Data Patterns using Knowledge from the Web of Data (Page 81-85)